* [RFC PATCH v2 0/3] KVM: VMX: Fix VM entry failure on PT_MODE_HOST_GUEST while host is using PT @ 2022-09-21 16:45 Xiaoyao Li 2022-09-21 16:45 ` [RFC PATCH v2 1/3] perf/core: Expose perf_event_{en,dis}able_local() Xiaoyao Li ` (2 more replies) 0 siblings, 3 replies; 19+ messages in thread From: Xiaoyao Li @ 2022-09-21 16:45 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Sean Christopherson, Paolo Bonzini, wei.w.wang, kan.liang Cc: xiaoyao.li, linux-perf-users, linux-kernel, kvm There is one bug in KVM that can hit vm-entry failure 100% on platform supporting PT_MODE_HOST_GUEST mode following below steps: 1. #modprobe -r kvm_intel 2. #modprobe kvm_intel pt_mode=1 3. start a VM with QEMU 4. on host: #perf record -e intel_pt// The vm-entry failure happens because it violates the requirement stated in Intel SDM 26.2.1.1 VM-Execution Control Fields If the logical processor is operating with Intel PT enabled (if IA32_RTIT_CTL.TraceEn = 1) at the time of VM entry, the "load IA32_RTIT_CTL" VM-entry control must be 0. On PT_MODE_HOST_GUEST node, vm-entry load RTIT is always set. Thus KVM needs to ensure IA32_RTIT_CTL.TraceEn is 0 before VM-entry. Currently KVM manually WRMSR(IA32_RTIT_CTL) to clear TraceEn bit. However, it doesn't work everytime since there is a posibility that IA32_RTIT_CTL.TraceEn is re-enabled in PT PMI handler before vm-entry. This series tries to fix the issue by exposing and calling perf driver API to stop host PT event (if any) before vm-entry and resume PT event after vm-exit. Perf API can prevent PT PMI handler from re-enabling PT. By the way, drop the save/restore of PT MSRs of host because the resume of PT event after vm-exit doesn't rely on the previous value of PT MSRs. Changes in v1: - Export perf_event_{en,dis}able_local() and pt_get_curr_event() for KVM to stop/resume PT event; (Suggested-by Wang, Wei W <wei.w.wang@intel.com>) - Drop the save/restore of host PT MSRs. v1: https://lore.kernel.org/all/20220825085625.867763-1-xiaoyao.li@intel.com/ Xiaoyao Li (3): perf/core: Expose perf_event_{en,dis}able_local() perf/x86/intel/pt: Introduce and export pt_get_curr_event() KVM: VMX: Stop/resume host PT before/after VMX transition when PT_MODE_HOST_GUEST arch/x86/events/intel/pt.c | 8 ++++++++ arch/x86/include/asm/perf_event.h | 2 ++ arch/x86/kvm/vmx/vmx.c | 31 +++++++++++++------------------ arch/x86/kvm/vmx/vmx.h | 2 +- include/linux/perf_event.h | 1 + kernel/events/core.c | 7 +++++++ 6 files changed, 32 insertions(+), 19 deletions(-) -- 2.27.0 ^ permalink raw reply [flat|nested] 19+ messages in thread
* [RFC PATCH v2 1/3] perf/core: Expose perf_event_{en,dis}able_local() 2022-09-21 16:45 [RFC PATCH v2 0/3] KVM: VMX: Fix VM entry failure on PT_MODE_HOST_GUEST while host is using PT Xiaoyao Li @ 2022-09-21 16:45 ` Xiaoyao Li 2022-09-22 12:16 ` Wang, Wei W 2022-09-21 16:45 ` [RFC PATCH v2 2/3] perf/x86/intel/pt: Introduce and export pt_get_curr_event() Xiaoyao Li 2022-09-21 16:45 ` [RFC PATCH v2 3/3] KVM: VMX: Stop/resume host PT before/after VMX transition when PT_MODE_HOST_GUEST Xiaoyao Li 2 siblings, 1 reply; 19+ messages in thread From: Xiaoyao Li @ 2022-09-21 16:45 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Sean Christopherson, Paolo Bonzini, wei.w.wang, kan.liang Cc: xiaoyao.li, linux-perf-users, linux-kernel, kvm KVM needs them to disable/enable an Intel PT perf event before vm-entry/after vm-exit. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> --- include/linux/perf_event.h | 1 + kernel/events/core.c | 7 +++++++ 2 files changed, 8 insertions(+) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index ee8b9ecdc03b..fc5f3952d6a2 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1472,6 +1472,7 @@ extern int perf_swevent_get_recursion_context(void); extern void perf_swevent_put_recursion_context(int rctx); extern u64 perf_swevent_set_period(struct perf_event *event); extern void perf_event_enable(struct perf_event *event); +extern void perf_event_enable_local(struct perf_event *event); extern void perf_event_disable(struct perf_event *event); extern void perf_event_disable_local(struct perf_event *event); extern void perf_event_disable_inatomic(struct perf_event *event); diff --git a/kernel/events/core.c b/kernel/events/core.c index 2621fd24ad26..8324bb99c6bf 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -2446,6 +2446,7 @@ void perf_event_disable_local(struct perf_event *event) { event_function_local(event, __perf_event_disable, NULL); } +EXPORT_SYMBOL_GPL(perf_event_disable_local); /* * Strictly speaking kernel users cannot create groups and therefore this @@ -2984,6 +2985,12 @@ static void _perf_event_enable(struct perf_event *event) event_function_call(event, __perf_event_enable, NULL); } +void perf_event_enable_local(struct perf_event *event) +{ + event_function_local(event, __perf_event_enable, NULL); +} +EXPORT_SYMBOL_GPL(perf_event_enable_local); + /* * See perf_event_disable(); */ -- 2.27.0 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* RE: [RFC PATCH v2 1/3] perf/core: Expose perf_event_{en,dis}able_local() 2022-09-21 16:45 ` [RFC PATCH v2 1/3] perf/core: Expose perf_event_{en,dis}able_local() Xiaoyao Li @ 2022-09-22 12:16 ` Wang, Wei W 0 siblings, 0 replies; 19+ messages in thread From: Wang, Wei W @ 2022-09-22 12:16 UTC (permalink / raw) To: Li, Xiaoyao, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Christopherson,, Sean, Paolo Bonzini, kan.liang Cc: linux-perf-users, linux-kernel, kvm On Thursday, September 22, 2022 12:45 AM, Li, Xiaoyao wrote: > KVM needs them to disable/enable an Intel PT perf event before vm-entry/after > vm-exit. I would explain more in the commit here: Export perf_event_disable_local and perf_event_enable_local for perf users to disable and enable perf events on the current local CPU. One usage is in PT virtualization by KVM: - before VMEnter to guest, KVM calls perf_event_disable_local to disable the host PT event running on the current CPU, and this reuses the PT driver to save the related h/w states. - after VMExit to host, KVM calls perf_event_enable_local to resume the host PT event on the current CPU by having the PT driver load the previously saved states into h/w. > > Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> > --- > include/linux/perf_event.h | 1 + > kernel/events/core.c | 7 +++++++ > 2 files changed, 8 insertions(+) > > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index > ee8b9ecdc03b..fc5f3952d6a2 100644 > --- a/include/linux/perf_event.h > +++ b/include/linux/perf_event.h > @@ -1472,6 +1472,7 @@ extern int > perf_swevent_get_recursion_context(void); > extern void perf_swevent_put_recursion_context(int rctx); extern u64 > perf_swevent_set_period(struct perf_event *event); extern void > perf_event_enable(struct perf_event *event); > +extern void perf_event_enable_local(struct perf_event *event); > extern void perf_event_disable(struct perf_event *event); extern void > perf_event_disable_local(struct perf_event *event); extern void > perf_event_disable_inatomic(struct perf_event *event); diff --git > a/kernel/events/core.c b/kernel/events/core.c index > 2621fd24ad26..8324bb99c6bf 100644 > --- a/kernel/events/core.c > +++ b/kernel/events/core.c > @@ -2446,6 +2446,7 @@ void perf_event_disable_local(struct perf_event > *event) { > event_function_local(event, __perf_event_disable, NULL); } > +EXPORT_SYMBOL_GPL(perf_event_disable_local); > > /* > * Strictly speaking kernel users cannot create groups and therefore this @@ > -2984,6 +2985,12 @@ static void _perf_event_enable(struct perf_event *event) > event_function_call(event, __perf_event_enable, NULL); } > > +void perf_event_enable_local(struct perf_event *event) { > + event_function_local(event, __perf_event_enable, NULL); } > +EXPORT_SYMBOL_GPL(perf_event_enable_local); > + > /* > * See perf_event_disable(); > */ > -- > 2.27.0 ^ permalink raw reply [flat|nested] 19+ messages in thread
* [RFC PATCH v2 2/3] perf/x86/intel/pt: Introduce and export pt_get_curr_event() 2022-09-21 16:45 [RFC PATCH v2 0/3] KVM: VMX: Fix VM entry failure on PT_MODE_HOST_GUEST while host is using PT Xiaoyao Li 2022-09-21 16:45 ` [RFC PATCH v2 1/3] perf/core: Expose perf_event_{en,dis}able_local() Xiaoyao Li @ 2022-09-21 16:45 ` Xiaoyao Li 2022-09-22 5:14 ` Xiaoyao Li 2022-09-22 12:33 ` Liang, Kan 2022-09-21 16:45 ` [RFC PATCH v2 3/3] KVM: VMX: Stop/resume host PT before/after VMX transition when PT_MODE_HOST_GUEST Xiaoyao Li 2 siblings, 2 replies; 19+ messages in thread From: Xiaoyao Li @ 2022-09-21 16:45 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Sean Christopherson, Paolo Bonzini, wei.w.wang, kan.liang Cc: xiaoyao.li, linux-perf-users, linux-kernel, kvm KVM supports PT_MODE_HOST_GUEST mode for Intel PT where host and guest have separate Intel PT configurations and they work independently. Currently, in this mode, when both host and guest enable PT, KVM manually clears MSR_IA32_RTIT_CTL.TRACEEN to disable host PT so that it can context switch the other PT MSRs. However, PT PMI can be delivered after this point and before the VM-entry. As a result, the re-enabling of PT leads to VM-entry failure of guest. To solve the problem, introduce and export pt_get_curr_event() for KVM to get current pt event. Along with perf_event_{dis, en}able_local(), With them, KVM can avoid PT re-enabling in PT PMI handler. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> --- arch/x86/events/intel/pt.c | 8 ++++++++ arch/x86/include/asm/perf_event.h | 2 ++ 2 files changed, 10 insertions(+) diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c index 82ef87e9a897..62bfc45c11c9 100644 --- a/arch/x86/events/intel/pt.c +++ b/arch/x86/events/intel/pt.c @@ -1624,6 +1624,14 @@ static void pt_event_stop(struct perf_event *event, int mode) } } +struct perf_event *pt_get_curr_event(void) +{ + struct pt *pt = this_cpu_ptr(&pt_ctx); + + return pt->handle.event; +} +EXPORT_SYMBOL_GPL(pt_get_curr_event); + static long pt_event_snapshot_aux(struct perf_event *event, struct perf_output_handle *handle, unsigned long size) diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h index f6fc8dd51ef4..7c3533392cf5 100644 --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -553,11 +553,13 @@ static inline int x86_perf_get_lbr(struct x86_pmu_lbr *lbr) #ifdef CONFIG_CPU_SUP_INTEL extern void intel_pt_handle_vmx(int on); + extern struct perf_event *pt_get_curr_event(void); #else static inline void intel_pt_handle_vmx(int on) { } +struct perf_event *pt_get_curr_event(void) { return NULL; } #endif #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_AMD) -- 2.27.0 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [RFC PATCH v2 2/3] perf/x86/intel/pt: Introduce and export pt_get_curr_event() 2022-09-21 16:45 ` [RFC PATCH v2 2/3] perf/x86/intel/pt: Introduce and export pt_get_curr_event() Xiaoyao Li @ 2022-09-22 5:14 ` Xiaoyao Li 2022-09-22 12:33 ` Liang, Kan 1 sibling, 0 replies; 19+ messages in thread From: Xiaoyao Li @ 2022-09-22 5:14 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Sean Christopherson, Paolo Bonzini, wei.w.wang, kan.liang Cc: linux-perf-users, linux-kernel, kvm On 9/22/2022 12:45 AM, Xiaoyao Li wrote: > KVM supports PT_MODE_HOST_GUEST mode for Intel PT where host and guest > have separate Intel PT configurations and they work independently. > > Currently, in this mode, when both host and guest enable PT, KVM manually > clears MSR_IA32_RTIT_CTL.TRACEEN to disable host PT so that it can > context switch the other PT MSRs. However, PT PMI can be delivered after > this point and before the VM-entry. As a result, the re-enabling of PT > leads to VM-entry failure of guest. > > To solve the problem, introduce and export pt_get_curr_event() for KVM > to get current pt event. Along with perf_event_{dis, en}able_local(), > With them, KVM can avoid PT re-enabling in PT PMI handler. > > Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> > --- > arch/x86/events/intel/pt.c | 8 ++++++++ > arch/x86/include/asm/perf_event.h | 2 ++ > 2 files changed, 10 insertions(+) > > diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c > index 82ef87e9a897..62bfc45c11c9 100644 > --- a/arch/x86/events/intel/pt.c > +++ b/arch/x86/events/intel/pt.c > @@ -1624,6 +1624,14 @@ static void pt_event_stop(struct perf_event *event, int mode) > } > } > > +struct perf_event *pt_get_curr_event(void) > +{ > + struct pt *pt = this_cpu_ptr(&pt_ctx); > + > + return pt->handle.event; > +} > +EXPORT_SYMBOL_GPL(pt_get_curr_event); > + > static long pt_event_snapshot_aux(struct perf_event *event, > struct perf_output_handle *handle, > unsigned long size) > diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h > index f6fc8dd51ef4..7c3533392cf5 100644 > --- a/arch/x86/include/asm/perf_event.h > +++ b/arch/x86/include/asm/perf_event.h > @@ -553,11 +553,13 @@ static inline int x86_perf_get_lbr(struct x86_pmu_lbr *lbr) > > #ifdef CONFIG_CPU_SUP_INTEL > extern void intel_pt_handle_vmx(int on); > + extern struct perf_event *pt_get_curr_event(void); > #else > static inline void intel_pt_handle_vmx(int on) > { > > } > +struct perf_event *pt_get_curr_event(void) { return NULL; } My fault, this needs to be static inline ... > #endif > > #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_AMD) ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC PATCH v2 2/3] perf/x86/intel/pt: Introduce and export pt_get_curr_event() 2022-09-21 16:45 ` [RFC PATCH v2 2/3] perf/x86/intel/pt: Introduce and export pt_get_curr_event() Xiaoyao Li 2022-09-22 5:14 ` Xiaoyao Li @ 2022-09-22 12:33 ` Liang, Kan 2022-09-22 12:58 ` Wang, Wei W 1 sibling, 1 reply; 19+ messages in thread From: Liang, Kan @ 2022-09-22 12:33 UTC (permalink / raw) To: Xiaoyao Li, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Sean Christopherson, Paolo Bonzini, wei.w.wang Cc: linux-perf-users, linux-kernel, kvm On 2022-09-21 12:45 p.m., Xiaoyao Li wrote: > KVM supports PT_MODE_HOST_GUEST mode for Intel PT where host and guest > have separate Intel PT configurations and they work independently. > > Currently, in this mode, when both host and guest enable PT, KVM manually > clears MSR_IA32_RTIT_CTL.TRACEEN to disable host PT so that it can > context switch the other PT MSRs. However, PT PMI can be delivered after > this point and before the VM-entry. As a result, the re-enabling of PT > leads to VM-entry failure of guest. > > To solve the problem, introduce and export pt_get_curr_event() for KVM > to get current pt event. I don't think the current pt event is created by KVM. IIUC, the patch basically expose the event created by the other user to KVM. That doesn't sounds correct. I think it should be perf's responsibility to decide which events should be disabled, and which MSRs should be switched. Because only perf can see all the events. The users should only see the events they created. Thanks, Kan > Along with perf_event_{dis, en}able_local(), > With them, KVM can avoid PT re-enabling in PT PMI handler. > > Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> > --- > arch/x86/events/intel/pt.c | 8 ++++++++ > arch/x86/include/asm/perf_event.h | 2 ++ > 2 files changed, 10 insertions(+) > > diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c > index 82ef87e9a897..62bfc45c11c9 100644 > --- a/arch/x86/events/intel/pt.c > +++ b/arch/x86/events/intel/pt.c > @@ -1624,6 +1624,14 @@ static void pt_event_stop(struct perf_event *event, int mode) > } > } > > +struct perf_event *pt_get_curr_event(void) > +{ > + struct pt *pt = this_cpu_ptr(&pt_ctx); > + > + return pt->handle.event; > +} > +EXPORT_SYMBOL_GPL(pt_get_curr_event); > + > static long pt_event_snapshot_aux(struct perf_event *event, > struct perf_output_handle *handle, > unsigned long size) > diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h > index f6fc8dd51ef4..7c3533392cf5 100644 > --- a/arch/x86/include/asm/perf_event.h > +++ b/arch/x86/include/asm/perf_event.h > @@ -553,11 +553,13 @@ static inline int x86_perf_get_lbr(struct x86_pmu_lbr *lbr) > > #ifdef CONFIG_CPU_SUP_INTEL > extern void intel_pt_handle_vmx(int on); > + extern struct perf_event *pt_get_curr_event(void); > #else > static inline void intel_pt_handle_vmx(int on) > { > > } > +struct perf_event *pt_get_curr_event(void) { return NULL; } > #endif > > #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_AMD) ^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [RFC PATCH v2 2/3] perf/x86/intel/pt: Introduce and export pt_get_curr_event() 2022-09-22 12:33 ` Liang, Kan @ 2022-09-22 12:58 ` Wang, Wei W 2022-09-22 13:34 ` Peter Zijlstra 2022-09-26 15:32 ` Liang, Kan 0 siblings, 2 replies; 19+ messages in thread From: Wang, Wei W @ 2022-09-22 12:58 UTC (permalink / raw) To: Liang, Kan, Li, Xiaoyao, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Christopherson,, Sean, Paolo Bonzini Cc: linux-perf-users, linux-kernel, kvm On Thursday, September 22, 2022 8:34 PM, Liang, Kan wrote: > > To solve the problem, introduce and export pt_get_curr_event() for KVM > > to get current pt event. > > I don't think the current pt event is created by KVM. IIUC, the patch basically > expose the event created by the other user to KVM. That doesn't sounds > correct. Yes, that's the host PT event running on the current CPU. Not created by KVM. > > I think it should be perf's responsibility to decide which events should be > disabled, and which MSRs should be switched. Because only perf can see all the > events. The users should only see the events they created. For other pmu cases, yes. For PT, its management is simpler than other pmu resources and PT PMU is much simpler. It doesn't have a scheduler to manage events. I think the usage here is similar to the CPU thread scheduling case. When we have CPUs isolated from the CPU scheduler, i.e. no scheduler for those CPUs, basically we rely on users to ping tasks to those CPUs (e.g. taskset). For the commit log, probably we don't need those KVM details here. Simplify a bit: Add a function to expose the current running PT event to users. One usage is in KVM, it needs to get and disable the running host PT event before VMEnter to the guest and resumes the event after VMexit to host. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC PATCH v2 2/3] perf/x86/intel/pt: Introduce and export pt_get_curr_event() 2022-09-22 12:58 ` Wang, Wei W @ 2022-09-22 13:34 ` Peter Zijlstra 2022-09-22 13:59 ` Wang, Wei W 2022-09-26 15:32 ` Liang, Kan 1 sibling, 1 reply; 19+ messages in thread From: Peter Zijlstra @ 2022-09-22 13:34 UTC (permalink / raw) To: Wang, Wei W Cc: Liang, Kan, Li, Xiaoyao, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Christopherson,, Sean, Paolo Bonzini, linux-perf-users, linux-kernel, kvm On Thu, Sep 22, 2022 at 12:58:49PM +0000, Wang, Wei W wrote: > Add a function to expose the current running PT event to users. One usage is in KVM, > it needs to get and disable the running host PT event before VMEnter to the guest and > resumes the event after VMexit to host. You cannot just kill a host event like that. If there is a host event, the guest looses out. ^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [RFC PATCH v2 2/3] perf/x86/intel/pt: Introduce and export pt_get_curr_event() 2022-09-22 13:34 ` Peter Zijlstra @ 2022-09-22 13:59 ` Wang, Wei W 2022-09-22 14:09 ` Peter Zijlstra 0 siblings, 1 reply; 19+ messages in thread From: Wang, Wei W @ 2022-09-22 13:59 UTC (permalink / raw) To: Peter Zijlstra Cc: Liang, Kan, Li, Xiaoyao, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Christopherson,, Sean, Paolo Bonzini, linux-perf-users, linux-kernel, kvm On Thursday, September 22, 2022 9:35 PM, Peter Zijlstra > On Thu, Sep 22, 2022 at 12:58:49PM +0000, Wang, Wei W wrote: > > > Add a function to expose the current running PT event to users. One > > usage is in KVM, it needs to get and disable the running host PT event > > before VMEnter to the guest and resumes the event after VMexit to host. > > You cannot just kill a host event like that. If there is a host event, the guest > looses out. OK. The intention was to pause the event (that only profiles host info) when switching to guest, and resume when switching back to host. I don't think it is proper to directly disable it via clear MSR_IA32_RTIT_CTL, e.g. there may be dangling PT bit in the interrupt status register, this invokes the PT interrupt handler to re-enable it (for more details about the issue please see patch 3 commit). Would you have any suggestions to pause it? Thanks, Wei ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC PATCH v2 2/3] perf/x86/intel/pt: Introduce and export pt_get_curr_event() 2022-09-22 13:59 ` Wang, Wei W @ 2022-09-22 14:09 ` Peter Zijlstra 2022-09-22 14:42 ` Wang, Wei W 0 siblings, 1 reply; 19+ messages in thread From: Peter Zijlstra @ 2022-09-22 14:09 UTC (permalink / raw) To: Wang, Wei W Cc: Liang, Kan, Li, Xiaoyao, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Christopherson,, Sean, Paolo Bonzini, linux-perf-users, linux-kernel, kvm On Thu, Sep 22, 2022 at 01:59:53PM +0000, Wang, Wei W wrote: > On Thursday, September 22, 2022 9:35 PM, Peter Zijlstra > > On Thu, Sep 22, 2022 at 12:58:49PM +0000, Wang, Wei W wrote: > > > > > Add a function to expose the current running PT event to users. One > > > usage is in KVM, it needs to get and disable the running host PT event > > > before VMEnter to the guest and resumes the event after VMexit to host. > > > > You cannot just kill a host event like that. If there is a host event, the guest > > looses out. > > OK. The intention was to pause the event (that only profiles host info) when switching to guest, > and resume when switching back to host. If the even doesn't profile guest context, then yes. If it does profile guest context, you can't. ^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [RFC PATCH v2 2/3] perf/x86/intel/pt: Introduce and export pt_get_curr_event() 2022-09-22 14:09 ` Peter Zijlstra @ 2022-09-22 14:42 ` Wang, Wei W 2022-09-26 15:48 ` Liang, Kan 0 siblings, 1 reply; 19+ messages in thread From: Wang, Wei W @ 2022-09-22 14:42 UTC (permalink / raw) To: Peter Zijlstra Cc: Liang, Kan, Li, Xiaoyao, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Christopherson,, Sean, Paolo Bonzini, linux-perf-users, linux-kernel, kvm On Thursday, September 22, 2022 10:10 PM, Peter Zijlstra wrote: > On Thu, Sep 22, 2022 at 01:59:53PM +0000, Wang, Wei W wrote: > > On Thursday, September 22, 2022 9:35 PM, Peter Zijlstra > > > On Thu, Sep 22, 2022 at 12:58:49PM +0000, Wang, Wei W wrote: > > > > > > > Add a function to expose the current running PT event to users. > > > > One usage is in KVM, it needs to get and disable the running host > > > > PT event before VMEnter to the guest and resumes the event after > VMexit to host. > > > > > > You cannot just kill a host event like that. If there is a host > > > event, the guest looses out. > > > > OK. The intention was to pause the event (that only profiles host > > info) when switching to guest, and resume when switching back to host. > > If the even doesn't profile guest context, then yes. If it does profile guest > context, you can't. Seems better to add this one: +int perf_event_disable_local_exclude_guest(struct perf_event *event) +{ + struct perf_event_attr *attr = &event->attr; + + if (!attr->exclude_guest) + return -EPERM; + + event_function_local(event, __perf_event_disable, NULL); + + return 0; +} +EXPORT_SYMBOL_GPL(perf_event_disable_local_exclude_guest); ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC PATCH v2 2/3] perf/x86/intel/pt: Introduce and export pt_get_curr_event() 2022-09-22 14:42 ` Wang, Wei W @ 2022-09-26 15:48 ` Liang, Kan 2022-09-26 17:24 ` Jim Mattson 0 siblings, 1 reply; 19+ messages in thread From: Liang, Kan @ 2022-09-26 15:48 UTC (permalink / raw) To: Wang, Wei W, Peter Zijlstra Cc: Li, Xiaoyao, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Christopherson,, Sean, Paolo Bonzini, linux-perf-users, linux-kernel, kvm On 2022-09-22 10:42 a.m., Wang, Wei W wrote: > On Thursday, September 22, 2022 10:10 PM, Peter Zijlstra wrote: >> On Thu, Sep 22, 2022 at 01:59:53PM +0000, Wang, Wei W wrote: >>> On Thursday, September 22, 2022 9:35 PM, Peter Zijlstra >>>> On Thu, Sep 22, 2022 at 12:58:49PM +0000, Wang, Wei W wrote: >>>> >>>>> Add a function to expose the current running PT event to users. >>>>> One usage is in KVM, it needs to get and disable the running host >>>>> PT event before VMEnter to the guest and resumes the event after >> VMexit to host. >>>> >>>> You cannot just kill a host event like that. If there is a host >>>> event, the guest looses out. >>> >>> OK. The intention was to pause the event (that only profiles host >>> info) when switching to guest, and resume when switching back to host. >> >> If the even doesn't profile guest context, then yes. If it does profile guest >> context, you can't. > > Seems better to add this one: If the guest host mode is enabled, I think the PT driver should not allow the perf tool to create a host event with !exclude_guest. Thanks, Kan > > +int perf_event_disable_local_exclude_guest(struct perf_event *event) > +{ > + struct perf_event_attr *attr = &event->attr; > + > + if (!attr->exclude_guest) > + return -EPERM; > + > + event_function_local(event, __perf_event_disable, NULL); > + > + return 0; > +} > +EXPORT_SYMBOL_GPL(perf_event_disable_local_exclude_guest); > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC PATCH v2 2/3] perf/x86/intel/pt: Introduce and export pt_get_curr_event() 2022-09-26 15:48 ` Liang, Kan @ 2022-09-26 17:24 ` Jim Mattson 2022-09-26 18:08 ` Liang, Kan 0 siblings, 1 reply; 19+ messages in thread From: Jim Mattson @ 2022-09-26 17:24 UTC (permalink / raw) To: Liang, Kan Cc: Wang, Wei W, Peter Zijlstra, Li, Xiaoyao, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Christopherson,, Sean, Paolo Bonzini, linux-perf-users, linux-kernel, kvm On Mon, Sep 26, 2022 at 9:55 AM Liang, Kan <kan.liang@linux.intel.com> wrote: > > > > On 2022-09-22 10:42 a.m., Wang, Wei W wrote: > > On Thursday, September 22, 2022 10:10 PM, Peter Zijlstra wrote: > >> On Thu, Sep 22, 2022 at 01:59:53PM +0000, Wang, Wei W wrote: > >>> On Thursday, September 22, 2022 9:35 PM, Peter Zijlstra > >>>> On Thu, Sep 22, 2022 at 12:58:49PM +0000, Wang, Wei W wrote: > >>>> > >>>>> Add a function to expose the current running PT event to users. > >>>>> One usage is in KVM, it needs to get and disable the running host > >>>>> PT event before VMEnter to the guest and resumes the event after > >> VMexit to host. > >>>> > >>>> You cannot just kill a host event like that. If there is a host > >>>> event, the guest looses out. > >>> > >>> OK. The intention was to pause the event (that only profiles host > >>> info) when switching to guest, and resume when switching back to host. > >> > >> If the even doesn't profile guest context, then yes. If it does profile guest > >> context, you can't. > > > > Seems better to add this one: > > If the guest host mode is enabled, I think the PT driver should not > allow the perf tool to create a host event with !exclude_guest. While I agree that guest events should generally have priority over host events, this is not consistent with the way "normal" PMU events are handled. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC PATCH v2 2/3] perf/x86/intel/pt: Introduce and export pt_get_curr_event() 2022-09-26 17:24 ` Jim Mattson @ 2022-09-26 18:08 ` Liang, Kan 2022-09-27 14:27 ` Wang, Wei W 0 siblings, 1 reply; 19+ messages in thread From: Liang, Kan @ 2022-09-26 18:08 UTC (permalink / raw) To: Jim Mattson Cc: Wang, Wei W, Peter Zijlstra, Li, Xiaoyao, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Christopherson,, Sean, Paolo Bonzini, linux-perf-users, linux-kernel, kvm On 2022-09-26 1:24 p.m., Jim Mattson wrote: > On Mon, Sep 26, 2022 at 9:55 AM Liang, Kan <kan.liang@linux.intel.com> wrote: >> >> >> >> On 2022-09-22 10:42 a.m., Wang, Wei W wrote: >>> On Thursday, September 22, 2022 10:10 PM, Peter Zijlstra wrote: >>>> On Thu, Sep 22, 2022 at 01:59:53PM +0000, Wang, Wei W wrote: >>>>> On Thursday, September 22, 2022 9:35 PM, Peter Zijlstra >>>>>> On Thu, Sep 22, 2022 at 12:58:49PM +0000, Wang, Wei W wrote: >>>>>> >>>>>>> Add a function to expose the current running PT event to users. >>>>>>> One usage is in KVM, it needs to get and disable the running host >>>>>>> PT event before VMEnter to the guest and resumes the event after >>>> VMexit to host. >>>>>> >>>>>> You cannot just kill a host event like that. If there is a host >>>>>> event, the guest looses out. >>>>> >>>>> OK. The intention was to pause the event (that only profiles host >>>>> info) when switching to guest, and resume when switching back to host. >>>> >>>> If the even doesn't profile guest context, then yes. If it does profile guest >>>> context, you can't. >>> >>> Seems better to add this one: >> >> If the guest host mode is enabled, I think the PT driver should not >> allow the perf tool to create a host event with !exclude_guest. > > While I agree that guest events should generally have priority over > host events, this is not consistent with the way "normal" PMU events > are handled. Only when the two events try to access a resource at the same time, we have to decide whether priority an event or share between events. But I don't think this is the case here. From my understanding of the host-guest mode, the host PT event never traces the guest no matter whether the guest enables PT. So when VM-entry, there is only a guest PT event or no event. If so, I think the perf tool should warn the user if they try to create a host event with !exclude_guest, since the host event never traces a guest. Thanks, Kan ^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [RFC PATCH v2 2/3] perf/x86/intel/pt: Introduce and export pt_get_curr_event() 2022-09-26 18:08 ` Liang, Kan @ 2022-09-27 14:27 ` Wang, Wei W 2022-09-27 16:52 ` Liang, Kan 0 siblings, 1 reply; 19+ messages in thread From: Wang, Wei W @ 2022-09-27 14:27 UTC (permalink / raw) To: Liang, Kan, Jim Mattson Cc: Peter Zijlstra, Li, Xiaoyao, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Christopherson,, Sean, Paolo Bonzini, linux-perf-users, linux-kernel, kvm On Tuesday, September 27, 2022 2:09 AM, Liang, Kan wrote: > From my understanding of the host-guest mode, the host PT event never > traces the guest no matter whether the guest enables PT. So when VM-entry, > there is only a guest PT event or no event. If so, I think the perf tool should > warn the user if they try to create a host event with !exclude_guest, since the > host event never traces a guest. Probably not from the perf side. It's actually an issue of the KVM's pt_mode: How can KVM prevent host from profiling the guest when in host-guest mode? Just a warning from the perf side wouldn't be enough. I think that's a wrong assumption from KVM side. I had a plan to fix this one. Exposing the host event can tell KVM if the host is profiling the guest or not (i.e. host-guest is allowed). ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC PATCH v2 2/3] perf/x86/intel/pt: Introduce and export pt_get_curr_event() 2022-09-27 14:27 ` Wang, Wei W @ 2022-09-27 16:52 ` Liang, Kan 0 siblings, 0 replies; 19+ messages in thread From: Liang, Kan @ 2022-09-27 16:52 UTC (permalink / raw) To: Wang, Wei W, Jim Mattson Cc: Peter Zijlstra, Li, Xiaoyao, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Christopherson,, Sean, Paolo Bonzini, linux-perf-users, linux-kernel, kvm On 2022-09-27 10:27 a.m., Wang, Wei W wrote: > On Tuesday, September 27, 2022 2:09 AM, Liang, Kan wrote: >> From my understanding of the host-guest mode, the host PT event never >> traces the guest no matter whether the guest enables PT. So when VM-entry, >> there is only a guest PT event or no event. If so, I think the perf tool should >> warn the user if they try to create a host event with !exclude_guest, since the >> host event never traces a guest. > > Probably not from the perf side. It's actually an issue of the KVM's pt_mode: > How can KVM prevent host from profiling the guest when in host-guest mode? I don't think it's KVM's job to prevent host from profiling the guest. However, the current KVM implementation implicitly disables the guest profiling from the host. If I understand correct, this patch series just change it to an explicit way (Just for kernel. Perf tool still doesn't know about it.). There is no substantial change. I think it should be PT driver who decide which event should be scheduled. So the first step is to let the PT driver understand the host-guest mode. Then if a perf user in the host tries to profile with !exclude_guest with the host-guest mode, the pt driver can deny the request. The perf tool than throw a warning, e.g., "The VMM is running with host-guest mode. Perf cannot profile guest. Please remove the guest profiling setting or switch to the host-only mode.". > Just a warning from the perf side wouldn't be enough. I think that's a wrong > assumption from KVM side. I had a plan to fix this one. Exposing the host event > can tell KVM if the host is profiling the guest or not (i.e. host-guest is allowed). I don't think KVM should know such information. Thanks, Kan ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC PATCH v2 2/3] perf/x86/intel/pt: Introduce and export pt_get_curr_event() 2022-09-22 12:58 ` Wang, Wei W 2022-09-22 13:34 ` Peter Zijlstra @ 2022-09-26 15:32 ` Liang, Kan 1 sibling, 0 replies; 19+ messages in thread From: Liang, Kan @ 2022-09-26 15:32 UTC (permalink / raw) To: Wang, Wei W, Li, Xiaoyao, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Christopherson,, Sean, Paolo Bonzini Cc: linux-perf-users, linux-kernel, kvm On 2022-09-22 8:58 a.m., Wang, Wei W wrote: > On Thursday, September 22, 2022 8:34 PM, Liang, Kan wrote: >>> To solve the problem, introduce and export pt_get_curr_event() for KVM >>> to get current pt event. >> >> I don't think the current pt event is created by KVM. IIUC, the patch basically >> expose the event created by the other user to KVM. That doesn't sounds >> correct. > > Yes, that's the host PT event running on the current CPU. Not created by KVM. > >> >> I think it should be perf's responsibility to decide which events should be >> disabled, and which MSRs should be switched. Because only perf can see all the >> events. The users should only see the events they created. > > For other pmu cases, yes. For PT, its management is simpler than other pmu > resources and PT PMU is much simpler. It doesn't have a scheduler to manage > events. > Right, but I think we'd better to create a simpler scheduler (just for two events) in the PT driver, since you have two co-existing PT events now, one is from the host and the other is from the guest. I don't think it's KVM's responsibility to schedule events. So I think the process should be something as below. - Let KVM create a PT event if the guest request one. - In VM-entry, just invoke the perf_event_enable_local(guest). The PT driver should schedule out the host event or whatever events and schedule in the guest event. - In VM-exit, just invoke the perf_event_disable_local(guest). The PT driver should schedule in the host event or whatever events and schedule out the guest event. I still don't think we want to expose the host event. Thanks, Kan > I think the usage here is similar to the CPU thread scheduling case. When we have > CPUs isolated from the CPU scheduler, i.e. no scheduler for those CPUs, basically > we rely on users to ping tasks to those CPUs (e.g. taskset). > > For the commit log, probably we don't need those KVM details here. Simplify a bit: > > Add a function to expose the current running PT event to users. One usage is in KVM, > it needs to get and disable the running host PT event before VMEnter to the guest and > resumes the event after VMexit to host. ^ permalink raw reply [flat|nested] 19+ messages in thread
* [RFC PATCH v2 3/3] KVM: VMX: Stop/resume host PT before/after VMX transition when PT_MODE_HOST_GUEST 2022-09-21 16:45 [RFC PATCH v2 0/3] KVM: VMX: Fix VM entry failure on PT_MODE_HOST_GUEST while host is using PT Xiaoyao Li 2022-09-21 16:45 ` [RFC PATCH v2 1/3] perf/core: Expose perf_event_{en,dis}able_local() Xiaoyao Li 2022-09-21 16:45 ` [RFC PATCH v2 2/3] perf/x86/intel/pt: Introduce and export pt_get_curr_event() Xiaoyao Li @ 2022-09-21 16:45 ` Xiaoyao Li 2022-09-22 12:34 ` Wang, Wei W 2 siblings, 1 reply; 19+ messages in thread From: Xiaoyao Li @ 2022-09-21 16:45 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Sean Christopherson, Paolo Bonzini, wei.w.wang, kan.liang Cc: xiaoyao.li, linux-perf-users, linux-kernel, kvm Current implementation in pt_guest_enter() has two issues when pt mode is PT_MODE_HOST_GUEST. 1. It relies on VM_ENTRY_LOAD_IA32_RTIT_CTL to disable host's Intel PT for the case that host enables PT while guest not. However, it causes VM entry failure due to violating the requirement stated in SDM "VM-Execution Control Fields" If the logical processor is operating with Intel PT enabled (if IA32_RTIT_CTL.TraceEn = 1) at the time of VM entry, the "load IA32_RTIT_CTL" VM-entry control must be 0. 2. In the case both host and guest enable Intel PT, it disables host's Intel PT by manually clearing MSR_IA32_RTIT_CTL for the purpose to context switch host and guest's PT configurations. However, PT PMI can be delivered later and before VM entry. In the PT PMI handler, it will a) update the host PT MSRs which leads to what KVM stores in vmx->pt_desc.host becomes stale, and b) re-enable Intel PT which leads to VM entry failure as #1. To fix the above two issues, 1) grab and store host PT perf event and disable/enable host PT before vm-enter/ after vm-exit. 2) drop host pt_ctx and the logic to save/restore host PT MSRs since host PT driver doesn't rely on the previous value of PT MSR, i.e., the re-enabling of PT event after VM-exit re-initializes all the PT MSRs that it cares. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> --- arch/x86/kvm/vmx/vmx.c | 31 +++++++++++++------------------ arch/x86/kvm/vmx/vmx.h | 2 +- 2 files changed, 14 insertions(+), 19 deletions(-) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index c9b49a09e6b5..df1a16264bb6 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -1124,37 +1124,32 @@ static inline void pt_save_msr(struct pt_ctx *ctx, u32 addr_range) static void pt_guest_enter(struct vcpu_vmx *vmx) { + struct perf_event *event; + if (vmx_pt_mode_is_system()) return; - /* - * GUEST_IA32_RTIT_CTL is already set in the VMCS. - * Save host state before VM entry. - */ - rdmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc.host.ctl); - if (vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) { - wrmsrl(MSR_IA32_RTIT_CTL, 0); - pt_save_msr(&vmx->pt_desc.host, vmx->pt_desc.num_address_ranges); + event = pt_get_curr_event(); + if (event) + perf_event_disable_local(event); + vmx->pt_desc.host_event = event; + + if (vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) pt_load_msr(&vmx->pt_desc.guest, vmx->pt_desc.num_address_ranges); - } } static void pt_guest_exit(struct vcpu_vmx *vmx) { + struct perf_event *event = vmx->pt_desc.host_event; + if (vmx_pt_mode_is_system()) return; - if (vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) { + if (vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) pt_save_msr(&vmx->pt_desc.guest, vmx->pt_desc.num_address_ranges); - pt_load_msr(&vmx->pt_desc.host, vmx->pt_desc.num_address_ranges); - } - /* - * KVM requires VM_EXIT_CLEAR_IA32_RTIT_CTL to expose PT to the guest, - * i.e. RTIT_CTL is always cleared on VM-Exit. Restore it if necessary. - */ - if (vmx->pt_desc.host.ctl) - wrmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc.host.ctl); + if (event) + perf_event_enable_local(event); } void vmx_set_host_fs_gs(struct vmcs_host_state *host, u16 fs_sel, u16 gs_sel, diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index 24d58c2ffaa3..4c20bdabc85b 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -66,7 +66,7 @@ struct pt_desc { u64 ctl_bitmask; u32 num_address_ranges; u32 caps[PT_CPUID_REGS_NUM * PT_CPUID_LEAVES]; - struct pt_ctx host; + struct perf_event *host_event; struct pt_ctx guest; }; -- 2.27.0 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* RE: [RFC PATCH v2 3/3] KVM: VMX: Stop/resume host PT before/after VMX transition when PT_MODE_HOST_GUEST 2022-09-21 16:45 ` [RFC PATCH v2 3/3] KVM: VMX: Stop/resume host PT before/after VMX transition when PT_MODE_HOST_GUEST Xiaoyao Li @ 2022-09-22 12:34 ` Wang, Wei W 0 siblings, 0 replies; 19+ messages in thread From: Wang, Wei W @ 2022-09-22 12:34 UTC (permalink / raw) To: Li, Xiaoyao, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Christopherson,, Sean, Paolo Bonzini, kan.liang Cc: linux-perf-users, linux-kernel, kvm On Thursday, September 22, 2022 12:45 AM, Li, Xiaoyao wrote: > Current implementation in pt_guest_enter() has two issues when pt mode is > PT_MODE_HOST_GUEST. > > 1. It relies on VM_ENTRY_LOAD_IA32_RTIT_CTL to disable host's Intel PT > for the case that host enables PT while guest not. > > However, it causes VM entry failure due to violating the requirement > stated in SDM "VM-Execution Control Fields" > > If the logical processor is operating with Intel PT enabled (if > IA32_RTIT_CTL.TraceEn = 1) at the time of VM entry, the "load > IA32_RTIT_CTL" VM-entry control must be 0. > > 2. In the case both host and guest enable Intel PT, it disables host's > Intel PT by manually clearing MSR_IA32_RTIT_CTL for the purpose to > context switch host and guest's PT configurations. > > However, PT PMI can be delivered later and before VM entry. In the PT > PMI handler, it will a) update the host PT MSRs which leads to what KVM > stores in vmx->pt_desc.host becomes stale, and b) re-enable Intel PT > which leads to VM entry failure as #1. > > To fix the above two issues, 1) grab and store host PT perf event and > disable/enable host PT before vm-enter/ after vm-exit. 2) drop host pt_ctx and > the logic to save/restore host PT MSRs since host PT driver doesn't rely on the > previous value of PT MSR, i.e., the re-enabling of PT event after VM-exit > re-initializes all the PT MSRs that it cares. I would also re-write the commit: Current KVM implementation directly modifies the hardware states when it wants to stop or start a host PT event on VMX transitions. This is not proper because: 1) host perf event is well managed by the perf subsystem. Getting it stopped/started needs to go through the perf subsystem to update the related metadata (e.g. event state). 2) Simply modifying the MSR_IA32_RTIT_CTL isn't a complete way to disable the host PT event, as there may be special cases, for example, a dangling PT bit in the interrupt status register after PT has been stopped may cause the PT interrupt handler to enable PT while KVM assumes PT has been disabled after it clears MSR_IA32_RTIT_CTL (for more details, please check SDM "VM-Execution Control Fields" section for PT related requirements). Not properly handling such cases can result in VMEntry failures. Those have already been properly handled by the PT driver (i.e. pt_event_stop) called from the perf core. 3) stop/start a host PT event needs to save/restore the related MSR states for an event switching. This has also been properly supported by the PT driver. It is an extra burden for KVM to maintain another version of function to do such save/restore. For those reasons, change KVM to get the running host PT event from the PT driver, and call perf_event_disable/enable_local to disable/enable the host PT event running on the CPU. This will reuse perf and PT driver to switch in/out the host PT event, with proper management of the perf event. > > Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> > --- > arch/x86/kvm/vmx/vmx.c | 31 +++++++++++++------------------ > arch/x86/kvm/vmx/vmx.h | 2 +- > 2 files changed, 14 insertions(+), 19 deletions(-) > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index > c9b49a09e6b5..df1a16264bb6 100644 > --- a/arch/x86/kvm/vmx/vmx.c > +++ b/arch/x86/kvm/vmx/vmx.c > @@ -1124,37 +1124,32 @@ static inline void pt_save_msr(struct pt_ctx *ctx, > u32 addr_range) > > static void pt_guest_enter(struct vcpu_vmx *vmx) { > + struct perf_event *event; > + > if (vmx_pt_mode_is_system()) > return; > > - /* > - * GUEST_IA32_RTIT_CTL is already set in the VMCS. > - * Save host state before VM entry. > - */ > - rdmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc.host.ctl); > - if (vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) { > - wrmsrl(MSR_IA32_RTIT_CTL, 0); > - pt_save_msr(&vmx->pt_desc.host, > vmx->pt_desc.num_address_ranges); > + event = pt_get_curr_event(); > + if (event) > + perf_event_disable_local(event); > + vmx->pt_desc.host_event = event; > + > + if (vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) > pt_load_msr(&vmx->pt_desc.guest, > vmx->pt_desc.num_address_ranges); > - } > } > > static void pt_guest_exit(struct vcpu_vmx *vmx) { > + struct perf_event *event = vmx->pt_desc.host_event; > + > if (vmx_pt_mode_is_system()) > return; > > - if (vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) { > + if (vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) > pt_save_msr(&vmx->pt_desc.guest, > vmx->pt_desc.num_address_ranges); As they are only used for guest msrs now, probably we can rename them to pt_save_guest_msrs. Also for the load side. ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2022-09-27 16:54 UTC | newest] Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-09-21 16:45 [RFC PATCH v2 0/3] KVM: VMX: Fix VM entry failure on PT_MODE_HOST_GUEST while host is using PT Xiaoyao Li 2022-09-21 16:45 ` [RFC PATCH v2 1/3] perf/core: Expose perf_event_{en,dis}able_local() Xiaoyao Li 2022-09-22 12:16 ` Wang, Wei W 2022-09-21 16:45 ` [RFC PATCH v2 2/3] perf/x86/intel/pt: Introduce and export pt_get_curr_event() Xiaoyao Li 2022-09-22 5:14 ` Xiaoyao Li 2022-09-22 12:33 ` Liang, Kan 2022-09-22 12:58 ` Wang, Wei W 2022-09-22 13:34 ` Peter Zijlstra 2022-09-22 13:59 ` Wang, Wei W 2022-09-22 14:09 ` Peter Zijlstra 2022-09-22 14:42 ` Wang, Wei W 2022-09-26 15:48 ` Liang, Kan 2022-09-26 17:24 ` Jim Mattson 2022-09-26 18:08 ` Liang, Kan 2022-09-27 14:27 ` Wang, Wei W 2022-09-27 16:52 ` Liang, Kan 2022-09-26 15:32 ` Liang, Kan 2022-09-21 16:45 ` [RFC PATCH v2 3/3] KVM: VMX: Stop/resume host PT before/after VMX transition when PT_MODE_HOST_GUEST Xiaoyao Li 2022-09-22 12:34 ` Wang, Wei W
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).