All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Move VMEnter and VMExit tracepoints closer to the actual event
@ 2021-05-19 18:23 Stefano De Venuto
  2021-05-20  6:05 ` Dario Faggioli
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Stefano De Venuto @ 2021-05-19 18:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: pbonzini, seanjc, vkuznets, wanpengli, jmattson, x86, hpa, kvm,
	rostedt, y.karadz, Stefano De Venuto, Dario Faggioli

The kvm_entry and kvm_exit tracepoints are still quite far from the
actual VMEnters/VMExits. This means that in a trace we can find host
events after a kvm_entry event and before a kvm_exit one, as in this
example:

           trace.dat:  CPU 0/KVM-4594  [001]  2.167191: kvm_entry:
           trace.dat:  CPU 0/KVM-4594  [001]  2.167192: write_msr: 48, value 0
           trace.dat:  CPU 0/KVM-4594  [001]  2.167192: rcu_utilization: Start context switch
           trace.dat:  CPU 0/KVM-4594  [001]  2.167192: rcu_utilization: End context switch
trace-tumbleweed.dat:     <idle>-0     [000]  2.167196: hrtimer_cancel:
trace-tumbleweed.dat:     <idle>-0     [000]  2.167197: hrtimer_expire_entry:
trace-tumbleweed.dat:     <idle>-0     [000]  2.167201: hrtimer_expire_exit:
trace-tumbleweed.dat:     <idle>-0     [000]  2.167201: hrtimer_start:
           trace.dat:  CPU 0/KVM-4594  [001]  2.167203: read_msr: 48, value 0
           trace.dat:  CPU 0/KVM-4594  [001]  2.167203: write_msr: 48, value 4
           trace.dat:  CPU 0/KVM-4594  [001]  2.167204: kvm_exit: 

This patch moves the tracepoints closer to the events, for both Intel
and AMD, so that a combined host-guest trace will offer a more
realistic representation of what is really happening, as shown here:

           trace.dat:  CPU 0/KVM-2553  [000]  2.190290: write_msr: 48, value 0
           trace.dat:  CPU 0/KVM-2553  [000]  2.190290: rcu_utilization: Start context switch
           trace.dat:  CPU 0/KVM-2553  [000]  2.190290: rcu_utilization: End context switch
           trace.dat:  CPU 0/KVM-2553  [000]  2.190290: kvm_entry:
trace-tumbleweed.dat:     <idle>-0     [000]  2.190290: write_msr:
trace-tumbleweed.dat:     <idle>-0     [000]  2.190290: cpu_idle:
           trace.dat:  CPU 0/KVM-2553  [000]  2.190291: kvm_exit:
           trace.dat:  CPU 0/KVM-2553  [000]  2.190291: read_msr: 48, value 0
           trace.dat:  CPU 0/KVM-2553  [000]  2.190291: write_msr: 48, value 4 

Signed-off-by: Stefano De Venuto <stefano.devenuto99@gmail.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
---
 arch/x86/kvm/svm/svm.c |  8 ++++----
 arch/x86/kvm/vmx/vmx.c | 10 +++++-----
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 05eca131eaf2..c77d4866e239 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3275,8 +3275,6 @@ static int handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 	struct kvm_run *kvm_run = vcpu->run;
 	u32 exit_code = svm->vmcb->control.exit_code;
 
-	trace_kvm_exit(exit_code, vcpu, KVM_ISA_SVM);
-
 	/* SEV-ES guests must use the CR write traps to track CR registers. */
 	if (!sev_es_guest(vcpu->kvm)) {
 		if (!svm_is_intercept(svm, INTERCEPT_CR0_WRITE))
@@ -3707,6 +3705,8 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu)
 
 	kvm_guest_enter_irqoff();
 
+	trace_kvm_entry(vcpu);
+
 	if (sev_es_guest(vcpu->kvm)) {
 		__svm_sev_es_vcpu_run(vmcb_pa);
 	} else {
@@ -3725,6 +3725,8 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu)
 		vmload(__sme_page_pa(sd->save_area));
 	}
 
+	trace_kvm_exit(svm->vmcb->control.exit_code, vcpu, KVM_ISA_SVM);
+
 	kvm_guest_exit_irqoff();
 }
 
@@ -3732,8 +3734,6 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	trace_kvm_entry(vcpu);
-
 	svm->vmcb->save.rax = vcpu->arch.regs[VCPU_REGS_RAX];
 	svm->vmcb->save.rsp = vcpu->arch.regs[VCPU_REGS_RSP];
 	svm->vmcb->save.rip = vcpu->arch.regs[VCPU_REGS_RIP];
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4bceb5ca3a89..33c732101b83 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6661,6 +6661,8 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 {
 	kvm_guest_enter_irqoff();
 
+	trace_kvm_entry(vcpu);
+
 	/* L1D Flush includes CPU buffer clear to mitigate MDS */
 	if (static_branch_unlikely(&vmx_l1d_should_flush))
 		vmx_l1d_flush(vcpu);
@@ -6675,6 +6677,9 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 
 	vcpu->arch.cr2 = native_read_cr2();
 
+	vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);
+	trace_kvm_exit(vmx->exit_reason.full, vcpu, KVM_ISA_VMX);
+
 	kvm_guest_exit_irqoff();
 }
 
@@ -6693,8 +6698,6 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	if (vmx->emulation_required)
 		return EXIT_FASTPATH_NONE;
 
-	trace_kvm_entry(vcpu);
-
 	if (vmx->ple_window_dirty) {
 		vmx->ple_window_dirty = false;
 		vmcs_write32(PLE_WINDOW, vmx->ple_window);
@@ -6814,15 +6817,12 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
 		return EXIT_FASTPATH_NONE;
 	}
 
-	vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);
 	if (unlikely((u16)vmx->exit_reason.basic == EXIT_REASON_MCE_DURING_VMENTRY))
 		kvm_machine_check();
 
 	if (likely(!vmx->exit_reason.failed_vmentry))
 		vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
 
-	trace_kvm_exit(vmx->exit_reason.full, vcpu, KVM_ISA_VMX);
-
 	if (unlikely(vmx->exit_reason.failed_vmentry))
 		return EXIT_FASTPATH_NONE;
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] Move VMEnter and VMExit tracepoints closer to the actual event
  2021-05-19 18:23 [PATCH] Move VMEnter and VMExit tracepoints closer to the actual event Stefano De Venuto
@ 2021-05-20  6:05 ` Dario Faggioli
  2021-05-20  7:21 ` Thomas Gleixner
  2021-05-20 15:32 ` Sean Christopherson
  2 siblings, 0 replies; 12+ messages in thread
From: Dario Faggioli @ 2021-05-20  6:05 UTC (permalink / raw)
  To: Stefano De Venuto, linux-kernel
  Cc: pbonzini, seanjc, vkuznets, wanpengli, jmattson, x86, hpa, kvm,
	rostedt, y.karadz, Borislav Petkov

[-- Attachment #1: Type: text/plain, Size: 880 bytes --]

On Wed, 2021-05-19 at 20:23 +0200, Stefano De Venuto wrote:
> 
> Signed-off-by: Stefano De Venuto <stefano.devenuto99@gmail.com>
> Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
>
So, thanks to Boris, I realized that this is both wrong and
inconsistent (and it's my fault, not Stefano's).

This is how it should be:

Co-developed-by: Dario Faggioli <dfaggioli@suse.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Signed-off-by: Stefano De Venuto <stefano.devenuto99@gmail.com>

Of course, we're happy to re-submit with it fixed, if necessary. Just
let us know.

Sorry and Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Move VMEnter and VMExit tracepoints closer to the actual event
  2021-05-19 18:23 [PATCH] Move VMEnter and VMExit tracepoints closer to the actual event Stefano De Venuto
  2021-05-20  6:05 ` Dario Faggioli
@ 2021-05-20  7:21 ` Thomas Gleixner
  2021-05-21  7:13   ` Dario Faggioli
  2021-05-20 15:32 ` Sean Christopherson
  2 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2021-05-20  7:21 UTC (permalink / raw)
  To: Stefano De Venuto, linux-kernel
  Cc: pbonzini, seanjc, vkuznets, wanpengli, jmattson, x86, hpa, kvm,
	rostedt, y.karadz, Stefano De Venuto, Dario Faggioli

On Wed, May 19 2021 at 20:23, Stefano De Venuto wrote:
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 05eca131eaf2..c77d4866e239 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3275,8 +3275,6 @@ static int handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
>  	struct kvm_run *kvm_run = vcpu->run;
>  	u32 exit_code = svm->vmcb->control.exit_code;
>  
> -	trace_kvm_exit(exit_code, vcpu, KVM_ISA_SVM);
> -
>  	/* SEV-ES guests must use the CR write traps to track CR registers. */
>  	if (!sev_es_guest(vcpu->kvm)) {
>  		if (!svm_is_intercept(svm, INTERCEPT_CR0_WRITE))
> @@ -3707,6 +3705,8 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu)
>  
>  	kvm_guest_enter_irqoff();
>  
> +	trace_kvm_entry(vcpu);

No. This violates the noinstr rules and will make objtool complain on a
full validation run.

So this wants to be:

+       instrumentation_begin();
+	trace_kvm_entry(vcpu);
+       instrumentation_end();

  	kvm_guest_enter_irqoff();

and on the exit side the trace wants to be post kvm_guest_exit_irqoff().

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Move VMEnter and VMExit tracepoints closer to the actual event
  2021-05-19 18:23 [PATCH] Move VMEnter and VMExit tracepoints closer to the actual event Stefano De Venuto
  2021-05-20  6:05 ` Dario Faggioli
  2021-05-20  7:21 ` Thomas Gleixner
@ 2021-05-20 15:32 ` Sean Christopherson
  2021-05-20 16:18   ` Paolo Bonzini
  2021-05-21  7:51   ` Dario Faggioli
  2 siblings, 2 replies; 12+ messages in thread
From: Sean Christopherson @ 2021-05-20 15:32 UTC (permalink / raw)
  To: Stefano De Venuto
  Cc: linux-kernel, pbonzini, vkuznets, wanpengli, jmattson, x86, hpa,
	kvm, rostedt, y.karadz, Dario Faggioli

On Wed, May 19, 2021, Stefano De Venuto wrote:
> The kvm_entry and kvm_exit tracepoints are still quite far from the
> actual VMEnters/VMExits. This means that in a trace we can find host
> events after a kvm_entry event and before a kvm_exit one, as in this
> example:
> 
>            trace.dat:  CPU 0/KVM-4594  [001]  2.167191: kvm_entry:
>            trace.dat:  CPU 0/KVM-4594  [001]  2.167192: write_msr: 48, value 0
>            trace.dat:  CPU 0/KVM-4594  [001]  2.167192: rcu_utilization: Start context switch
>            trace.dat:  CPU 0/KVM-4594  [001]  2.167192: rcu_utilization: End context switch
> trace-tumbleweed.dat:     <idle>-0     [000]  2.167196: hrtimer_cancel:
> trace-tumbleweed.dat:     <idle>-0     [000]  2.167197: hrtimer_expire_entry:
> trace-tumbleweed.dat:     <idle>-0     [000]  2.167201: hrtimer_expire_exit:
> trace-tumbleweed.dat:     <idle>-0     [000]  2.167201: hrtimer_start:
>            trace.dat:  CPU 0/KVM-4594  [001]  2.167203: read_msr: 48, value 0
>            trace.dat:  CPU 0/KVM-4594  [001]  2.167203: write_msr: 48, value 4
>            trace.dat:  CPU 0/KVM-4594  [001]  2.167204: kvm_exit: 
> 
> This patch moves the tracepoints closer to the events, for both Intel
> and AMD, so that a combined host-guest trace will offer a more
> realistic representation of what is really happening, as shown here:
> 
>            trace.dat:  CPU 0/KVM-2553  [000]  2.190290: write_msr: 48, value 0

I'm not sure this is a good thing, as it's not clear to me that invoking tracing
with the guest's SPEC_CTRL loaded is desirable.  Maybe it's a non-issue, but it
should be explicitly called out and discussed.

And to some extent, the current behavior is _more_ accurate because it shows that
KVM started its VM-Enter sequence and then the WRMSR occured as part of that
sequence.  It is writing the guest's value after all.  Ditto for XCR0, XSS, PKRU,
Intel PT, etc...

A more concrete example would be perf; on VMX, if a perf NMI happens after KVM
invokes atomic_switch_perf_msrs() then I absolutely want to see that reflected
in the trace, e.g. to help debug the PEBS mess[*].  If the VM-Enter tracepoint
is moved closer to VM-Enter, that may or may not hold true depending on where the
NMI lands.

On VMX, I think the tracepoint can be moved below the VMWRITEs without much
contention (though doing so is likely a nop), but moving it below
kvm_load_guest_xsave_state() requires a bit more discussion.

I 100% agree that the current behavior can be a bit confusing, but I wonder if
we'd be better off "solving" that problem through documentation.

[*] https://lkml.kernel.org/r/20210209225653.1393771-1-jmattson@google.com

>            trace.dat:  CPU 0/KVM-2553  [000]  2.190290: rcu_utilization: Start context switch
>            trace.dat:  CPU 0/KVM-2553  [000]  2.190290: rcu_utilization: End context switch
>            trace.dat:  CPU 0/KVM-2553  [000]  2.190290: kvm_entry:
> trace-tumbleweed.dat:     <idle>-0     [000]  2.190290: write_msr:
> trace-tumbleweed.dat:     <idle>-0     [000]  2.190290: cpu_idle:
>            trace.dat:  CPU 0/KVM-2553  [000]  2.190291: kvm_exit:
>            trace.dat:  CPU 0/KVM-2553  [000]  2.190291: read_msr: 48, value 0
>            trace.dat:  CPU 0/KVM-2553  [000]  2.190291: write_msr: 48, value 4 
> 
> Signed-off-by: Stefano De Venuto <stefano.devenuto99@gmail.com>
> Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
> ---

...

> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 4bceb5ca3a89..33c732101b83 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6661,6 +6661,8 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
>  {
>  	kvm_guest_enter_irqoff();
>  
> +	trace_kvm_entry(vcpu);
> +
>  	/* L1D Flush includes CPU buffer clear to mitigate MDS */
>  	if (static_branch_unlikely(&vmx_l1d_should_flush))
>  		vmx_l1d_flush(vcpu);
> @@ -6675,6 +6677,9 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
>  
>  	vcpu->arch.cr2 = native_read_cr2();
>  
> +	vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);
> +	trace_kvm_exit(vmx->exit_reason.full, vcpu, KVM_ISA_VMX);

This is wrong in the 'vmx->fail == true' case.

> +
>  	kvm_guest_exit_irqoff();
>  }
>  
> @@ -6693,8 +6698,6 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
>  	if (vmx->emulation_required)
>  		return EXIT_FASTPATH_NONE;
>  
> -	trace_kvm_entry(vcpu);
> -
>  	if (vmx->ple_window_dirty) {
>  		vmx->ple_window_dirty = false;
>  		vmcs_write32(PLE_WINDOW, vmx->ple_window);
> @@ -6814,15 +6817,12 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
>  		return EXIT_FASTPATH_NONE;
>  	}
>  
> -	vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);
>  	if (unlikely((u16)vmx->exit_reason.basic == EXIT_REASON_MCE_DURING_VMENTRY))
>  		kvm_machine_check();
>  
>  	if (likely(!vmx->exit_reason.failed_vmentry))
>  		vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
>  
> -	trace_kvm_exit(vmx->exit_reason.full, vcpu, KVM_ISA_VMX);
> -
>  	if (unlikely(vmx->exit_reason.failed_vmentry))
>  		return EXIT_FASTPATH_NONE;
>  
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Move VMEnter and VMExit tracepoints closer to the actual event
  2021-05-20 15:32 ` Sean Christopherson
@ 2021-05-20 16:18   ` Paolo Bonzini
  2021-05-21  7:58     ` Dario Faggioli
  2021-05-28 16:55     ` Dario Faggioli
  2021-05-21  7:51   ` Dario Faggioli
  1 sibling, 2 replies; 12+ messages in thread
From: Paolo Bonzini @ 2021-05-20 16:18 UTC (permalink / raw)
  To: Sean Christopherson, Stefano De Venuto
  Cc: linux-kernel, vkuznets, wanpengli, jmattson, x86, hpa, kvm,
	rostedt, y.karadz, Dario Faggioli

On 20/05/21 17:32, Sean Christopherson wrote:
> On VMX, I think the tracepoint can be moved below the VMWRITEs without much
> contention (though doing so is likely a nop), but moving it below
> kvm_load_guest_xsave_state() requires a bit more discussion.

Indeed; as a rule of thumb, the tracepoint on SVM could match the 
clgi/stgi region, and on VMX it could be placed in a similar location.

Paolo

> I 100% agree that the current behavior can be a bit confusing, but I wonder if
> we'd be better off "solving" that problem through documentation.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Move VMEnter and VMExit tracepoints closer to the actual event
  2021-05-20  7:21 ` Thomas Gleixner
@ 2021-05-21  7:13   ` Dario Faggioli
  0 siblings, 0 replies; 12+ messages in thread
From: Dario Faggioli @ 2021-05-21  7:13 UTC (permalink / raw)
  To: Thomas Gleixner, Stefano De Venuto, linux-kernel
  Cc: pbonzini, seanjc, vkuznets, wanpengli, jmattson, x86, hpa, kvm,
	rostedt, y.karadz

[-- Attachment #1: Type: text/plain, Size: 1200 bytes --]

Hi Thomas, 

And thanks a lot for the review!

On Thu, 2021-05-20 at 09:21 +0200, Thomas Gleixner wrote:
> On Wed, May 19 2021 at 20:23, Stefano De Venuto wrote:
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 05eca131eaf2..c77d4866e239 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -3707,6 +3705,8 @@ static noinstr void
> > svm_vcpu_enter_exit(struct kvm_vcpu *vcpu)
> >  
> >         kvm_guest_enter_irqoff();
> >  
> > +       trace_kvm_entry(vcpu);
> 
> No. This violates the noinstr rules and will make objtool complain on
> a
> full validation run.
> 
Ok, I see, sorry for not noticing it.

Well, in this specific case --considering others' reviews-- it seems
that the tracepoints will be moved to somewhere else anyway, but we'll
make sure to run all the proper validation steps next time.

Thanks again and Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Move VMEnter and VMExit tracepoints closer to the actual event
  2021-05-20 15:32 ` Sean Christopherson
  2021-05-20 16:18   ` Paolo Bonzini
@ 2021-05-21  7:51   ` Dario Faggioli
  2021-05-25 16:20     ` Sean Christopherson
  1 sibling, 1 reply; 12+ messages in thread
From: Dario Faggioli @ 2021-05-21  7:51 UTC (permalink / raw)
  To: Sean Christopherson, Stefano De Venuto
  Cc: linux-kernel, pbonzini, vkuznets, wanpengli, jmattson, x86, hpa,
	kvm, rostedt, y.karadz

[-- Attachment #1: Type: text/plain, Size: 3063 bytes --]

Hi Sean,

On Thu, 2021-05-20 at 15:32 +0000, Sean Christopherson wrote:
> On Wed, May 19, 2021, Stefano De Venuto wrote:
> > 
> > This patch moves the tracepoints closer to the events, for both
> > Intel
> > and AMD, so that a combined host-guest trace will offer a more
> > realistic representation of what is really happening, as shown
> > here:
> > 
> >            trace.dat:  CPU 0/KVM-2553  [000]  2.190290: write_msr:
> > 48, value 0
> 
> I'm not sure this is a good thing, as it's not clear to me that
> invoking tracing
> with the guest's SPEC_CTRL loaded is desirable.  Maybe it's a non-
> issue, but it
> should be explicitly called out and discussed.
> 
Oh, this is actually a very good point. Considering the rest of the
review comments, it looks like we're not putting the tracepoint past
the SPEC_CTRL msr-write, not for now at least. :-) But I agree that it
must be explicitly considered and thought about, if/when trying to do
it again.

> And to some extent, the current behavior is _more_ accurate because
> it shows that
> KVM started its VM-Enter sequence and then the WRMSR occured as part
> of that
> sequence.  It is writing the guest's value after all.  Ditto for
> XCR0, XSS, PKRU,
> Intel PT, etc...
> 
Yaah, I guess it comes to what we want/assume the meaning of the
VMEnter/Exit tracepoints to be. I.e., is it the beginning of the
sequence of operations necessary to enter a guest, or the exact point
in time where we switch to it (and vice-versa, for exit)?

In my view and in my experience of trying to trace host and guest at
the same time, I find the latter more useful, but I appreciate that the
former is valid too especially considering that, as you say, some
operations are altering the guest's state already.

> A more concrete example would be perf; on VMX, if a perf NMI happens
> after KVM
> invokes atomic_switch_perf_msrs() then I absolutely want to see that
> reflected
> in the trace, e.g. to help debug the PEBS mess[*].  If the VM-Enter
> tracepoint
> is moved closer to VM-Enter, that may or may not hold true depending
> on where the
> NMI lands.
> 
> On VMX, I think the tracepoint can be moved below the VMWRITEs
> without much
> contention (though doing so is likely a nop), but moving it below
> kvm_load_guest_xsave_state() requires a bit more discussion.
> 
Ok, well, IMO closer is better, even if no past XSAVE state handling.
:-) Let us look into this a little bit.

> I 100% agree that the current behavior can be a bit confusing, but I
> wonder if
> we'd be better off "solving" that problem through documentation.
> 
Indeed. So, do you happen to have in mind what could be the best place
and the best way for documenting this?

Thanks and Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Move VMEnter and VMExit tracepoints closer to the actual event
  2021-05-20 16:18   ` Paolo Bonzini
@ 2021-05-21  7:58     ` Dario Faggioli
  2021-05-28 16:55     ` Dario Faggioli
  1 sibling, 0 replies; 12+ messages in thread
From: Dario Faggioli @ 2021-05-21  7:58 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Stefano De Venuto
  Cc: linux-kernel, vkuznets, wanpengli, jmattson, x86, hpa, kvm,
	rostedt, y.karadz

[-- Attachment #1: Type: text/plain, Size: 932 bytes --]

Hi!

On Thu, 2021-05-20 at 18:18 +0200, Paolo Bonzini wrote:
> On 20/05/21 17:32, Sean Christopherson wrote:
> > On VMX, I think the tracepoint can be moved below the VMWRITEs
> > without much
> > contention (though doing so is likely a nop), but moving it below
> > kvm_load_guest_xsave_state() requires a bit more discussion.
> 
> Indeed; as a rule of thumb, the tracepoint on SVM could match the 
> clgi/stgi region, and on VMX it could be placed in a similar
> location.
> 
Ok. So, if this is uncontroversial enough, we're more than happy to go
for it... For now. :-)

Let us try it, and see how things end up looking.

Thanks and Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Move VMEnter and VMExit tracepoints closer to the actual event
  2021-05-21  7:51   ` Dario Faggioli
@ 2021-05-25 16:20     ` Sean Christopherson
  2021-05-28 17:03       ` Dario Faggioli
  0 siblings, 1 reply; 12+ messages in thread
From: Sean Christopherson @ 2021-05-25 16:20 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Stefano De Venuto, linux-kernel, pbonzini, vkuznets, wanpengli,
	jmattson, x86, hpa, kvm, rostedt, y.karadz

On Fri, May 21, 2021, Dario Faggioli wrote:
> Hi Sean,
> 
> On Thu, 2021-05-20 at 15:32 +0000, Sean Christopherson wrote:
> > I 100% agree that the current behavior can be a bit confusing, but I wonder
> > if we'd be better off "solving" that problem through documentation.
> > 
> Indeed. So, do you happen to have in mind what could be the best place
> and the best way for documenting this?

I didn't have anything in mind, but my gut reaction is to add a new file dedicated
to tracing/tracepoints in KVM, e.g. 

  Documentation/virt/kvm/tracepoints.rst or Documentation/virt/kvm/tracing.rst

I'm sure there are all sorts of tips and tricks people have for using KVM's
tracepoints, it would be nice to provide a way to capture and disseminate them.
My only hesitation is that Documentation/virt/kvm/ might be too formal for what
would effectively be a wiki of sorts.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Move VMEnter and VMExit tracepoints closer to the actual event
  2021-05-20 16:18   ` Paolo Bonzini
  2021-05-21  7:58     ` Dario Faggioli
@ 2021-05-28 16:55     ` Dario Faggioli
  2021-05-28 16:57       ` Paolo Bonzini
  1 sibling, 1 reply; 12+ messages in thread
From: Dario Faggioli @ 2021-05-28 16:55 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Stefano De Venuto
  Cc: linux-kernel, vkuznets, wanpengli, jmattson, x86, hpa, kvm,
	rostedt, y.karadz

[-- Attachment #1: Type: text/plain, Size: 1965 bytes --]

On Thu, 2021-05-20 at 18:18 +0200, Paolo Bonzini wrote:
> On 20/05/21 17:32, Sean Christopherson wrote:
> > On VMX, I think the tracepoint can be moved below the VMWRITEs
> > without much
> > contention (though doing so is likely a nop), but moving it below
> > kvm_load_guest_xsave_state() requires a bit more discussion.
> 
> Indeed; as a rule of thumb, the tracepoint on SVM could match the 
> clgi/stgi region, and on VMX it could be placed in a similar location.
> 
So, we played a little bit with this and, as envisioned, we can confirm
that moving the tracepoint outside of the xsave handling calls results
in the actual trace looking pretty much the same as it does right now.

Still, I think we should go for it, and we're planning to send a v2 of
this patch that does exactly that. In fact, I think it's still better
to have the tracepoint closer to the actual instruction (provided they
don't end up too close, as we were saying in this thread).

For instance, despite the sequence of events being the same in the
"output", the timestamp of the event that we see in the trace would be
more accurate (although, we're of course talking about very small
differences) and, more importantly, we reduce the chances that more
events creeps in, if tracepoints for them are added in the code between
where the trace_kvm_enter/exit() code are now and where we'd like to
move them.

So, Paolo, just to be sure, when you said "the tracepoint on SVM could
match the clgi/stgi region", did you mean they should be outside of
this region (i.e., trace_kvm_enter() just before clgi() and
trace_kvm_exit() after stgi())? Or vice versa? :-)

Thanks and Regards

Dario
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Move VMEnter and VMExit tracepoints closer to the actual event
  2021-05-28 16:55     ` Dario Faggioli
@ 2021-05-28 16:57       ` Paolo Bonzini
  0 siblings, 0 replies; 12+ messages in thread
From: Paolo Bonzini @ 2021-05-28 16:57 UTC (permalink / raw)
  To: Dario Faggioli, Sean Christopherson, Stefano De Venuto
  Cc: linux-kernel, vkuznets, wanpengli, jmattson, x86, hpa, kvm,
	rostedt, y.karadz

On 28/05/21 18:55, Dario Faggioli wrote:
> So, Paolo, just to be sure, when you said "the tracepoint on SVM could
> match the clgi/stgi region", did you mean they should be outside of
> this region (i.e., trace_kvm_enter() just before clgi() and
> trace_kvm_exit() after stgi())? Or vice versa?:-)

Just outside, yes.

Paolo


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Move VMEnter and VMExit tracepoints closer to the actual event
  2021-05-25 16:20     ` Sean Christopherson
@ 2021-05-28 17:03       ` Dario Faggioli
  0 siblings, 0 replies; 12+ messages in thread
From: Dario Faggioli @ 2021-05-28 17:03 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Stefano De Venuto, linux-kernel, pbonzini, vkuznets, wanpengli,
	jmattson, x86, hpa, kvm, rostedt, y.karadz

[-- Attachment #1: Type: text/plain, Size: 1730 bytes --]

On Tue, 2021-05-25 at 16:20 +0000, Sean Christopherson wrote:
> On Fri, May 21, 2021, Dario Faggioli wrote:
> > > 
> > Indeed. So, do you happen to have in mind what could be the best
> > place
> > and the best way for documenting this?
> 
> I didn't have anything in mind, but my gut reaction is to add a new
> file dedicated
> to tracing/tracepoints in KVM, e.g. 
> 
>   Documentation/virt/kvm/tracepoints.rst or
> Documentation/virt/kvm/tracing.rst
> 
Ok. Well, FWIW, this seems a good idea to me. :-)

> I'm sure there are all sorts of tips and tricks people have for using
> KVM's
> tracepoints, it would be nice to provide a way to capture and
> disseminate them.
> My only hesitation is that Documentation/virt/kvm/ might be too
> formal for what
> would effectively be a wiki of sorts.
> 
Yeah, understand the concerns, I think. However, it seems to me that
how to interpret the kernel KVM tracepoint (i.e., this fact that they
mark the rather the beginning of the "logical" entry and exit sequences
rather than the actual instructions) does belong in the kernel's own
documentation, i.e., where you proposed above.

Surely when we'll have something like that, it seems natural that we'd
want to have more stuff there, and we'll have to judge what's best
suited for it and what should perhaps be somewhere else... But I think
it's worth a try, and I probably will try to put something together.

Thanks and Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-05-28 17:03 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-19 18:23 [PATCH] Move VMEnter and VMExit tracepoints closer to the actual event Stefano De Venuto
2021-05-20  6:05 ` Dario Faggioli
2021-05-20  7:21 ` Thomas Gleixner
2021-05-21  7:13   ` Dario Faggioli
2021-05-20 15:32 ` Sean Christopherson
2021-05-20 16:18   ` Paolo Bonzini
2021-05-21  7:58     ` Dario Faggioli
2021-05-28 16:55     ` Dario Faggioli
2021-05-28 16:57       ` Paolo Bonzini
2021-05-21  7:51   ` Dario Faggioli
2021-05-25 16:20     ` Sean Christopherson
2021-05-28 17:03       ` Dario Faggioli

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.