All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Stefano De Venuto <stefano.devenuto99@gmail.com>
Cc: linux-kernel@vger.kernel.org, pbonzini@redhat.com,
	vkuznets@redhat.com, wanpengli@tencent.com, jmattson@google.com,
	x86@kernel.org, hpa@zytor.com, kvm@vger.kernel.org,
	rostedt@goodmis.org, y.karadz@gmail.com,
	Dario Faggioli <dfaggioli@suse.com>
Subject: Re: [PATCH] Move VMEnter and VMExit tracepoints closer to the actual event
Date: Thu, 20 May 2021 15:32:34 +0000	[thread overview]
Message-ID: <YKaBEn6oUXaVAb0K@google.com> (raw)
In-Reply-To: <20210519182303.2790-1-stefano.devenuto99@gmail.com>

On Wed, May 19, 2021, Stefano De Venuto wrote:
> The kvm_entry and kvm_exit tracepoints are still quite far from the
> actual VMEnters/VMExits. This means that in a trace we can find host
> events after a kvm_entry event and before a kvm_exit one, as in this
> example:
> 
>            trace.dat:  CPU 0/KVM-4594  [001]  2.167191: kvm_entry:
>            trace.dat:  CPU 0/KVM-4594  [001]  2.167192: write_msr: 48, value 0
>            trace.dat:  CPU 0/KVM-4594  [001]  2.167192: rcu_utilization: Start context switch
>            trace.dat:  CPU 0/KVM-4594  [001]  2.167192: rcu_utilization: End context switch
> trace-tumbleweed.dat:     <idle>-0     [000]  2.167196: hrtimer_cancel:
> trace-tumbleweed.dat:     <idle>-0     [000]  2.167197: hrtimer_expire_entry:
> trace-tumbleweed.dat:     <idle>-0     [000]  2.167201: hrtimer_expire_exit:
> trace-tumbleweed.dat:     <idle>-0     [000]  2.167201: hrtimer_start:
>            trace.dat:  CPU 0/KVM-4594  [001]  2.167203: read_msr: 48, value 0
>            trace.dat:  CPU 0/KVM-4594  [001]  2.167203: write_msr: 48, value 4
>            trace.dat:  CPU 0/KVM-4594  [001]  2.167204: kvm_exit: 
> 
> This patch moves the tracepoints closer to the events, for both Intel
> and AMD, so that a combined host-guest trace will offer a more
> realistic representation of what is really happening, as shown here:
> 
>            trace.dat:  CPU 0/KVM-2553  [000]  2.190290: write_msr: 48, value 0

I'm not sure this is a good thing, as it's not clear to me that invoking tracing
with the guest's SPEC_CTRL loaded is desirable.  Maybe it's a non-issue, but it
should be explicitly called out and discussed.

And to some extent, the current behavior is _more_ accurate because it shows that
KVM started its VM-Enter sequence and then the WRMSR occured as part of that
sequence.  It is writing the guest's value after all.  Ditto for XCR0, XSS, PKRU,
Intel PT, etc...

A more concrete example would be perf; on VMX, if a perf NMI happens after KVM
invokes atomic_switch_perf_msrs() then I absolutely want to see that reflected
in the trace, e.g. to help debug the PEBS mess[*].  If the VM-Enter tracepoint
is moved closer to VM-Enter, that may or may not hold true depending on where the
NMI lands.

On VMX, I think the tracepoint can be moved below the VMWRITEs without much
contention (though doing so is likely a nop), but moving it below
kvm_load_guest_xsave_state() requires a bit more discussion.

I 100% agree that the current behavior can be a bit confusing, but I wonder if
we'd be better off "solving" that problem through documentation.

[*] https://lkml.kernel.org/r/20210209225653.1393771-1-jmattson@google.com

>            trace.dat:  CPU 0/KVM-2553  [000]  2.190290: rcu_utilization: Start context switch
>            trace.dat:  CPU 0/KVM-2553  [000]  2.190290: rcu_utilization: End context switch
>            trace.dat:  CPU 0/KVM-2553  [000]  2.190290: kvm_entry:
> trace-tumbleweed.dat:     <idle>-0     [000]  2.190290: write_msr:
> trace-tumbleweed.dat:     <idle>-0     [000]  2.190290: cpu_idle:
>            trace.dat:  CPU 0/KVM-2553  [000]  2.190291: kvm_exit:
>            trace.dat:  CPU 0/KVM-2553  [000]  2.190291: read_msr: 48, value 0
>            trace.dat:  CPU 0/KVM-2553  [000]  2.190291: write_msr: 48, value 4 
> 
> Signed-off-by: Stefano De Venuto <stefano.devenuto99@gmail.com>
> Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
> ---

...

> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 4bceb5ca3a89..33c732101b83 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6661,6 +6661,8 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
>  {
>  	kvm_guest_enter_irqoff();
>  
> +	trace_kvm_entry(vcpu);
> +
>  	/* L1D Flush includes CPU buffer clear to mitigate MDS */
>  	if (static_branch_unlikely(&vmx_l1d_should_flush))
>  		vmx_l1d_flush(vcpu);
> @@ -6675,6 +6677,9 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
>  
>  	vcpu->arch.cr2 = native_read_cr2();
>  
> +	vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);
> +	trace_kvm_exit(vmx->exit_reason.full, vcpu, KVM_ISA_VMX);

This is wrong in the 'vmx->fail == true' case.

> +
>  	kvm_guest_exit_irqoff();
>  }
>  
> @@ -6693,8 +6698,6 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
>  	if (vmx->emulation_required)
>  		return EXIT_FASTPATH_NONE;
>  
> -	trace_kvm_entry(vcpu);
> -
>  	if (vmx->ple_window_dirty) {
>  		vmx->ple_window_dirty = false;
>  		vmcs_write32(PLE_WINDOW, vmx->ple_window);
> @@ -6814,15 +6817,12 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
>  		return EXIT_FASTPATH_NONE;
>  	}
>  
> -	vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);
>  	if (unlikely((u16)vmx->exit_reason.basic == EXIT_REASON_MCE_DURING_VMENTRY))
>  		kvm_machine_check();
>  
>  	if (likely(!vmx->exit_reason.failed_vmentry))
>  		vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
>  
> -	trace_kvm_exit(vmx->exit_reason.full, vcpu, KVM_ISA_VMX);
> -
>  	if (unlikely(vmx->exit_reason.failed_vmentry))
>  		return EXIT_FASTPATH_NONE;
>  
> -- 
> 2.31.1
> 

  parent reply	other threads:[~2021-05-20 15:32 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-19 18:23 [PATCH] Move VMEnter and VMExit tracepoints closer to the actual event Stefano De Venuto
2021-05-20  6:05 ` Dario Faggioli
2021-05-20  7:21 ` Thomas Gleixner
2021-05-21  7:13   ` Dario Faggioli
2021-05-20 15:32 ` Sean Christopherson [this message]
2021-05-20 16:18   ` Paolo Bonzini
2021-05-21  7:58     ` Dario Faggioli
2021-05-28 16:55     ` Dario Faggioli
2021-05-28 16:57       ` Paolo Bonzini
2021-05-21  7:51   ` Dario Faggioli
2021-05-25 16:20     ` Sean Christopherson
2021-05-28 17:03       ` Dario Faggioli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YKaBEn6oUXaVAb0K@google.com \
    --to=seanjc@google.com \
    --cc=dfaggioli@suse.com \
    --cc=hpa@zytor.com \
    --cc=jmattson@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=rostedt@goodmis.org \
    --cc=stefano.devenuto99@gmail.com \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    --cc=x86@kernel.org \
    --cc=y.karadz@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.