linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] KVM: X86: correct trace_kvm_pv_tlb_flush stats
@ 2022-05-04 18:27 Jon Kohler
  2022-05-04 21:47 ` Sean Christopherson
  0 siblings, 1 reply; 7+ messages in thread
From: Jon Kohler @ 2022-05-04 18:27 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, kvm,
	linux-kernel
  Cc: Jon Kohler

The trace point in record_steal_time() is above the conditional
that fires kvm_vcpu_flush_tlb_guest(), so even when we might
not be flushing tlb, we still record that we are.

Fix by nestling trace_kvm_pv_tlb_flush() under appropriate
conditional. This results in the stats for kvm:kvm_pv_tlb_flush,
as trivially observable by perf stat -e "kvm:*" -a sleep Xs, in
reporting the amount of times we actually do a pv tlb flush,
instead of just the amount of times we happen to call
record_steal_time().

Signed-off-by: Jon Kohler <jon@nutanix.com>
---
 arch/x86/kvm/x86.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4790f0d7d40b..8d4e0e58ec34 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3410,9 +3410,9 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
 
 		vcpu->arch.st.preempted = 0;
 
-		trace_kvm_pv_tlb_flush(vcpu->vcpu_id,
-				       st_preempted & KVM_VCPU_FLUSH_TLB);
 		if (st_preempted & KVM_VCPU_FLUSH_TLB)
+			trace_kvm_pv_tlb_flush(vcpu->vcpu_id,
+				st_preempted & KVM_VCPU_FLUSH_TLB);
 			kvm_vcpu_flush_tlb_guest(vcpu);
 
 		if (!user_access_begin(st, sizeof(*st)))
-- 
2.30.1 (Apple Git-130)


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] KVM: X86: correct trace_kvm_pv_tlb_flush stats
  2022-05-04 18:27 [PATCH] KVM: X86: correct trace_kvm_pv_tlb_flush stats Jon Kohler
@ 2022-05-04 21:47 ` Sean Christopherson
  2022-05-05  0:26   ` Jon Kohler
  0 siblings, 1 reply; 7+ messages in thread
From: Sean Christopherson @ 2022-05-04 21:47 UTC (permalink / raw)
  To: Jon Kohler
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, kvm, linux-kernel

On Wed, May 04, 2022, Jon Kohler wrote:
> The trace point in record_steal_time() is above the conditional
> that fires kvm_vcpu_flush_tlb_guest(), so even when we might
> not be flushing tlb, we still record that we are.

No, it records whether not a TLB flush is needed.

> Fix by nestling trace_kvm_pv_tlb_flush() under appropriate
> conditional. This results in the stats for kvm:kvm_pv_tlb_flush,
> as trivially observable by perf stat -e "kvm:*" -a sleep Xs, in
> reporting the amount of times we actually do a pv tlb flush,
> instead of just the amount of times we happen to call
> record_steal_time().
> 
> Signed-off-by: Jon Kohler <jon@nutanix.com>
> ---
>  arch/x86/kvm/x86.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4790f0d7d40b..8d4e0e58ec34 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3410,9 +3410,9 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
>  
>  		vcpu->arch.st.preempted = 0;
>  
> -		trace_kvm_pv_tlb_flush(vcpu->vcpu_id,
> -				       st_preempted & KVM_VCPU_FLUSH_TLB);
>  		if (st_preempted & KVM_VCPU_FLUSH_TLB)
> +			trace_kvm_pv_tlb_flush(vcpu->vcpu_id,
> +				st_preempted & KVM_VCPU_FLUSH_TLB);

If you're going to trace only when a flush is needed, this should simply be:

			trace_kvm_pv_tlb_flush(vcpu->vcpu_id);

I haven't used this tracepoint often (at all?) so I don't have a strong preference,
but I can see the "no TLB flush needed" information being extremely valuable when
debugging a supsected TLB flushing bug.

>  			kvm_vcpu_flush_tlb_guest(vcpu);
>  
>  		if (!user_access_begin(st, sizeof(*st)))
> -- 
> 2.30.1 (Apple Git-130)
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] KVM: X86: correct trace_kvm_pv_tlb_flush stats
  2022-05-04 21:47 ` Sean Christopherson
@ 2022-05-05  0:26   ` Jon Kohler
  2022-05-05  8:09     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 7+ messages in thread
From: Jon Kohler @ 2022-05-05  0:26 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Jon Kohler, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, kvm,
	linux-kernel



> On May 4, 2022, at 5:47 PM, Sean Christopherson <seanjc@google.com> wrote:
> 
> On Wed, May 04, 2022, Jon Kohler wrote:
>> The trace point in record_steal_time() is above the conditional
>> that fires kvm_vcpu_flush_tlb_guest(), so even when we might
>> not be flushing tlb, we still record that we are.
> 
> No, it records whether not a TLB flush is needed.

Sure, the trace does, but the stat makes it seem like the host is going
nuts with doing pv tlb flushes when in reality it may not really be
doing all that much work.

> 
>> Fix by nestling trace_kvm_pv_tlb_flush() under appropriate
>> conditional. This results in the stats for kvm:kvm_pv_tlb_flush,
>> as trivially observable by perf stat -e "kvm:*" -a sleep Xs, in
>> reporting the amount of times we actually do a pv tlb flush,
>> instead of just the amount of times we happen to call
>> record_steal_time().
>> 
>> Signed-off-by: Jon Kohler <jon@nutanix.com>
>> ---
>> arch/x86/kvm/x86.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>> 
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 4790f0d7d40b..8d4e0e58ec34 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -3410,9 +3410,9 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
>> 
>> 		vcpu->arch.st.preempted = 0;
>> 
>> -		trace_kvm_pv_tlb_flush(vcpu->vcpu_id,
>> -				       st_preempted & KVM_VCPU_FLUSH_TLB);
>> 		if (st_preempted & KVM_VCPU_FLUSH_TLB)
>> +			trace_kvm_pv_tlb_flush(vcpu->vcpu_id,
>> +				st_preempted & KVM_VCPU_FLUSH_TLB);
> 
> If you're going to trace only when a flush is needed, this should simply be:
> 
> 			trace_kvm_pv_tlb_flush(vcpu->vcpu_id);
> 
> I haven't used this tracepoint often (at all?) so I don't have a strong preference,
> but I can see the "no TLB flush needed" information being extremely valuable when
> debugging a supsected TLB flushing bug.

Yea thats fair; however, this is only calling into some other function that is
actually doing the work. Those other flush TLB areas do not have traces
AFAIK, so even that is a bit in complete.

The net problem here is really that the stat is likely incorrect; however,
one other oddity I didn’t quite understand after looking into this is that
the call site for all of this is in record_steal_time(), which is only called
from vcpu_enter_guest(), and that is called *after*
kvm_service_local_tlb_flush_requests(), which also calls
kvm_vcpu_flush_tlb_guest() if request == KVM_REQ_TLB_FLUSH_GUEST

That request may be there set from a few different places. 

I don’t have any proof of this, but it seems to me like we might have a
situation where we double flush?

Put another way, I wonder if there is any sense behind maybe hoisting
if (kvm_check_request(KVM_REQ_STEAL_UPDATE, vcpu)) up before
Other tlb flushes, and have it clear the FLUSH_GUEST if it was set?

Just thinking aloud, seemed fishy.

Regardless, this pv tlb flush stat needs some love, open to suggestions
on the most appropriate way to tackle it?

> 
>> 			kvm_vcpu_flush_tlb_guest(vcpu);
>> 
>> 		if (!user_access_begin(st, sizeof(*st)))
>> -- 
>> 2.30.1 (Apple Git-130)


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] KVM: X86: correct trace_kvm_pv_tlb_flush stats
  2022-05-05  0:26   ` Jon Kohler
@ 2022-05-05  8:09     ` Vitaly Kuznetsov
  2022-05-06 15:42       ` Jon Kohler
  0 siblings, 1 reply; 7+ messages in thread
From: Vitaly Kuznetsov @ 2022-05-05  8:09 UTC (permalink / raw)
  To: Jon Kohler
  Cc: Paolo Bonzini, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, kvm, linux-kernel, Sean Christopherson

Jon Kohler <jon@nutanix.com> writes:

>> On May 4, 2022, at 5:47 PM, Sean Christopherson <seanjc@google.com> wrote:
>> 

...

>
> The net problem here is really that the stat is likely incorrect; however,
> one other oddity I didn’t quite understand after looking into this is that
> the call site for all of this is in record_steal_time(), which is only called
> from vcpu_enter_guest(), and that is called *after*
> kvm_service_local_tlb_flush_requests(), which also calls
> kvm_vcpu_flush_tlb_guest() if request == KVM_REQ_TLB_FLUSH_GUEST
>
> That request may be there set from a few different places. 
>
> I don’t have any proof of this, but it seems to me like we might have a
> situation where we double flush?
>
> Put another way, I wonder if there is any sense behind maybe hoisting
> if (kvm_check_request(KVM_REQ_STEAL_UPDATE, vcpu)) up before
> Other tlb flushes, and have it clear the FLUSH_GUEST if it was set?

Indeed, if we move KVM_REQ_STEAL_UPDATE check/record_steal_time() call
in vcpu_enter_guest() before kvm_service_local_tlb_flush_requests(), we
can probably get aways with kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST,
vcpu) in record_steal_time() which would help to avoid double flushing.

-- 
Vitaly


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] KVM: X86: correct trace_kvm_pv_tlb_flush stats
  2022-05-05  8:09     ` Vitaly Kuznetsov
@ 2022-05-06 15:42       ` Jon Kohler
  2022-05-10  7:57         ` Vitaly Kuznetsov
  0 siblings, 1 reply; 7+ messages in thread
From: Jon Kohler @ 2022-05-06 15:42 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Jon Kohler, Paolo Bonzini, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, kvm, linux-kernel, Sean Christopherson



> On May 5, 2022, at 4:09 AM, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> 
> Jon Kohler <jon@nutanix.com> writes:
> 
>>> On May 4, 2022, at 5:47 PM, Sean Christopherson <seanjc@google.com> wrote:
>>> 
> 
> ...
> 
>> 
>> The net problem here is really that the stat is likely incorrect; however,
>> one other oddity I didn’t quite understand after looking into this is that
>> the call site for all of this is in record_steal_time(), which is only called
>> from vcpu_enter_guest(), and that is called *after*
>> kvm_service_local_tlb_flush_requests(), which also calls
>> kvm_vcpu_flush_tlb_guest() if request == KVM_REQ_TLB_FLUSH_GUEST
>> 
>> That request may be there set from a few different places. 
>> 
>> I don’t have any proof of this, but it seems to me like we might have a
>> situation where we double flush?
>> 
>> Put another way, I wonder if there is any sense behind maybe hoisting
>> if (kvm_check_request(KVM_REQ_STEAL_UPDATE, vcpu)) up before
>> Other tlb flushes, and have it clear the FLUSH_GUEST if it was set?
> 
> Indeed, if we move KVM_REQ_STEAL_UPDATE check/record_steal_time() call
> in vcpu_enter_guest() before kvm_service_local_tlb_flush_requests(), we
> can probably get aways with kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST,
> vcpu) in record_steal_time() which would help to avoid double flushing.

Thanks, Vitaly, I’ll rework this one and incorporate that. In the mean time, do you
have any suggestions on Sean's concern about losing the trace in situations
where pv tlb flushing isn’t happening?

> 
> -- 
> Vitaly
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] KVM: X86: correct trace_kvm_pv_tlb_flush stats
  2022-05-06 15:42       ` Jon Kohler
@ 2022-05-10  7:57         ` Vitaly Kuznetsov
  2022-05-10 13:38           ` Sean Christopherson
  0 siblings, 1 reply; 7+ messages in thread
From: Vitaly Kuznetsov @ 2022-05-10  7:57 UTC (permalink / raw)
  To: Jon Kohler
  Cc: Paolo Bonzini, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, kvm, linux-kernel, Sean Christopherson

Jon Kohler <jon@nutanix.com> writes:

>> On May 5, 2022, at 4:09 AM, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>> 
>> Jon Kohler <jon@nutanix.com> writes:
>> 
>>>> On May 4, 2022, at 5:47 PM, Sean Christopherson <seanjc@google.com> wrote:
>>>> 
>> 
>> ...
>> 
>>> 
>>> The net problem here is really that the stat is likely incorrect; however,
>>> one other oddity I didn’t quite understand after looking into this is that
>>> the call site for all of this is in record_steal_time(), which is only called
>>> from vcpu_enter_guest(), and that is called *after*
>>> kvm_service_local_tlb_flush_requests(), which also calls
>>> kvm_vcpu_flush_tlb_guest() if request == KVM_REQ_TLB_FLUSH_GUEST
>>> 
>>> That request may be there set from a few different places. 
>>> 
>>> I don’t have any proof of this, but it seems to me like we might have a
>>> situation where we double flush?
>>> 
>>> Put another way, I wonder if there is any sense behind maybe hoisting
>>> if (kvm_check_request(KVM_REQ_STEAL_UPDATE, vcpu)) up before
>>> Other tlb flushes, and have it clear the FLUSH_GUEST if it was set?
>> 
>> Indeed, if we move KVM_REQ_STEAL_UPDATE check/record_steal_time() call
>> in vcpu_enter_guest() before kvm_service_local_tlb_flush_requests(), we
>> can probably get aways with kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST,
>> vcpu) in record_steal_time() which would help to avoid double flushing.
>
> Thanks, Vitaly, I’ll rework this one and incorporate that. In the mean time, do you
> have any suggestions on Sean's concern about losing the trace in situations
> where pv tlb flushing isn’t happening?
>

No strong preference from my side but there are multiple places which
conditionally cause TLB flush but we don't have tracepoints saying
"flush could've been done but wasn't" there, right? Also,
kvm_vcpu_flush_tlb_all()/kvm_vcpu_flush_tlb_guest()/kvm_vcpu_flush_tlb_current()
don't seem to have tracepoints so we don't actually record when we
flush. Hyper-V TLB flush has its own tracepoints
(trace_kvm_hv_flush_tlb()/trace_kvm_hv_flush_tlb_ex()) though.
This probably deserves a cleanup if we want TLB flush to be debuggable
without code instrumentation.

-- 
Vitaly


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] KVM: X86: correct trace_kvm_pv_tlb_flush stats
  2022-05-10  7:57         ` Vitaly Kuznetsov
@ 2022-05-10 13:38           ` Sean Christopherson
  0 siblings, 0 replies; 7+ messages in thread
From: Sean Christopherson @ 2022-05-10 13:38 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Jon Kohler, Paolo Bonzini, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, kvm, linux-kernel

On Tue, May 10, 2022, Vitaly Kuznetsov wrote:
> Jon Kohler <jon@nutanix.com> writes:
> 
> >> On May 5, 2022, at 4:09 AM, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> >> 
> >> Jon Kohler <jon@nutanix.com> writes:
> >> 
> >>>> On May 4, 2022, at 5:47 PM, Sean Christopherson <seanjc@google.com> wrote:
> >>>> 
> >> 
> >> ...
> >> 
> >>> 
> >>> The net problem here is really that the stat is likely incorrect; however,
> >>> one other oddity I didn’t quite understand after looking into this is that
> >>> the call site for all of this is in record_steal_time(), which is only called
> >>> from vcpu_enter_guest(), and that is called *after*
> >>> kvm_service_local_tlb_flush_requests(), which also calls
> >>> kvm_vcpu_flush_tlb_guest() if request == KVM_REQ_TLB_FLUSH_GUEST
> >>> 
> >>> That request may be there set from a few different places. 
> >>> 
> >>> I don’t have any proof of this, but it seems to me like we might have a
> >>> situation where we double flush?
> >>> 
> >>> Put another way, I wonder if there is any sense behind maybe hoisting
> >>> if (kvm_check_request(KVM_REQ_STEAL_UPDATE, vcpu)) up before
> >>> Other tlb flushes, and have it clear the FLUSH_GUEST if it was set?
> >> 
> >> Indeed, if we move KVM_REQ_STEAL_UPDATE check/record_steal_time() call
> >> in vcpu_enter_guest() before kvm_service_local_tlb_flush_requests(), we
> >> can probably get aways with kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST,
> >> vcpu) in record_steal_time() which would help to avoid double flushing.
> >
> > Thanks, Vitaly, I’ll rework this one and incorporate that. In the mean time, do you
> > have any suggestions on Sean's concern about losing the trace in situations
> > where pv tlb flushing isn’t happening?
> >
> 
> No strong preference from my side but there are multiple places which
> conditionally cause TLB flush but we don't have tracepoints saying
> "flush could've been done but wasn't" there, right?

IMO this one is different because it's an explicit request from the guest that is
otherwise not traced, whereas e.g. INVLPG will show up in exits.

> Also,
> kvm_vcpu_flush_tlb_all()/kvm_vcpu_flush_tlb_guest()/kvm_vcpu_flush_tlb_current()
> don't seem to have tracepoints so we don't actually record when we
> flush. Hyper-V TLB flush has its own tracepoints
> (trace_kvm_hv_flush_tlb()/trace_kvm_hv_flush_tlb_ex()) though.
> This probably deserves a cleanup if we want TLB flush to be debuggable
> without code instrumentation.

I don't have a preference either way.  I'm not opposed to tracing flushes, but I'm
also more than a bit skeptical that any non-trivial TLB bugs will be debuggable via
tracepoints.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-05-10 14:23 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-04 18:27 [PATCH] KVM: X86: correct trace_kvm_pv_tlb_flush stats Jon Kohler
2022-05-04 21:47 ` Sean Christopherson
2022-05-05  0:26   ` Jon Kohler
2022-05-05  8:09     ` Vitaly Kuznetsov
2022-05-06 15:42       ` Jon Kohler
2022-05-10  7:57         ` Vitaly Kuznetsov
2022-05-10 13:38           ` Sean Christopherson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).