* [PATCH RFC 1/1] kvm: export per-vcpu exits to userspace
@ 2021-09-08  0:08 Dongli Zhang
  2021-09-24 20:34 ` Sean Christopherson
  0 siblings, 1 reply; 3+ messages in thread
From: Dongli Zhang @ 2021-09-08  0:08 UTC
  To: kvm
  Cc: pbonzini, seanjc, vkuznets, wanpengli, jmattson, joro, tglx,
	mingo, bp, x86, hpa, linux-kernel, joe.jin

People sometimes blame KVM scheduling when there is a softlockup/rcu_stall
in the VM kernel. KVM developers are then required to prove that a specific
VCPU is being regularly scheduled by the KVM hypervisor.

So far we use "pidstat -p <qemu-pid> -t 1" or
"cat /proc/<pid>/task/<tid>/stat", but 'exits' is more fine-grained.

Therefore, the 'exits' is exported to userspace to verify if a VCPU is
being scheduled regularly.

I was going to export 'exits', until there was binary stats available.
Unfortunately, QEMU does not support binary stats and we will need to
read via debugfs temporarily. This patch can also be backported to prior
versions that do not support binary stats.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
 arch/x86/kvm/debugfs.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/kvm/debugfs.c b/arch/x86/kvm/debugfs.c
index 95a98413dc32..69ecc06e45a0 100644
--- a/arch/x86/kvm/debugfs.c
+++ b/arch/x86/kvm/debugfs.c
@@ -17,6 +17,15 @@ static int vcpu_get_timer_advance_ns(void *data, u64 *val)
 
 DEFINE_SIMPLE_ATTRIBUTE(vcpu_timer_advance_ns_fops, vcpu_get_timer_advance_ns, NULL, "%llu\n");
 
+static int vcpu_get_exits(void *data, u64 *val)
+{
+	struct kvm_vcpu *vcpu = (struct kvm_vcpu *) data;
+	*val = vcpu->stat.exits;
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(vcpu_exits_fops, vcpu_get_exits, NULL, "%llu\n");
+
 static int vcpu_get_guest_mode(void *data, u64 *val)
 {
 	struct kvm_vcpu *vcpu = (struct kvm_vcpu *) data;
@@ -54,6 +63,8 @@ DEFINE_SIMPLE_ATTRIBUTE(vcpu_tsc_scaling_frac_fops, vcpu_get_tsc_scaling_frac_bi
 
 void kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu, struct dentry *debugfs_dentry)
 {
+	debugfs_create_file("exits", 0444, debugfs_dentry, vcpu,
+			    &vcpu_exits_fops);
 	debugfs_create_file("guest_mode", 0444, debugfs_dentry, vcpu,
 			    &vcpu_guest_mode_fops);
 	debugfs_create_file("tsc-offset", 0444, debugfs_dentry, vcpu,
-- 
2.17.1



* Re: [PATCH RFC 1/1] kvm: export per-vcpu exits to userspace
  2021-09-08  0:08 [PATCH RFC 1/1] kvm: export per-vcpu exits to userspace Dongli Zhang
@ 2021-09-24 20:34 ` Sean Christopherson
  2021-09-25  0:41   ` Dongli Zhang
  0 siblings, 1 reply; 3+ messages in thread
From: Sean Christopherson @ 2021-09-24 20:34 UTC
  To: Dongli Zhang
  Cc: kvm, pbonzini, vkuznets, wanpengli, jmattson, joro, tglx, mingo,
	bp, x86, hpa, linux-kernel, joe.jin

On Tue, Sep 07, 2021, Dongli Zhang wrote:
> People sometimes blame KVM scheduling when there is a softlockup/rcu_stall
> in the VM kernel. KVM developers are then required to prove that a specific
> VCPU is being regularly scheduled by the KVM hypervisor.
> 
> So far we use "pidstat -p <qemu-pid> -t 1" or
> "cat /proc/<pid>/task/<tid>/stat", but 'exits' is more fine-grained.

Sort of?  Yes, it counts _almost_ every VM-Exit, but it's also measuring
something completely different.

> Therefore, the 'exits' is exported to userspace to verify if a VCPU is
> being scheduled regularly.

The number of VM-Exits seems like a very cumbersome and potentially misinterpreted
indicator, e.g. userspace could naively think that a guest that is generating a
high number of exits is getting more runtime.  With posted interrupts and other
hardware features, that doesn't necessarily hold true.

I'm not saying don't count exits, they absolutely can be a good triage tool, but
they're not the right tool to verify tasks are getting scheduled.
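
For the scheduler-side view, the per-thread CPU time exposed by the
"/proc/<pid>/task/<tid>/stat" file mentioned above can be sampled directly.
The following is a minimal userspace sketch, not part of the patch. The
function names are hypothetical, and the field positions (utime and stime as
fields 14 and 15, in clock ticks) are taken from the proc(5) layout. It is
assumed here that the thread name may contain spaces, as QEMU vCPU thread
names typically do, so parsing resumes after the last ')'.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse utime and stime (fields 14 and 15, in clock ticks) from one line
 * of /proc/<pid>/task/<tid>/stat. Returns 0 on success, -1 on failure.
 */
int parse_stat_cpu_ticks(const char *line, unsigned long *utime,
			 unsigned long *stime)
{
	/* comm (field 2) may contain spaces or ')': resume after the last ')'. */
	const char *p = strrchr(line, ')');

	if (!p)
		return -1;

	/* Skip state (field 3) and fields 4-13, then read fields 14 and 15. */
	if (sscanf(p + 1,
		   " %*c %*d %*d %*d %*d %*d %*u %*lu %*lu %*lu %*lu %lu %lu",
		   utime, stime) != 2)
		return -1;

	return 0;
}

/* Read and parse the stat file of one thread (hypothetical helper). */
int read_thread_cpu_ticks(int pid, int tid, unsigned long *utime,
			  unsigned long *stime)
{
	char path[64], line[1024];
	char *ok;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/task/%d/stat", pid, tid);
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = fgets(line, sizeof(line), f);
	fclose(f);

	return ok ? parse_stat_cpu_ticks(line, utime, stime) : -1;
}
```

Calling read_thread_cpu_ticks() twice for the same vCPU thread and comparing
the results shows whether the thread accumulated CPU time in between, which
is a statement about scheduling rather than about VM-Exits.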

> I was going to export 'exits', until there was binary stats available.
> Unfortunately, QEMU does not support binary stats and we will need to
> read via debugfs temporarily. This patch can also be backported to prior
> versions that do not support binary stats.

Adding temporary code to the _upstream_ kernel to work around lack of support in
the userspace VMM does not seem right to me.  Especially in debugfs, which is
very explicitly not intended to be used for things like monitoring in production.


* Re: [PATCH RFC 1/1] kvm: export per-vcpu exits to userspace
  2021-09-24 20:34 ` Sean Christopherson
@ 2021-09-25  0:41   ` Dongli Zhang
  0 siblings, 0 replies; 3+ messages in thread
From: Dongli Zhang @ 2021-09-25  0:41 UTC
  To: Sean Christopherson
  Cc: kvm, pbonzini, vkuznets, wanpengli, jmattson, joro, tglx, mingo,
	bp, x86, hpa, linux-kernel, joe.jin



On 9/24/21 1:34 PM, Sean Christopherson wrote:
> On Tue, Sep 07, 2021, Dongli Zhang wrote:
>> People sometimes blame KVM scheduling when there is a softlockup/rcu_stall
>> in the VM kernel. KVM developers are then required to prove that a specific
>> VCPU is being regularly scheduled by the KVM hypervisor.
>>
>> So far we use "pidstat -p <qemu-pid> -t 1" or
>> "cat /proc/<pid>/task/<tid>/stat", but 'exits' is more fine-grained.
> 
> Sort of?  Yes, it counts _almost_ every VM-Exit, but it's also measuring
> something completely different.
> 
>> Therefore, the 'exits' is exported to userspace to verify if a VCPU is
>> being scheduled regularly.
> 
> The number of VM-Exits seems like a very cumbersome and potentially misinterpreted
> indicator, e.g. userspace could naively think that a guest that is generating a
> high number of exits is getting more runtime.  With posted interrupts and other
> hardware features, that doesn't necessarily hold true.
> 
> I'm not saying don't count exits, they absolutely can be a good triage tool, but
> they're not the right tool to verify tasks are getting scheduled.

Yes, a high number of 'exits' does not indicate that the guest is getting more
runtime.

The counter is meant to prove that a specific VCPU is entering guest mode
regularly. Sometimes it is much harder to prove that KVM works well than to
resolve a KVM issue.

If the VM side complains that a VCPU stopped entering guest mode, an
increasing 'exits' count can serve as convincing evidence.
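
Checking that the counter keeps increasing amounts to sampling the file twice
and comparing. The following is a minimal userspace sketch, assuming the
debugfs file created by this patch is readable and contains a single decimal
value (the "%llu\n" format in the patch). The function names are hypothetical.
The location of the file, presumably something like
/sys/kernel/debug/kvm/<pid>-<fd>/vcpu<N>/exits, is also an assumption and
should be checked on the target system.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Read one decimal u64 from a file. Returns 0 on success, -1 on failure. */
int read_counter(const char *path, uint64_t *val)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%" SCNu64, val) == 1;
	fclose(f);

	return ok ? 0 : -1;
}

/*
 * Sample the counter, wait 'seconds', sample again, and store the
 * difference in *delta. A delta of 0 means no exit was counted in
 * that interval.
 */
int sample_delta(const char *path, unsigned int seconds, uint64_t *delta)
{
	uint64_t before, after;

	if (read_counter(path, &before))
		return -1;
	sleep(seconds);
	if (read_counter(path, &after))
		return -1;

	*delta = after - before;
	return 0;
}
```

A nonzero delta only shows that the VCPU entered and left guest mode during
the interval. As noted above, it says nothing about how much runtime the
guest received.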

> 
>> I was going to export 'exits', until there was binary stats available.
>> Unfortunately, QEMU does not support binary stats and we will need to
>> read via debugfs temporarily. This patch can also be backported to prior
>> versions that do not support binary stats.
> 
> Adding temporary code to the _upstream_ kernel to work around lack of support in
> the userspace VMM does not seem right to me.  Especially in debugfs, which is
> very explicitly not intended to be used for things like monitoring in production.
> 

I agree. That's why I tagged the patch as RFC.

Thank you very much!

Dongli Zhang

