kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v7 0/4] KVM statistics data fd-based binary interface
@ 2021-06-03 21:14 Jing Zhang
  2021-06-03 21:14 ` [PATCH v7 1/4] KVM: stats: Separate generic stats from architecture specific ones Jing Zhang
                   ` (5 more replies)
  0 siblings, 6 replies; 32+ messages in thread
From: Jing Zhang @ 2021-06-03 21:14 UTC (permalink / raw)
  To: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Claudio Imbrenda, Sean Christopherson, Vitaly Kuznetsov,
	Jim Mattson, Peter Shier, Oliver Upton, David Rientjes,
	Emanuele Giuseppe Esposito, David Matlack, Ricardo Koller,
	Krish Sadhukhan
  Cc: Jing Zhang

This patchset provides a file descriptor for every VM and VCPU to read
KVM statistics data in binary format.
It is meant to provide a lightweight, flexible, scalable and efficient
lock-free solution for user space telemetry applications to pull the
statistics data periodically for large scale systems. The pulling
frequency could be as high as a few times per second.
In this patchset, every statistics data are treated to have some
attributes as below:
  * architecture dependent or generic
  * VM statistics data or VCPU statistics data
  * type: cumulative, instantaneous,
  * unit: none for simple counter, nanosecond, microsecond,
    millisecond, second, Byte, KiByte, MiByte, GiByte. Clock Cycles
Since no lock/synchronization is used, the consistency between all
the statistics data is not guaranteed. That means not all statistics
data are read out at the exact same time, since the statistics date
are still being updated by KVM subsystems while they are read out.

---

* v6 -> v7
  - Improve file descriptor allocation function by Krish suggestion
  - Use "generic stats" instead of "common stats" as Krish suggested
  - Addressed some other nits from Krish and David Matlack

* v5 -> v6
  - Use designated initializers for STATS_DESC
  - Change KVM_STATS_SCALE... to KVM_STATS_BASE...
  - Use a common function for kvm_[vm|vcpu]_stats_read
  - Fix some documentation errors/missings
  - Use TEST_ASSERT in selftest
  - Use a common function for [vm|vcpu]_stats_test in selftest

* v4 -> v5
  - Rebase to kvm/queue, commit a4345a7cecfb ("Merge tag
    'kvmarm-fixes-5.13-1'")
  - Change maximum stats name length to 48
  - Replace VM_STATS_COMMON/VCPU_STATS_COMMON macros with stats
    descriptor definition macros.
  - Fixed some errors/warnings reported by checkpatch.pl

* v3 -> v4
  - Rebase to kvm/queue, commit 9f242010c3b4 ("KVM: avoid "deadlock"
    between install_new_memslots and MMU notifier")
  - Use C-stype comments in the whole patch
  - Fix wrong count for x86 VCPU stats descriptors
  - Fix KVM stats data size counting and validity check in selftest

* v2 -> v3
  - Rebase to kvm/queue, commit edf408f5257b ("KVM: avoid "deadlock"
    between install_new_memslots and MMU notifier")
  - Resolve some nitpicks about format

* v1 -> v2
  - Use ARRAY_SIZE to count the number of stats descriptors
  - Fix missing `size` field initialization in macro STATS_DESC

[1] https://lore.kernel.org/kvm/20210402224359.2297157-1-jingzhangos@google.com
[2] https://lore.kernel.org/kvm/20210415151741.1607806-1-jingzhangos@google.com
[3] https://lore.kernel.org/kvm/20210423181727.596466-1-jingzhangos@google.com
[4] https://lore.kernel.org/kvm/20210429203740.1935629-1-jingzhangos@google.com
[5] https://lore.kernel.org/kvm/20210517145314.157626-1-jingzhangos@google.com
[6] https://lore.kernel.org/kvm/20210524151828.4113777-1-jingzhangos@google.com

---

Jing Zhang (4):
  KVM: stats: Separate generic stats from architecture specific ones
  KVM: stats: Add fd-based API to read binary stats data
  KVM: stats: Add documentation for statistics data binary interface
  KVM: selftests: Add selftest for KVM statistics data binary interface

 Documentation/virt/kvm/api.rst                | 180 +++++++++++++++
 arch/arm64/include/asm/kvm_host.h             |   9 +-
 arch/arm64/kvm/guest.c                        |  38 +++-
 arch/mips/include/asm/kvm_host.h              |   9 +-
 arch/mips/kvm/mips.c                          |  64 +++++-
 arch/powerpc/include/asm/kvm_host.h           |   9 +-
 arch/powerpc/kvm/book3s.c                     |  64 +++++-
 arch/powerpc/kvm/book3s_hv.c                  |  12 +-
 arch/powerpc/kvm/book3s_pr.c                  |   2 +-
 arch/powerpc/kvm/book3s_pr_papr.c             |   2 +-
 arch/powerpc/kvm/booke.c                      |  59 ++++-
 arch/s390/include/asm/kvm_host.h              |   9 +-
 arch/s390/kvm/kvm-s390.c                      | 129 ++++++++++-
 arch/x86/include/asm/kvm_host.h               |   9 +-
 arch/x86/kvm/x86.c                            |  67 +++++-
 include/linux/kvm_host.h                      | 141 +++++++++++-
 include/linux/kvm_types.h                     |  12 +
 include/uapi/linux/kvm.h                      |  50 ++++
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   3 +
 .../testing/selftests/kvm/include/kvm_util.h  |   3 +
 .../selftests/kvm/kvm_binary_stats_test.c     | 215 ++++++++++++++++++
 tools/testing/selftests/kvm/lib/kvm_util.c    |  12 +
 virt/kvm/kvm_main.c                           | 169 +++++++++++++-
 24 files changed, 1178 insertions(+), 90 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/kvm_binary_stats_test.c


base-commit: a4345a7cecfb91ae78cd43d26b0c6a956420761a
-- 
2.32.0.rc1.229.g3e70b5a671-goog


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v7 1/4] KVM: stats: Separate generic stats from architecture specific ones
  2021-06-03 21:14 [PATCH v7 0/4] KVM statistics data fd-based binary interface Jing Zhang
@ 2021-06-03 21:14 ` Jing Zhang
  2021-06-07 19:22   ` Krish Sadhukhan
  2021-06-11  6:57   ` Christian Borntraeger
  2021-06-03 21:14 ` [PATCH v7 2/4] KVM: stats: Add fd-based API to read binary stats data Jing Zhang
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 32+ messages in thread
From: Jing Zhang @ 2021-06-03 21:14 UTC (permalink / raw)
  To: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Claudio Imbrenda, Sean Christopherson, Vitaly Kuznetsov,
	Jim Mattson, Peter Shier, Oliver Upton, David Rientjes,
	Emanuele Giuseppe Esposito, David Matlack, Ricardo Koller,
	Krish Sadhukhan
  Cc: Jing Zhang

Put all generic statistics in a separate structure to ease
statistics handling for the incoming new statistics API.

No functional change intended.

Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Jing Zhang <jingzhangos@google.com>
---
 arch/arm64/include/asm/kvm_host.h   |  9 ++-------
 arch/arm64/kvm/guest.c              | 12 ++++++------
 arch/mips/include/asm/kvm_host.h    |  9 ++-------
 arch/mips/kvm/mips.c                | 12 ++++++------
 arch/powerpc/include/asm/kvm_host.h |  9 ++-------
 arch/powerpc/kvm/book3s.c           | 12 ++++++------
 arch/powerpc/kvm/book3s_hv.c        | 12 ++++++------
 arch/powerpc/kvm/book3s_pr.c        |  2 +-
 arch/powerpc/kvm/book3s_pr_papr.c   |  2 +-
 arch/powerpc/kvm/booke.c            | 14 +++++++-------
 arch/s390/include/asm/kvm_host.h    |  9 ++-------
 arch/s390/kvm/kvm-s390.c            | 12 ++++++------
 arch/x86/include/asm/kvm_host.h     |  9 ++-------
 arch/x86/kvm/x86.c                  | 14 +++++++-------
 include/linux/kvm_host.h            |  9 +++++++--
 include/linux/kvm_types.h           | 12 ++++++++++++
 virt/kvm/kvm_main.c                 | 14 +++++++-------
 17 files changed, 82 insertions(+), 90 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 7cd7d5c8c4bc..5a2c82f63baa 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -556,16 +556,11 @@ static inline bool __vcpu_write_sys_reg_to_cpu(u64 val, int reg)
 }
 
 struct kvm_vm_stat {
-	ulong remote_tlb_flush;
+	struct kvm_vm_stat_generic generic;
 };
 
 struct kvm_vcpu_stat {
-	u64 halt_successful_poll;
-	u64 halt_attempted_poll;
-	u64 halt_poll_success_ns;
-	u64 halt_poll_fail_ns;
-	u64 halt_poll_invalid;
-	u64 halt_wakeup;
+	struct kvm_vcpu_stat_generic generic;
 	u64 hvc_exit_stat;
 	u64 wfe_exit_stat;
 	u64 wfi_exit_stat;
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 5cb4a1cd5603..4962331d01e6 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -29,18 +29,18 @@
 #include "trace.h"
 
 struct kvm_stats_debugfs_item debugfs_entries[] = {
-	VCPU_STAT("halt_successful_poll", halt_successful_poll),
-	VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
-	VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
-	VCPU_STAT("halt_wakeup", halt_wakeup),
+	VCPU_STAT_GENERIC("halt_successful_poll", halt_successful_poll),
+	VCPU_STAT_GENERIC("halt_attempted_poll", halt_attempted_poll),
+	VCPU_STAT_GENERIC("halt_poll_invalid", halt_poll_invalid),
+	VCPU_STAT_GENERIC("halt_wakeup", halt_wakeup),
 	VCPU_STAT("hvc_exit_stat", hvc_exit_stat),
 	VCPU_STAT("wfe_exit_stat", wfe_exit_stat),
 	VCPU_STAT("wfi_exit_stat", wfi_exit_stat),
 	VCPU_STAT("mmio_exit_user", mmio_exit_user),
 	VCPU_STAT("mmio_exit_kernel", mmio_exit_kernel),
 	VCPU_STAT("exits", exits),
-	VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
-	VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
+	VCPU_STAT_GENERIC("halt_poll_success_ns", halt_poll_success_ns),
+	VCPU_STAT_GENERIC("halt_poll_fail_ns", halt_poll_fail_ns),
 	{ NULL }
 };
 
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index fca4547d580f..696f6b009377 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -109,10 +109,11 @@ static inline bool kvm_is_error_hva(unsigned long addr)
 }
 
 struct kvm_vm_stat {
-	ulong remote_tlb_flush;
+	struct kvm_vm_stat_generic generic;
 };
 
 struct kvm_vcpu_stat {
+	struct kvm_vcpu_stat_generic generic;
 	u64 wait_exits;
 	u64 cache_exits;
 	u64 signal_exits;
@@ -142,12 +143,6 @@ struct kvm_vcpu_stat {
 #ifdef CONFIG_CPU_LOONGSON64
 	u64 vz_cpucfg_exits;
 #endif
-	u64 halt_successful_poll;
-	u64 halt_attempted_poll;
-	u64 halt_poll_success_ns;
-	u64 halt_poll_fail_ns;
-	u64 halt_poll_invalid;
-	u64 halt_wakeup;
 };
 
 struct kvm_arch_memory_slot {
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 4d4af97dcc88..ff205b35719b 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -68,12 +68,12 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 #ifdef CONFIG_CPU_LOONGSON64
 	VCPU_STAT("vz_cpucfg", vz_cpucfg_exits),
 #endif
-	VCPU_STAT("halt_successful_poll", halt_successful_poll),
-	VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
-	VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
-	VCPU_STAT("halt_wakeup", halt_wakeup),
-	VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
-	VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
+	VCPU_STAT_GENERIC("halt_successful_poll", halt_successful_poll),
+	VCPU_STAT_GENERIC("halt_attempted_poll", halt_attempted_poll),
+	VCPU_STAT_GENERIC("halt_poll_invalid", halt_poll_invalid),
+	VCPU_STAT_GENERIC("halt_wakeup", halt_wakeup),
+	VCPU_STAT_GENERIC("halt_poll_success_ns", halt_poll_success_ns),
+	VCPU_STAT_GENERIC("halt_poll_fail_ns", halt_poll_fail_ns),
 	{NULL}
 };
 
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 1e83359f286b..6e41064858a1 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -80,12 +80,13 @@ struct kvmppc_book3s_shadow_vcpu;
 struct kvm_nested_guest;
 
 struct kvm_vm_stat {
-	ulong remote_tlb_flush;
+	struct kvm_vm_stat_generic generic;
 	ulong num_2M_pages;
 	ulong num_1G_pages;
 };
 
 struct kvm_vcpu_stat {
+	struct kvm_vcpu_stat_generic generic;
 	u64 sum_exits;
 	u64 mmio_exits;
 	u64 signal_exits;
@@ -101,14 +102,8 @@ struct kvm_vcpu_stat {
 	u64 emulated_inst_exits;
 	u64 dec_exits;
 	u64 ext_intr_exits;
-	u64 halt_poll_success_ns;
-	u64 halt_poll_fail_ns;
 	u64 halt_wait_ns;
-	u64 halt_successful_poll;
-	u64 halt_attempted_poll;
 	u64 halt_successful_wait;
-	u64 halt_poll_invalid;
-	u64 halt_wakeup;
 	u64 dbell_exits;
 	u64 gdbell_exits;
 	u64 ld;
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 2b691f4d1f26..92cdb4175945 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -47,14 +47,14 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VCPU_STAT("dec", dec_exits),
 	VCPU_STAT("ext_intr", ext_intr_exits),
 	VCPU_STAT("queue_intr", queue_intr),
-	VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
-	VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
+	VCPU_STAT_GENERIC("halt_poll_success_ns", halt_poll_success_ns),
+	VCPU_STAT_GENERIC("halt_poll_fail_ns", halt_poll_fail_ns),
 	VCPU_STAT("halt_wait_ns", halt_wait_ns),
-	VCPU_STAT("halt_successful_poll", halt_successful_poll),
-	VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
+	VCPU_STAT_GENERIC("halt_successful_poll", halt_successful_poll),
+	VCPU_STAT_GENERIC("halt_attempted_poll", halt_attempted_poll),
 	VCPU_STAT("halt_successful_wait", halt_successful_wait),
-	VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
-	VCPU_STAT("halt_wakeup", halt_wakeup),
+	VCPU_STAT_GENERIC("halt_poll_invalid", halt_poll_invalid),
+	VCPU_STAT_GENERIC("halt_wakeup", halt_wakeup),
 	VCPU_STAT("pf_storage", pf_storage),
 	VCPU_STAT("sp_storage", sp_storage),
 	VCPU_STAT("pf_instruc", pf_instruc),
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 28a80d240b76..2a1cf29ba3fd 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -236,7 +236,7 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
 
 	waitp = kvm_arch_vcpu_get_wait(vcpu);
 	if (rcuwait_wake_up(waitp))
-		++vcpu->stat.halt_wakeup;
+		++vcpu->stat.generic.halt_wakeup;
 
 	cpu = READ_ONCE(vcpu->arch.thread_cpu);
 	if (cpu >= 0 && kvmppc_ipi_thread(cpu))
@@ -3925,7 +3925,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
 	cur = start_poll = ktime_get();
 	if (vc->halt_poll_ns) {
 		ktime_t stop = ktime_add_ns(start_poll, vc->halt_poll_ns);
-		++vc->runner->stat.halt_attempted_poll;
+		++vc->runner->stat.generic.halt_attempted_poll;
 
 		vc->vcore_state = VCORE_POLLING;
 		spin_unlock(&vc->lock);
@@ -3942,7 +3942,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
 		vc->vcore_state = VCORE_INACTIVE;
 
 		if (!do_sleep) {
-			++vc->runner->stat.halt_successful_poll;
+			++vc->runner->stat.generic.halt_successful_poll;
 			goto out;
 		}
 	}
@@ -3954,7 +3954,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
 		do_sleep = 0;
 		/* If we polled, count this as a successful poll */
 		if (vc->halt_poll_ns)
-			++vc->runner->stat.halt_successful_poll;
+			++vc->runner->stat.generic.halt_successful_poll;
 		goto out;
 	}
 
@@ -3981,13 +3981,13 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
 			ktime_to_ns(cur) - ktime_to_ns(start_wait);
 		/* Attribute failed poll time */
 		if (vc->halt_poll_ns)
-			vc->runner->stat.halt_poll_fail_ns +=
+			vc->runner->stat.generic.halt_poll_fail_ns +=
 				ktime_to_ns(start_wait) -
 				ktime_to_ns(start_poll);
 	} else {
 		/* Attribute successful poll time */
 		if (vc->halt_poll_ns)
-			vc->runner->stat.halt_poll_success_ns +=
+			vc->runner->stat.generic.halt_poll_success_ns +=
 				ktime_to_ns(cur) -
 				ktime_to_ns(start_poll);
 	}
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index d7733b07f489..71bcb0140461 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -493,7 +493,7 @@ static void kvmppc_set_msr_pr(struct kvm_vcpu *vcpu, u64 msr)
 		if (!vcpu->arch.pending_exceptions) {
 			kvm_vcpu_block(vcpu);
 			kvm_clear_request(KVM_REQ_UNHALT, vcpu);
-			vcpu->stat.halt_wakeup++;
+			vcpu->stat.generic.halt_wakeup++;
 
 			/* Unset POW bit after we woke up */
 			msr &= ~MSR_POW;
diff --git a/arch/powerpc/kvm/book3s_pr_papr.c b/arch/powerpc/kvm/book3s_pr_papr.c
index 031c8015864a..ac14239f3424 100644
--- a/arch/powerpc/kvm/book3s_pr_papr.c
+++ b/arch/powerpc/kvm/book3s_pr_papr.c
@@ -378,7 +378,7 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
 		kvmppc_set_msr_fast(vcpu, kvmppc_get_msr(vcpu) | MSR_EE);
 		kvm_vcpu_block(vcpu);
 		kvm_clear_request(KVM_REQ_UNHALT, vcpu);
-		vcpu->stat.halt_wakeup++;
+		vcpu->stat.generic.halt_wakeup++;
 		return EMULATE_DONE;
 	case H_LOGICAL_CI_LOAD:
 		return kvmppc_h_pr_logical_ci_load(vcpu);
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 7d5fe43f85c4..80d3b39aa7ac 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -49,15 +49,15 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VCPU_STAT("inst_emu", emulated_inst_exits),
 	VCPU_STAT("dec", dec_exits),
 	VCPU_STAT("ext_intr", ext_intr_exits),
-	VCPU_STAT("halt_successful_poll", halt_successful_poll),
-	VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
-	VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
-	VCPU_STAT("halt_wakeup", halt_wakeup),
+	VCPU_STAT_GENERIC("halt_successful_poll", halt_successful_poll),
+	VCPU_STAT_GENERIC("halt_attempted_poll", halt_attempted_poll),
+	VCPU_STAT_GENERIC("halt_poll_invalid", halt_poll_invalid),
+	VCPU_STAT_GENERIC("halt_wakeup", halt_wakeup),
 	VCPU_STAT("doorbell", dbell_exits),
 	VCPU_STAT("guest doorbell", gdbell_exits),
-	VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
-	VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
-	VM_STAT("remote_tlb_flush", remote_tlb_flush),
+	VCPU_STAT_GENERIC("halt_poll_success_ns", halt_poll_success_ns),
+	VCPU_STAT_GENERIC("halt_poll_fail_ns", halt_poll_fail_ns),
+	VM_STAT_GENERIC("remote_tlb_flush", remote_tlb_flush),
 	{ NULL }
 };
 
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 8925f3969478..9b4473f76e56 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -361,6 +361,7 @@ struct sie_page {
 };
 
 struct kvm_vcpu_stat {
+	struct kvm_vcpu_stat_generic generic;
 	u64 exit_userspace;
 	u64 exit_null;
 	u64 exit_external_request;
@@ -370,13 +371,7 @@ struct kvm_vcpu_stat {
 	u64 exit_validity;
 	u64 exit_instruction;
 	u64 exit_pei;
-	u64 halt_successful_poll;
-	u64 halt_attempted_poll;
-	u64 halt_poll_invalid;
 	u64 halt_no_poll_steal;
-	u64 halt_wakeup;
-	u64 halt_poll_success_ns;
-	u64 halt_poll_fail_ns;
 	u64 instruction_lctl;
 	u64 instruction_lctlg;
 	u64 instruction_stctl;
@@ -755,12 +750,12 @@ struct kvm_vcpu_arch {
 };
 
 struct kvm_vm_stat {
+	struct kvm_vm_stat_generic generic;
 	u64 inject_io;
 	u64 inject_float_mchk;
 	u64 inject_pfault_done;
 	u64 inject_service_signal;
 	u64 inject_virtio;
-	u64 remote_tlb_flush;
 };
 
 struct kvm_arch_memory_slot {
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 1296fc10f80c..e8bc7cd06794 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -72,13 +72,13 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VCPU_STAT("exit_program_interruption", exit_program_interruption),
 	VCPU_STAT("exit_instr_and_program_int", exit_instr_and_program),
 	VCPU_STAT("exit_operation_exception", exit_operation_exception),
-	VCPU_STAT("halt_successful_poll", halt_successful_poll),
-	VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
-	VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
+	VCPU_STAT_GENERIC("halt_successful_poll", halt_successful_poll),
+	VCPU_STAT_GENERIC("halt_attempted_poll", halt_attempted_poll),
+	VCPU_STAT_GENERIC("halt_poll_invalid", halt_poll_invalid),
 	VCPU_STAT("halt_no_poll_steal", halt_no_poll_steal),
-	VCPU_STAT("halt_wakeup", halt_wakeup),
-	VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
-	VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
+	VCPU_STAT_GENERIC("halt_wakeup", halt_wakeup),
+	VCPU_STAT_GENERIC("halt_poll_success_ns", halt_poll_success_ns),
+	VCPU_STAT_GENERIC("halt_poll_fail_ns", halt_poll_fail_ns),
 	VCPU_STAT("instruction_lctlg", instruction_lctlg),
 	VCPU_STAT("instruction_lctl", instruction_lctl),
 	VCPU_STAT("instruction_stctl", instruction_stctl),
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 55efbacfc244..db8eb3dead7e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1127,6 +1127,7 @@ struct kvm_arch {
 };
 
 struct kvm_vm_stat {
+	struct kvm_vm_stat_generic generic;
 	ulong mmu_shadow_zapped;
 	ulong mmu_pte_write;
 	ulong mmu_pde_zapped;
@@ -1134,13 +1135,13 @@ struct kvm_vm_stat {
 	ulong mmu_recycled;
 	ulong mmu_cache_miss;
 	ulong mmu_unsync;
-	ulong remote_tlb_flush;
 	ulong lpages;
 	ulong nx_lpage_splits;
 	ulong max_mmu_page_hash_collisions;
 };
 
 struct kvm_vcpu_stat {
+	struct kvm_vcpu_stat_generic generic;
 	u64 pf_fixed;
 	u64 pf_guest;
 	u64 tlb_flush;
@@ -1154,10 +1155,6 @@ struct kvm_vcpu_stat {
 	u64 nmi_window_exits;
 	u64 l1d_flush;
 	u64 halt_exits;
-	u64 halt_successful_poll;
-	u64 halt_attempted_poll;
-	u64 halt_poll_invalid;
-	u64 halt_wakeup;
 	u64 request_irq_exits;
 	u64 irq_exits;
 	u64 host_state_reload;
@@ -1168,8 +1165,6 @@ struct kvm_vcpu_stat {
 	u64 irq_injections;
 	u64 nmi_injections;
 	u64 req_event;
-	u64 halt_poll_success_ns;
-	u64 halt_poll_fail_ns;
 	u64 nested_run;
 	u64 directed_yield_attempted;
 	u64 directed_yield_successful;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9b6bca616929..96d10253218a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -226,10 +226,10 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VCPU_STAT("irq_window", irq_window_exits),
 	VCPU_STAT("nmi_window", nmi_window_exits),
 	VCPU_STAT("halt_exits", halt_exits),
-	VCPU_STAT("halt_successful_poll", halt_successful_poll),
-	VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
-	VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
-	VCPU_STAT("halt_wakeup", halt_wakeup),
+	VCPU_STAT_GENERIC("halt_successful_poll", halt_successful_poll),
+	VCPU_STAT_GENERIC("halt_attempted_poll", halt_attempted_poll),
+	VCPU_STAT_GENERIC("halt_poll_invalid", halt_poll_invalid),
+	VCPU_STAT_GENERIC("halt_wakeup", halt_wakeup),
 	VCPU_STAT("hypercalls", hypercalls),
 	VCPU_STAT("request_irq", request_irq_exits),
 	VCPU_STAT("irq_exits", irq_exits),
@@ -241,8 +241,8 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VCPU_STAT("nmi_injections", nmi_injections),
 	VCPU_STAT("req_event", req_event),
 	VCPU_STAT("l1d_flush", l1d_flush),
-	VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
-	VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
+	VCPU_STAT_GENERIC("halt_poll_success_ns", halt_poll_success_ns),
+	VCPU_STAT_GENERIC("halt_poll_fail_ns", halt_poll_fail_ns),
 	VCPU_STAT("nested_run", nested_run),
 	VCPU_STAT("directed_yield_attempted", directed_yield_attempted),
 	VCPU_STAT("directed_yield_successful", directed_yield_successful),
@@ -253,7 +253,7 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VM_STAT("mmu_recycled", mmu_recycled),
 	VM_STAT("mmu_cache_miss", mmu_cache_miss),
 	VM_STAT("mmu_unsync", mmu_unsync),
-	VM_STAT("remote_tlb_flush", remote_tlb_flush),
+	VM_STAT_GENERIC("remote_tlb_flush", remote_tlb_flush),
 	VM_STAT("largepages", lpages, .mode = 0444),
 	VM_STAT("nx_largepages_splitted", nx_lpage_splits, .mode = 0444),
 	VM_STAT("max_mmu_page_hash_collisions", max_mmu_page_hash_collisions),
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 2f34487e21f2..1870fa928762 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1243,10 +1243,15 @@ struct kvm_stats_debugfs_item {
 #define KVM_DBGFS_GET_MODE(dbgfs_item)                                         \
 	((dbgfs_item)->mode ? (dbgfs_item)->mode : 0644)
 
-#define VM_STAT(n, x, ...) 							\
+#define VM_STAT(n, x, ...)						       \
 	{ n, offsetof(struct kvm, stat.x), KVM_STAT_VM, ## __VA_ARGS__ }
-#define VCPU_STAT(n, x, ...)							\
+#define VCPU_STAT(n, x, ...)						       \
 	{ n, offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU, ## __VA_ARGS__ }
+#define VM_STAT_GENERIC(n, x, ...)					       \
+	{ n, offsetof(struct kvm, stat.generic.x), KVM_STAT_VM, ## __VA_ARGS__ }
+#define VCPU_STAT_GENERIC(n, x, ...)					       \
+	{ n, offsetof(struct kvm_vcpu, stat.generic.x),			       \
+	  KVM_STAT_VCPU, ## __VA_ARGS__ }
 
 extern struct kvm_stats_debugfs_item debugfs_entries[];
 extern struct dentry *kvm_debugfs_dir;
diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index a7580f69dda0..7c39489f9953 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -76,5 +76,17 @@ struct kvm_mmu_memory_cache {
 };
 #endif
 
+struct kvm_vm_stat_generic {
+	ulong remote_tlb_flush;
+};
+
+struct kvm_vcpu_stat_generic {
+	u64 halt_successful_poll;
+	u64 halt_attempted_poll;
+	u64 halt_poll_invalid;
+	u64 halt_wakeup;
+	u64 halt_poll_success_ns;
+	u64 halt_poll_fail_ns;
+};
 
 #endif /* __KVM_TYPES_H__ */
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6b4feb92dc79..f6ad5b080994 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -330,7 +330,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
 	 */
 	if (!kvm_arch_flush_remote_tlb(kvm)
 	    || kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
-		++kvm->stat.remote_tlb_flush;
+		++kvm->stat.generic.remote_tlb_flush;
 	cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
 }
 EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);
@@ -2940,9 +2940,9 @@ static inline void
 update_halt_poll_stats(struct kvm_vcpu *vcpu, u64 poll_ns, bool waited)
 {
 	if (waited)
-		vcpu->stat.halt_poll_fail_ns += poll_ns;
+		vcpu->stat.generic.halt_poll_fail_ns += poll_ns;
 	else
-		vcpu->stat.halt_poll_success_ns += poll_ns;
+		vcpu->stat.generic.halt_poll_success_ns += poll_ns;
 }
 
 /*
@@ -2960,16 +2960,16 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 	if (vcpu->halt_poll_ns && !kvm_arch_no_poll(vcpu)) {
 		ktime_t stop = ktime_add_ns(ktime_get(), vcpu->halt_poll_ns);
 
-		++vcpu->stat.halt_attempted_poll;
+		++vcpu->stat.generic.halt_attempted_poll;
 		do {
 			/*
 			 * This sets KVM_REQ_UNHALT if an interrupt
 			 * arrives.
 			 */
 			if (kvm_vcpu_check_block(vcpu) < 0) {
-				++vcpu->stat.halt_successful_poll;
+				++vcpu->stat.generic.halt_successful_poll;
 				if (!vcpu_valid_wakeup(vcpu))
-					++vcpu->stat.halt_poll_invalid;
+					++vcpu->stat.generic.halt_poll_invalid;
 				goto out;
 			}
 			poll_end = cur = ktime_get();
@@ -3027,7 +3027,7 @@ bool kvm_vcpu_wake_up(struct kvm_vcpu *vcpu)
 	waitp = kvm_arch_vcpu_get_wait(vcpu);
 	if (rcuwait_wake_up(waitp)) {
 		WRITE_ONCE(vcpu->ready, true);
-		++vcpu->stat.halt_wakeup;
+		++vcpu->stat.generic.halt_wakeup;
 		return true;
 	}
 
-- 
2.32.0.rc1.229.g3e70b5a671-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v7 2/4] KVM: stats: Add fd-based API to read binary stats data
  2021-06-03 21:14 [PATCH v7 0/4] KVM statistics data fd-based binary interface Jing Zhang
  2021-06-03 21:14 ` [PATCH v7 1/4] KVM: stats: Separate generic stats from architecture specific ones Jing Zhang
@ 2021-06-03 21:14 ` Jing Zhang
  2021-06-07 19:22   ` Krish Sadhukhan
                     ` (3 more replies)
  2021-06-03 21:14 ` [PATCH v7 3/4] KVM: stats: Add documentation for statistics data binary interface Jing Zhang
                   ` (3 subsequent siblings)
  5 siblings, 4 replies; 32+ messages in thread
From: Jing Zhang @ 2021-06-03 21:14 UTC (permalink / raw)
  To: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Claudio Imbrenda, Sean Christopherson, Vitaly Kuznetsov,
	Jim Mattson, Peter Shier, Oliver Upton, David Rientjes,
	Emanuele Giuseppe Esposito, David Matlack, Ricardo Koller,
	Krish Sadhukhan
  Cc: Jing Zhang

Provides a file descriptor per VM to read VM stats info/data.
Provides a file descriptor per vCPU to read vCPU stats info/data.

Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Jing Zhang <jingzhangos@google.com>
---
 arch/arm64/kvm/guest.c    |  26 +++++++
 arch/mips/kvm/mips.c      |  52 +++++++++++++
 arch/powerpc/kvm/book3s.c |  52 +++++++++++++
 arch/powerpc/kvm/booke.c  |  45 +++++++++++
 arch/s390/kvm/kvm-s390.c  | 117 ++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c        |  53 +++++++++++++
 include/linux/kvm_host.h  | 132 ++++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h  |  50 ++++++++++++
 virt/kvm/kvm_main.c       | 155 ++++++++++++++++++++++++++++++++++++++
 9 files changed, 682 insertions(+)

diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 4962331d01e6..c68addd38cf8 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -28,6 +28,32 @@
 
 #include "trace.h"
 
+struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC();
+
+struct _kvm_stats_header kvm_vm_stats_header = {
+	.name_size = KVM_STATS_NAME_LEN,
+	.count = ARRAY_SIZE(kvm_vm_stats_desc),
+	.desc_offset = sizeof(struct kvm_stats_header),
+	.data_offset = sizeof(struct kvm_stats_header) +
+		sizeof(kvm_vm_stats_desc),
+};
+
+struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
+	STATS_DESC_COUNTER("hvc_exit_stat"),
+	STATS_DESC_COUNTER("wfe_exit_stat"),
+	STATS_DESC_COUNTER("wfi_exit_stat"),
+	STATS_DESC_COUNTER("mmio_exit_user"),
+	STATS_DESC_COUNTER("mmio_exit_kernel"),
+	STATS_DESC_COUNTER("exits"));
+
+struct _kvm_stats_header kvm_vcpu_stats_header = {
+	.name_size = KVM_STATS_NAME_LEN,
+	.count = ARRAY_SIZE(kvm_vcpu_stats_desc),
+	.desc_offset = sizeof(struct kvm_stats_header),
+	.data_offset = sizeof(struct kvm_stats_header) +
+		sizeof(kvm_vcpu_stats_desc),
+};
+
 struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VCPU_STAT_GENERIC("halt_successful_poll", halt_successful_poll),
 	VCPU_STAT_GENERIC("halt_attempted_poll", halt_attempted_poll),
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index ff205b35719b..eb8fd4a96952 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -38,6 +38,58 @@
 #define VECTORSPACING 0x100	/* for EI/VI mode */
 #endif
 
+struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC();
+
+struct _kvm_stats_header kvm_vm_stats_header = {
+	.name_size = KVM_STATS_NAME_LEN,
+	.count = ARRAY_SIZE(kvm_vm_stats_desc),
+	.desc_offset = sizeof(struct kvm_stats_header),
+	.data_offset = sizeof(struct kvm_stats_header) +
+		sizeof(kvm_vm_stats_desc),
+};
+
+struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
+	STATS_DESC_COUNTER("wait_exits"),
+	STATS_DESC_COUNTER("cache_exits"),
+	STATS_DESC_COUNTER("signal_exits"),
+	STATS_DESC_COUNTER("int_exits"),
+	STATS_DESC_COUNTER("cop_unusable_exits"),
+	STATS_DESC_COUNTER("tlbmod_exits"),
+	STATS_DESC_COUNTER("tlbmiss_ld_exits"),
+	STATS_DESC_COUNTER("tlbmiss_st_exits"),
+	STATS_DESC_COUNTER("addrerr_st_exits"),
+	STATS_DESC_COUNTER("addrerr_ld_exits"),
+	STATS_DESC_COUNTER("syscall_exits"),
+	STATS_DESC_COUNTER("resvd_inst_exits"),
+	STATS_DESC_COUNTER("break_inst_exits"),
+	STATS_DESC_COUNTER("trap_inst_exits"),
+	STATS_DESC_COUNTER("msa_fpe_exits"),
+	STATS_DESC_COUNTER("fpe_exits"),
+	STATS_DESC_COUNTER("msa_disabled_exits"),
+	STATS_DESC_COUNTER("flush_dcache_exits"),
+#ifdef CONFIG_KVM_MIPS_VZ
+	STATS_DESC_COUNTER("vz_gpsi_exits"),
+	STATS_DESC_COUNTER("vz_gsfc_exits"),
+	STATS_DESC_COUNTER("vz_hc_exits"),
+	STATS_DESC_COUNTER("vz_grr_exits"),
+	STATS_DESC_COUNTER("vz_gva_exits"),
+	STATS_DESC_COUNTER("vz_ghfc_exits"),
+	STATS_DESC_COUNTER("vz_gpa_exits"),
+	STATS_DESC_COUNTER("vz_resvd_exits"),
+#ifdef CONFIG_CPU_LOONGSON64
+	STATS_DESC_COUNTER("vz_cpucfg_exits"),
+#endif
+#endif
+	);
+
+struct _kvm_stats_header kvm_vcpu_stats_header = {
+	.name_size = KVM_STATS_NAME_LEN,
+	.count = ARRAY_SIZE(kvm_vcpu_stats_desc),
+	.desc_offset = sizeof(struct kvm_stats_header),
+	.data_offset = sizeof(struct kvm_stats_header) +
+		sizeof(kvm_vcpu_stats_desc),
+};
+
 struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VCPU_STAT("wait", wait_exits),
 	VCPU_STAT("cache", cache_exits),
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 92cdb4175945..53fb1da049a7 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -38,6 +38,58 @@
 
 /* #define EXIT_DEBUG */
 
+struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC(
+	STATS_DESC_ICOUNTER("num_2M_pages"),
+	STATS_DESC_ICOUNTER("num_1G_pages"));
+
+struct _kvm_stats_header kvm_vm_stats_header = {
+	.name_size = KVM_STATS_NAME_LEN,
+	.count = ARRAY_SIZE(kvm_vm_stats_desc),
+	.desc_offset = sizeof(struct kvm_stats_header),
+	.data_offset = sizeof(struct kvm_stats_header) +
+		sizeof(kvm_vm_stats_desc),
+};
+
+struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
+	STATS_DESC_COUNTER("sum_exits"),
+	STATS_DESC_COUNTER("mmio_exits"),
+	STATS_DESC_COUNTER("signal_exits"),
+	STATS_DESC_COUNTER("light_exits"),
+	STATS_DESC_COUNTER("itlb_real_miss_exits"),
+	STATS_DESC_COUNTER("itlb_virt_miss_exits"),
+	STATS_DESC_COUNTER("dtlb_real_miss_exits"),
+	STATS_DESC_COUNTER("dtlb_virt_miss_exits"),
+	STATS_DESC_COUNTER("syscall_exits"),
+	STATS_DESC_COUNTER("isi_exits"),
+	STATS_DESC_COUNTER("dsi_exits"),
+	STATS_DESC_COUNTER("emulated_inst_exits"),
+	STATS_DESC_COUNTER("dec_exits"),
+	STATS_DESC_COUNTER("ext_intr_exits"),
+	STATS_DESC_TIME_NSEC("halt_wait_ns"),
+	STATS_DESC_COUNTER("halt_successful_wait"),
+	STATS_DESC_COUNTER("dbell_exits"),
+	STATS_DESC_COUNTER("gdbell_exits"),
+	STATS_DESC_COUNTER("ld"),
+	STATS_DESC_COUNTER("st"),
+	STATS_DESC_COUNTER("pf_storage"),
+	STATS_DESC_COUNTER("pf_instruc"),
+	STATS_DESC_COUNTER("sp_storage"),
+	STATS_DESC_COUNTER("sp_instruc"),
+	STATS_DESC_COUNTER("queue_intr"),
+	STATS_DESC_COUNTER("ld_slow"),
+	STATS_DESC_COUNTER("st_slow"),
+	STATS_DESC_COUNTER("pthru_all"),
+	STATS_DESC_COUNTER("pthru_host"),
+	STATS_DESC_COUNTER("pthru_bad_aff"));
+
+struct _kvm_stats_header kvm_vcpu_stats_header = {
+	.name_size = KVM_STATS_NAME_LEN,
+	.count = ARRAY_SIZE(kvm_vcpu_stats_desc),
+	.desc_offset = sizeof(struct kvm_stats_header),
+	.data_offset = sizeof(struct kvm_stats_header) +
+		sizeof(kvm_vcpu_stats_desc),
+};
+
 struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VCPU_STAT("exits", sum_exits),
 	VCPU_STAT("mmio", mmio_exits),
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 80d3b39aa7ac..02f1e147d224 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -36,6 +36,51 @@
 
 unsigned long kvmppc_booke_handlers;
 
+struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC(
+	STATS_DESC_ICOUNTER("num_2M_pages"),
+	STATS_DESC_ICOUNTER("num_1G_pages"));
+
+struct _kvm_stats_header kvm_vm_stats_header = {
+	.name_size = KVM_STATS_NAME_LEN,
+	.count = ARRAY_SIZE(kvm_vm_stats_desc),
+	.desc_offset = sizeof(struct kvm_stats_header),
+	.data_offset = sizeof(struct kvm_stats_header) +
+		sizeof(kvm_vm_stats_desc),
+};
+
+struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
+	STATS_DESC_COUNTER("sum_exits"),
+	STATS_DESC_COUNTER("mmio_exits"),
+	STATS_DESC_COUNTER("signal_exits"),
+	STATS_DESC_COUNTER("light_exits"),
+	STATS_DESC_COUNTER("itlb_real_miss_exits"),
+	STATS_DESC_COUNTER("itlb_virt_miss_exits"),
+	STATS_DESC_COUNTER("dtlb_real_miss_exits"),
+	STATS_DESC_COUNTER("dtlb_virt_miss_exits"),
+	STATS_DESC_COUNTER("syscall_exits"),
+	STATS_DESC_COUNTER("isi_exits"),
+	STATS_DESC_COUNTER("dsi_exits"),
+	STATS_DESC_COUNTER("emulated_inst_exits"),
+	STATS_DESC_COUNTER("dec_exits"),
+	STATS_DESC_COUNTER("ext_intr_exits"),
+	STATS_DESC_TIME_NSEC("halt_wait_ns"),
+	STATS_DESC_COUNTER("halt_successful_wait"),
+	STATS_DESC_COUNTER("dbell_exits"),
+	STATS_DESC_COUNTER("gdbell_exits"),
+	STATS_DESC_COUNTER("ld"),
+	STATS_DESC_COUNTER("st"),
+	STATS_DESC_COUNTER("pthru_all"),
+	STATS_DESC_COUNTER("pthru_host"),
+	STATS_DESC_COUNTER("pthru_bad_aff"));
+
+struct _kvm_stats_header kvm_vcpu_stats_header = {
+	.name_size = KVM_STATS_NAME_LEN,
+	.count = ARRAY_SIZE(kvm_vcpu_stats_desc),
+	.desc_offset = sizeof(struct kvm_stats_header),
+	.data_offset = sizeof(struct kvm_stats_header) +
+		sizeof(kvm_vcpu_stats_desc),
+};
+
 struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VCPU_STAT("mmio", mmio_exits),
 	VCPU_STAT("sig", signal_exits),
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index e8bc7cd06794..4e2141bee3bc 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -58,6 +58,123 @@
 #define VCPU_IRQS_MAX_BUF (sizeof(struct kvm_s390_irq) * \
 			   (KVM_MAX_VCPUS + LOCAL_IRQS))
 
+struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC(
+	STATS_DESC_COUNTER("inject_io"),
+	STATS_DESC_COUNTER("inject_float_mchk"),
+	STATS_DESC_COUNTER("inject_pfault_done"),
+	STATS_DESC_COUNTER("inject_service_signal"),
+	STATS_DESC_COUNTER("inject_virtio"));
+
+struct _kvm_stats_header kvm_vm_stats_header = {
+	.name_size = KVM_STATS_NAME_LEN,
+	.count = ARRAY_SIZE(kvm_vm_stats_desc),
+	.desc_offset = sizeof(struct kvm_stats_header),
+	.data_offset = sizeof(struct kvm_stats_header) +
+		sizeof(kvm_vm_stats_desc),
+};
+
+struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
+	STATS_DESC_COUNTER("exit_userspace"),
+	STATS_DESC_COUNTER("exit_null"),
+	STATS_DESC_COUNTER("exit_external_request"),
+	STATS_DESC_COUNTER("exit_io_request"),
+	STATS_DESC_COUNTER("exit_external_interrupt"),
+	STATS_DESC_COUNTER("exit_stop_request"),
+	STATS_DESC_COUNTER("exit_validity"),
+	STATS_DESC_COUNTER("exit_instruction"),
+	STATS_DESC_COUNTER("exit_pei"),
+	STATS_DESC_COUNTER("halt_no_poll_steal"),
+	STATS_DESC_COUNTER("instruction_lctl"),
+	STATS_DESC_COUNTER("instruction_lctlg"),
+	STATS_DESC_COUNTER("instruction_stctl"),
+	STATS_DESC_COUNTER("instruction_stctg"),
+	STATS_DESC_COUNTER("exit_program_interruption"),
+	STATS_DESC_COUNTER("exit_instr_and_program"),
+	STATS_DESC_COUNTER("exit_operation_exception"),
+	STATS_DESC_COUNTER("deliver_ckc"),
+	STATS_DESC_COUNTER("deliver_cputm"),
+	STATS_DESC_COUNTER("deliver_external_call"),
+	STATS_DESC_COUNTER("deliver_emergency_signal"),
+	STATS_DESC_COUNTER("deliver_service_signal"),
+	STATS_DESC_COUNTER("deliver_virtio"),
+	STATS_DESC_COUNTER("deliver_stop_signal"),
+	STATS_DESC_COUNTER("deliver_prefix_signal"),
+	STATS_DESC_COUNTER("deliver_restart_signal"),
+	STATS_DESC_COUNTER("deliver_program"),
+	STATS_DESC_COUNTER("deliver_io"),
+	STATS_DESC_COUNTER("deliver_machine_check"),
+	STATS_DESC_COUNTER("exit_wait_state"),
+	STATS_DESC_COUNTER("inject_ckc"),
+	STATS_DESC_COUNTER("inject_cputm"),
+	STATS_DESC_COUNTER("inject_external_call"),
+	STATS_DESC_COUNTER("inject_emergency_signal"),
+	STATS_DESC_COUNTER("inject_mchk"),
+	STATS_DESC_COUNTER("inject_pfault_init"),
+	STATS_DESC_COUNTER("inject_program"),
+	STATS_DESC_COUNTER("inject_restart"),
+	STATS_DESC_COUNTER("inject_set_prefix"),
+	STATS_DESC_COUNTER("inject_stop_signal"),
+	STATS_DESC_COUNTER("instruction_epsw"),
+	STATS_DESC_COUNTER("instruction_gs"),
+	STATS_DESC_COUNTER("instruction_io_other"),
+	STATS_DESC_COUNTER("instruction_lpsw"),
+	STATS_DESC_COUNTER("instruction_lpswe"),
+	STATS_DESC_COUNTER("instruction_pfmf"),
+	STATS_DESC_COUNTER("instruction_ptff"),
+	STATS_DESC_COUNTER("instruction_sck"),
+	STATS_DESC_COUNTER("instruction_sckpf"),
+	STATS_DESC_COUNTER("instruction_stidp"),
+	STATS_DESC_COUNTER("instruction_spx"),
+	STATS_DESC_COUNTER("instruction_stpx"),
+	STATS_DESC_COUNTER("instruction_stap"),
+	STATS_DESC_COUNTER("instruction_iske"),
+	STATS_DESC_COUNTER("instruction_ri"),
+	STATS_DESC_COUNTER("instruction_rrbe"),
+	STATS_DESC_COUNTER("instruction_sske"),
+	STATS_DESC_COUNTER("instruction_ipte_interlock"),
+	STATS_DESC_COUNTER("instruction_stsi"),
+	STATS_DESC_COUNTER("instruction_stfl"),
+	STATS_DESC_COUNTER("instruction_tb"),
+	STATS_DESC_COUNTER("instruction_tpi"),
+	STATS_DESC_COUNTER("instruction_tprot"),
+	STATS_DESC_COUNTER("instruction_tsch"),
+	STATS_DESC_COUNTER("instruction_sie"),
+	STATS_DESC_COUNTER("instruction_essa"),
+	STATS_DESC_COUNTER("instruction_sthyi"),
+	STATS_DESC_COUNTER("instruction_sigp_sense"),
+	STATS_DESC_COUNTER("instruction_sigp_sense_running"),
+	STATS_DESC_COUNTER("instruction_sigp_external_call"),
+	STATS_DESC_COUNTER("instruction_sigp_emergency"),
+	STATS_DESC_COUNTER("instruction_sigp_cond_emergency"),
+	STATS_DESC_COUNTER("instruction_sigp_start"),
+	STATS_DESC_COUNTER("instruction_sigp_stop"),
+	STATS_DESC_COUNTER("instruction_sigp_stop_store_status"),
+	STATS_DESC_COUNTER("instruction_sigp_store_status"),
+	STATS_DESC_COUNTER("instruction_sigp_store_adtl_status"),
+	STATS_DESC_COUNTER("instruction_sigp_arch"),
+	STATS_DESC_COUNTER("instruction_sigp_prefix"),
+	STATS_DESC_COUNTER("instruction_sigp_restart"),
+	STATS_DESC_COUNTER("instruction_sigp_init_cpu_reset"),
+	STATS_DESC_COUNTER("instruction_sigp_cpu_reset"),
+	STATS_DESC_COUNTER("instruction_sigp_unknown"),
+	STATS_DESC_COUNTER("diagnose_10"),
+	STATS_DESC_COUNTER("diagnose_44"),
+	STATS_DESC_COUNTER("diagnose_9c"),
+	STATS_DESC_COUNTER("diagnose_9c_ignored"),
+	STATS_DESC_COUNTER("diagnose_258"),
+	STATS_DESC_COUNTER("diagnose_308"),
+	STATS_DESC_COUNTER("diagnose_500"),
+	STATS_DESC_COUNTER("diagnose_other"),
+	STATS_DESC_COUNTER("pfault_sync"));
+
+struct _kvm_stats_header kvm_vcpu_stats_header = {
+	.name_size = KVM_STATS_NAME_LEN,
+	.count = ARRAY_SIZE(kvm_vcpu_stats_desc),
+	.desc_offset = sizeof(struct kvm_stats_header),
+	.data_offset = sizeof(struct kvm_stats_header) +
+		sizeof(kvm_vcpu_stats_desc),
+};
+
 struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VCPU_STAT("userspace_handled", exit_userspace),
 	VCPU_STAT("exit_null", exit_null),
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 96d10253218a..8e900c482626 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -214,6 +214,59 @@ EXPORT_SYMBOL_GPL(host_xss);
 u64 __read_mostly supported_xss;
 EXPORT_SYMBOL_GPL(supported_xss);
 
+struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC(
+	STATS_DESC_COUNTER("mmu_shadow_zapped"),
+	STATS_DESC_COUNTER("mmu_pte_write"),
+	STATS_DESC_COUNTER("mmu_pde_zapped"),
+	STATS_DESC_COUNTER("mmu_flooded"),
+	STATS_DESC_COUNTER("mmu_recycled"),
+	STATS_DESC_COUNTER("mmu_cache_miss"),
+	STATS_DESC_ICOUNTER("mmu_unsync"),
+	STATS_DESC_ICOUNTER("largepages"),
+	STATS_DESC_ICOUNTER("nx_largepages_splits"),
+	STATS_DESC_ICOUNTER("max_mmu_page_hash_collisions"));
+
+struct _kvm_stats_header kvm_vm_stats_header = {
+	.name_size = KVM_STATS_NAME_LEN,
+	.count = ARRAY_SIZE(kvm_vm_stats_desc),
+	.desc_offset = sizeof(struct kvm_stats_header),
+	.data_offset = sizeof(struct kvm_stats_header) +
+		sizeof(kvm_vm_stats_desc),
+};
+
+struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
+	STATS_DESC_COUNTER("pf_fixed"),
+	STATS_DESC_COUNTER("pf_guest"),
+	STATS_DESC_COUNTER("tlb_flush"),
+	STATS_DESC_COUNTER("invlpg"),
+	STATS_DESC_COUNTER("exits"),
+	STATS_DESC_COUNTER("io_exits"),
+	STATS_DESC_COUNTER("mmio_exits"),
+	STATS_DESC_COUNTER("signal_exits"),
+	STATS_DESC_COUNTER("irq_window_exits"),
+	STATS_DESC_COUNTER("nmi_window_exits"),
+	STATS_DESC_COUNTER("l1d_flush"),
+	STATS_DESC_COUNTER("halt_exits"),
+	STATS_DESC_COUNTER("request_irq_exits"),
+	STATS_DESC_COUNTER("irq_exits"),
+	STATS_DESC_COUNTER("host_state_reload"),
+	STATS_DESC_COUNTER("fpu_reload"),
+	STATS_DESC_COUNTER("insn_emulation"),
+	STATS_DESC_COUNTER("insn_emulation_fail"),
+	STATS_DESC_COUNTER("hypercalls"),
+	STATS_DESC_COUNTER("irq_injections"),
+	STATS_DESC_COUNTER("nmi_injections"),
+	STATS_DESC_COUNTER("req_event"),
+	STATS_DESC_COUNTER("nested_run"));
+
+struct _kvm_stats_header kvm_vcpu_stats_header = {
+	.name_size = KVM_STATS_NAME_LEN,
+	.count = ARRAY_SIZE(kvm_vcpu_stats_desc),
+	.desc_offset = sizeof(struct kvm_stats_header),
+	.data_offset = sizeof(struct kvm_stats_header) +
+		sizeof(kvm_vcpu_stats_desc),
+};
+
 struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VCPU_STAT("pf_fixed", pf_fixed),
 	VCPU_STAT("pf_guest", pf_guest),
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 1870fa928762..873e21af36be 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1240,6 +1240,19 @@ struct kvm_stats_debugfs_item {
 	int mode;
 };
 
+struct _kvm_stats_header {
+	__u32 name_size;
+	__u32 count;
+	__u32 desc_offset;
+	__u32 data_offset;
+};
+
+#define KVM_STATS_NAME_LEN	48
+struct _kvm_stats_desc {
+	struct kvm_stats_desc desc;
+	char name[KVM_STATS_NAME_LEN];
+};
+
 #define KVM_DBGFS_GET_MODE(dbgfs_item)                                         \
 	((dbgfs_item)->mode ? (dbgfs_item)->mode : 0644)
 
@@ -1253,8 +1266,127 @@ struct kvm_stats_debugfs_item {
 	{ n, offsetof(struct kvm_vcpu, stat.generic.x),			       \
 	  KVM_STAT_VCPU, ## __VA_ARGS__ }
 
+#define STATS_DESC(stat, type, unit, scale, exp)			       \
+	{								       \
+		{							       \
+			.flags = type | unit | scale,			       \
+			.exponent = exp,				       \
+			.size = 1					       \
+		},							       \
+		.name = stat,						       \
+	}
+#define STATS_DESC_CUMULATIVE(name, unit, scale, exponent)		       \
+	STATS_DESC(name, KVM_STATS_TYPE_CUMULATIVE, unit, scale, exponent)
+#define STATS_DESC_INSTANT(name, unit, scale, exponent)			       \
+	STATS_DESC(name, KVM_STATS_TYPE_INSTANT, unit, scale, exponent)
+
+/* Cumulative counter */
+#define STATS_DESC_COUNTER(name)					       \
+	STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_NONE,		       \
+		KVM_STATS_BASE_POW10, 0)
+/* Instantaneous counter */
+#define STATS_DESC_ICOUNTER(name)					       \
+	STATS_DESC_INSTANT(name, KVM_STATS_UNIT_NONE,			       \
+		KVM_STATS_BASE_POW10, 0)
+
+/* Cumulative clock cycles */
+#define STATS_DESC_CYCLE(name)						       \
+	STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_CYCLES,		       \
+		KVM_STATS_BASE_POW10, 0)
+/* Instantaneous clock cycles */
+#define STATS_DESC_ICYCLE(name)						       \
+	STATS_DESC_INSTANT(name, KVM_STATS_UNIT_CYCLES,			       \
+		KVM_STATS_BASE_POW10, 0)
+
+/* Cumulative memory size in Byte */
+#define STATS_DESC_SIZE_BYTE(name)					       \
+	STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_BYTES,		       \
+		KVM_STATS_BASE_POW2, 0)
+/* Cumulative memory size in KiByte */
+#define STATS_DESC_SIZE_KBYTE(name)					       \
+	STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_BYTES,		       \
+		KVM_STATS_BASE_POW2, 10)
+/* Cumulative memory size in MiByte */
+#define STATS_DESC_SIZE_MBYTE(name)					       \
+	STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_BYTES,		       \
+		KVM_STATS_BASE_POW2, 20)
+/* Cumulative memory size in GiByte */
+#define STATS_DESC_SIZE_GBYTE(name)					       \
+	STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_BYTES,		       \
+		KVM_STATS_BASE_POW2, 30)
+
+/* Instantaneous memory size in Byte */
+#define STATS_DESC_ISIZE_BYTE(name)					       \
+	STATS_DESC_INSTANT(name, KVM_STATS_UNIT_BYTES,			       \
+		KVM_STATS_BASE_POW2, 0)
+/* Instantaneous memory size in KiByte */
+#define STATS_DESC_ISIZE_KBYTE(name)					       \
+	STATS_DESC_INSTANT(name, KVM_STATS_UNIT_BYTES,			       \
+		KVM_STATS_BASE_POW2, 10)
+/* Instantaneous memory size in MiByte */
+#define STATS_DESC_ISIZE_MBYTE(name)					       \
+	STATS_DESC_INSTANT(name, KVM_STATS_UNIT_BYTES,			       \
+		KVM_STATS_BASE_POW2, 20)
+/* Instantaneous memory size in GiByte */
+#define STATS_DESC_ISIZE_GBYTE(name)					       \
+	STATS_DESC_INSTANT(name, KVM_STATS_UNIT_BYTES,			       \
+		KVM_STATS_BASE_POW2, 30)
+
+/* Cumulative time in second */
+#define STATS_DESC_TIME_SEC(name)					       \
+	STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_SECONDS,		       \
+		KVM_STATS_BASE_POW10, 0)
+/* Cumulative time in millisecond */
+#define STATS_DESC_TIME_MSEC(name)					       \
+	STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_SECONDS,		       \
+		KVM_STATS_BASE_POW10, -3)
+/* Cumulative time in microsecond */
+#define STATS_DESC_TIME_USEC(name)					       \
+	STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_SECONDS,		       \
+		KVM_STATS_BASE_POW10, -6)
+/* Cumulative time in nanosecond */
+#define STATS_DESC_TIME_NSEC(name)					       \
+	STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_SECONDS,		       \
+		KVM_STATS_BASE_POW10, -9)
+
+/* Instantaneous time in second */
+#define STATS_DESC_ITIME_SEC(name)					       \
+	STATS_DESC_INSTANT(name, KVM_STATS_UNIT_SECONDS,		       \
+		KVM_STATS_BASE_POW10, 0)
+/* Instantaneous time in millisecond */
+#define STATS_DESC_ITIME_MSEC(name)					       \
+	STATS_DESC_INSTANT(name, KVM_STATS_UNIT_SECONDS,		       \
+		KVM_STATS_BASE_POW10, -3)
+/* Instantaneous time in microsecond */
+#define STATS_DESC_ITIME_USEC(name)					       \
+	STATS_DESC_INSTANT(name, KVM_STATS_UNIT_SECONDS,		       \
+		KVM_STATS_BASE_POW10, -6)
+/* Instantaneous time in nanosecond */
+#define STATS_DESC_ITIME_NSEC(name)					       \
+	STATS_DESC_INSTANT(name, KVM_STATS_UNIT_SECONDS,		       \
+		KVM_STATS_BASE_POW10, -9)
+
+#define DEFINE_VM_STATS_DESC(...) {					       \
+	STATS_DESC_COUNTER("remote_tlb_flush"),				       \
+	## __VA_ARGS__							       \
+}
+
+#define DEFINE_VCPU_STATS_DESC(...) {					       \
+	STATS_DESC_COUNTER("halt_successful_poll"),			       \
+	STATS_DESC_COUNTER("halt_attempted_poll"),			       \
+	STATS_DESC_COUNTER("halt_poll_invalid"),			       \
+	STATS_DESC_COUNTER("halt_wakeup"),				       \
+	STATS_DESC_TIME_NSEC("halt_poll_success_ns"),			       \
+	STATS_DESC_TIME_NSEC("halt_poll_fail_ns"),			       \
+	## __VA_ARGS__							       \
+}
+
 extern struct kvm_stats_debugfs_item debugfs_entries[];
 extern struct dentry *kvm_debugfs_dir;
+extern struct _kvm_stats_header kvm_vm_stats_header;
+extern struct _kvm_stats_header kvm_vcpu_stats_header;
+extern struct _kvm_stats_desc kvm_vm_stats_desc[];
+extern struct _kvm_stats_desc kvm_vcpu_stats_desc[];
 
 #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
 static inline int mmu_notifier_retry(struct kvm *kvm, unsigned long mmu_seq)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 3fd9a7e9d90c..dcfa0315e3f9 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1082,6 +1082,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_SGX_ATTRIBUTE 196
 #define KVM_CAP_VM_COPY_ENC_CONTEXT_FROM 197
 #define KVM_CAP_PTP_KVM 198
+#define KVM_CAP_STATS_BINARY_FD 199
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1898,4 +1899,53 @@ struct kvm_dirty_gfn {
 #define KVM_BUS_LOCK_DETECTION_OFF             (1 << 0)
 #define KVM_BUS_LOCK_DETECTION_EXIT            (1 << 1)
 
+#define KVM_STATS_ID_MAXLEN		64
+
+struct kvm_stats_header {
+	char id[KVM_STATS_ID_MAXLEN];
+	__u32 name_size;
+	__u32 count;
+	__u32 desc_offset;
+	__u32 data_offset;
+};
+
+#define KVM_STATS_TYPE_SHIFT		0
+#define KVM_STATS_TYPE_MASK		(0xF << KVM_STATS_TYPE_SHIFT)
+#define KVM_STATS_TYPE_CUMULATIVE	(0x0 << KVM_STATS_TYPE_SHIFT)
+#define KVM_STATS_TYPE_INSTANT		(0x1 << KVM_STATS_TYPE_SHIFT)
+#define KVM_STATS_TYPE_MAX		KVM_STATS_TYPE_INSTANT
+
+#define KVM_STATS_UNIT_SHIFT		4
+#define KVM_STATS_UNIT_MASK		(0xF << KVM_STATS_UNIT_SHIFT)
+#define KVM_STATS_UNIT_NONE		(0x0 << KVM_STATS_UNIT_SHIFT)
+#define KVM_STATS_UNIT_BYTES		(0x1 << KVM_STATS_UNIT_SHIFT)
+#define KVM_STATS_UNIT_SECONDS		(0x2 << KVM_STATS_UNIT_SHIFT)
+#define KVM_STATS_UNIT_CYCLES		(0x3 << KVM_STATS_UNIT_SHIFT)
+#define KVM_STATS_UNIT_MAX		KVM_STATS_UNIT_CYCLES
+
+#define KVM_STATS_BASE_SHIFT		8
+#define KVM_STATS_BASE_MASK		(0xF << KVM_STATS_BASE_SHIFT)
+#define KVM_STATS_BASE_POW10		(0x0 << KVM_STATS_BASE_SHIFT)
+#define KVM_STATS_BASE_POW2		(0x1 << KVM_STATS_BASE_SHIFT)
+#define KVM_STATS_BASE_MAX		KVM_STATS_BASE_POW2
+
+struct kvm_stats_desc {
+	__u32 flags;
+	__s16 exponent;
+	__u16 size;
+	__u32 unused1;
+	__u32 unused2;
+	char name[0];
+};
+
+struct kvm_vm_stats_data {
+	unsigned long value[0];
+};
+
+struct kvm_vcpu_stats_data {
+	__u64 value[0];
+};
+
+#define KVM_GET_STATS_FD  _IOR(KVMIO,  0xcc, struct kvm_stats_header)
+
 #endif /* __LINUX_KVM_H */
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f6ad5b080994..d84bb17bdea8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3409,6 +3409,115 @@ static int kvm_vcpu_ioctl_set_sigmask(struct kvm_vcpu *vcpu, sigset_t *sigset)
 	return 0;
 }
 
+static ssize_t kvm_stats_read(char *id, struct _kvm_stats_header *header,
+		struct _kvm_stats_desc *desc, void *stats, size_t size_stats,
+		char __user *user_buffer, size_t size, loff_t *offset)
+{
+	ssize_t copylen, len, remain = size;
+	size_t size_header, size_desc;
+	loff_t pos = *offset;
+	char __user *dest = user_buffer;
+	void *src;
+
+	size_header = sizeof(*header);
+	size_desc = header->count * sizeof(*desc);
+
+	len = KVM_STATS_ID_MAXLEN + size_header + size_desc + size_stats - pos;
+	len = min(len, remain);
+	if (len <= 0)
+		return 0;
+	remain = len;
+
+	/* Copy kvm stats header id string */
+	copylen = KVM_STATS_ID_MAXLEN - pos;
+	copylen = min(copylen, remain);
+	if (copylen > 0) {
+		src = id + pos;
+		if (copy_to_user(dest, src, copylen))
+			return -EFAULT;
+		remain -= copylen;
+		pos += copylen;
+		dest += copylen;
+	}
+	/* Copy kvm stats header */
+	copylen = KVM_STATS_ID_MAXLEN + size_header - pos;
+	copylen = min(copylen, remain);
+	if (copylen > 0) {
+		src = header + pos - KVM_STATS_ID_MAXLEN;
+		if (copy_to_user(dest, src, copylen))
+			return -EFAULT;
+		remain -= copylen;
+		pos += copylen;
+		dest += copylen;
+	}
+	/* Copy kvm stats descriptors */
+	copylen = header->desc_offset + size_desc - pos;
+	copylen = min(copylen, remain);
+	if (copylen > 0) {
+		src = desc + pos - header->desc_offset;
+		if (copy_to_user(dest, src, copylen))
+			return -EFAULT;
+		remain -= copylen;
+		pos += copylen;
+		dest += copylen;
+	}
+	/* Copy kvm stats values */
+	copylen = header->data_offset + size_stats - pos;
+	copylen = min(copylen, remain);
+	if (copylen > 0) {
+		src = stats + pos - header->data_offset;
+		if (copy_to_user(dest, src, copylen))
+			return -EFAULT;
+		remain -= copylen;
+		pos += copylen;
+		dest += copylen;
+	}
+
+	*offset = pos;
+	return len;
+}
+
+static ssize_t kvm_vcpu_stats_read(struct file *file, char __user *user_buffer,
+			      size_t size, loff_t *offset)
+{
+	char id[KVM_STATS_ID_MAXLEN];
+	struct kvm_vcpu *vcpu = file->private_data;
+
+	snprintf(id, sizeof(id), "kvm-%d/vcpu-%d",
+			task_pid_nr(current), vcpu->vcpu_id);
+	return kvm_stats_read(id, &kvm_vcpu_stats_header,
+			&kvm_vcpu_stats_desc[0], &vcpu->stat,
+			sizeof(vcpu->stat), user_buffer, size, offset);
+}
+
+static const struct file_operations kvm_vcpu_stats_fops = {
+	.read = kvm_vcpu_stats_read,
+	.llseek = noop_llseek,
+};
+
+static int kvm_vcpu_ioctl_get_stats_fd(struct kvm_vcpu *vcpu)
+{
+	int fd;
+	struct file *file;
+	char name[15 + ITOA_MAX_LEN + 1];
+
+	snprintf(name, sizeof(name), "kvm-vcpu-stats:%d", vcpu->vcpu_id);
+
+	fd = get_unused_fd_flags(O_CLOEXEC);
+	if (fd < 0)
+		return fd;
+
+	file = anon_inode_getfile(name, &kvm_vcpu_stats_fops, vcpu, O_RDONLY);
+	if (IS_ERR(file)) {
+		put_unused_fd(fd);
+		return PTR_ERR(file);
+	}
+	file->f_mode |= FMODE_PREAD;
+	fd_install(fd, file);
+
+	return fd;
+}
+
 static long kvm_vcpu_ioctl(struct file *filp,
 			   unsigned int ioctl, unsigned long arg)
 {
@@ -3606,6 +3715,10 @@ static long kvm_vcpu_ioctl(struct file *filp,
 		r = kvm_arch_vcpu_ioctl_set_fpu(vcpu, fpu);
 		break;
 	}
+	case KVM_GET_STATS_FD: {
+		r = kvm_vcpu_ioctl_get_stats_fd(vcpu);
+		break;
+	}
 	default:
 		r = kvm_arch_vcpu_ioctl(filp, ioctl, arg);
 	}
@@ -3864,6 +3977,8 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 #else
 		return 0;
 #endif
+	case KVM_CAP_STATS_BINARY_FD:
+		return 1;
 	default:
 		break;
 	}
@@ -3967,6 +4082,43 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
 	}
 }
 
+static ssize_t kvm_vm_stats_read(struct file *file, char __user *user_buffer,
+			      size_t size, loff_t *offset)
+{
+	char id[KVM_STATS_ID_MAXLEN];
+	struct kvm *kvm = file->private_data;
+
+	snprintf(id, sizeof(id), "kvm-%d", task_pid_nr(current));
+	return kvm_stats_read(id, &kvm_vm_stats_header, &kvm_vm_stats_desc[0],
+		&kvm->stat, sizeof(kvm->stat), user_buffer, size, offset);
+}
+
+static const struct file_operations kvm_vm_stats_fops = {
+	.read = kvm_vm_stats_read,
+	.llseek = noop_llseek,
+};
+
+static int kvm_vm_ioctl_get_stats_fd(struct kvm *kvm)
+{
+	int fd;
+	struct file *file;
+
+	fd = get_unused_fd_flags(O_CLOEXEC);
+	if (fd < 0)
+		return fd;
+
+	file = anon_inode_getfile("kvm-vm-stats",
+			&kvm_vm_stats_fops, kvm, O_RDONLY);
+	if (IS_ERR(file)) {
+		put_unused_fd(fd);
+		return PTR_ERR(file);
+	}
+	file->f_mode |= FMODE_PREAD;
+	fd_install(fd, file);
+
+	return fd;
+}
+
 static long kvm_vm_ioctl(struct file *filp,
 			   unsigned int ioctl, unsigned long arg)
 {
@@ -4149,6 +4301,9 @@ static long kvm_vm_ioctl(struct file *filp,
 	case KVM_RESET_DIRTY_RINGS:
 		r = kvm_vm_ioctl_reset_dirty_pages(kvm);
 		break;
+	case KVM_GET_STATS_FD:
+		r = kvm_vm_ioctl_get_stats_fd(kvm);
+		break;
 	default:
 		r = kvm_arch_vm_ioctl(filp, ioctl, arg);
 	}
-- 
2.32.0.rc1.229.g3e70b5a671-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v7 3/4] KVM: stats: Add documentation for statistics data binary interface
  2021-06-03 21:14 [PATCH v7 0/4] KVM statistics data fd-based binary interface Jing Zhang
  2021-06-03 21:14 ` [PATCH v7 1/4] KVM: stats: Separate generic stats from architecture specific ones Jing Zhang
  2021-06-03 21:14 ` [PATCH v7 2/4] KVM: stats: Add fd-based API to read binary stats data Jing Zhang
@ 2021-06-03 21:14 ` Jing Zhang
  2021-06-07 19:22   ` Krish Sadhukhan
  2021-06-14  7:56   ` Fuad Tabba
  2021-06-03 21:14 ` [PATCH v7 4/4] KVM: selftests: Add selftest for KVM " Jing Zhang
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 32+ messages in thread
From: Jing Zhang @ 2021-06-03 21:14 UTC (permalink / raw)
  To: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Claudio Imbrenda, Sean Christopherson, Vitaly Kuznetsov,
	Jim Mattson, Peter Shier, Oliver Upton, David Rientjes,
	Emanuele Giuseppe Esposito, David Matlack, Ricardo Koller,
	Krish Sadhukhan
  Cc: Jing Zhang

Update KVM API documentation for binary statistics.

Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Jing Zhang <jingzhangos@google.com>
---
 Documentation/virt/kvm/api.rst | 180 +++++++++++++++++++++++++++++++++
 1 file changed, 180 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 7fcb2fd38f42..550bfbdf611b 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -5034,6 +5034,178 @@ see KVM_XEN_VCPU_SET_ATTR above.
 The KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST type may not be used
 with the KVM_XEN_VCPU_GET_ATTR ioctl.
 
+4.130 KVM_GET_STATS_FD
+---------------------
+
+:Capability: KVM_CAP_STATS_BINARY_FD
+:Architectures: all
+:Type: vm ioctl, vcpu ioctl
+:Parameters: none
+:Returns: statistics file descriptor on success, < 0 on error
+
+Errors:
+
+  ======     ======================================================
+  ENOMEM     if the fd could not be created due to lack of memory
+  EMFILE     if the number of opened files exceeds the limit
+  ======     ======================================================
+
+The file descriptor can be used to read VM/vCPU statistics data in binary
+format. The file data is organized into three blocks as below:
++-------------+
+|   Header    |
++-------------+
+| Descriptors |
++-------------+
+| Stats Data  |
++-------------+
+
+The Header block is always at the start of the file. It is only needed to be
+read one time for the lifetime of the file descriptor.
+It is in the form of ``struct kvm_stats_header`` as below::
+
+	#define KVM_STATS_ID_MAXLEN		64
+
+	struct kvm_stats_header {
+		char id[KVM_STATS_ID_MAXLEN];
+		__u32 name_size;
+		__u32 count;
+		__u32 desc_offset;
+		__u32 data_offset;
+	};
+
+The ``id`` field is identification for the corresponding KVM statistics. For
+VM statistics, it is in the form of "kvm-{kvm pid}", like "kvm-12345". For
+VCPU statistics, it is in the form of "kvm-{kvm pid}/vcpu-{vcpu id}", like
+"kvm-12345/vcpu-12".
+
+The ``name_size`` field is the size (byte) of the statistics name string
+(including trailing '\0') appended to the end of every statistics descriptor.
+
+The ``count`` field is the number of statistics.
+
+The ``desc_offset`` field is the offset of the Descriptors block from the start
+of the file indicated by the file descriptor.
+
+The ``data_offset`` field is the offset of the Stats Data block from the start
+of the file indicated by the file descriptor.
+
+The Descriptors block is only needed to be read once for the lifetime of the
+file descriptor. It is an array of ``struct kvm_stats_desc`` as shown in
+below code block::
+
+	#define KVM_STATS_TYPE_SHIFT		0
+	#define KVM_STATS_TYPE_MASK		(0xF << KVM_STATS_TYPE_SHIFT)
+	#define KVM_STATS_TYPE_CUMULATIVE	(0x0 << KVM_STATS_TYPE_SHIFT)
+	#define KVM_STATS_TYPE_INSTANT		(0x1 << KVM_STATS_TYPE_SHIFT)
+	#define KVM_STATS_TYPE_MAX		KVM_STATS_TYPE_INSTANT
+
+	#define KVM_STATS_UNIT_SHIFT		4
+	#define KVM_STATS_UNIT_MASK		(0xF << KVM_STATS_UNIT_SHIFT)
+	#define KVM_STATS_UNIT_NONE		(0x0 << KVM_STATS_UNIT_SHIFT)
+	#define KVM_STATS_UNIT_BYTES		(0x1 << KVM_STATS_UNIT_SHIFT)
+	#define KVM_STATS_UNIT_SECONDS		(0x2 << KVM_STATS_UNIT_SHIFT)
+	#define KVM_STATS_UNIT_CYCLES		(0x3 << KVM_STATS_UNIT_SHIFT)
+	#define KVM_STATS_UNIT_MAX		KVM_STATS_UNIT_CYCLES
+
+	#define KVM_STATS_BASE_SHIFT		8
+	#define KVM_STATS_BASE_MASK		(0xF << KVM_STATS_BASE_SHIFT)
+	#define KVM_STATS_BASE_POW10		(0x0 << KVM_STATS_BASE_SHIFT)
+	#define KVM_STATS_BASE_POW2		(0x1 << KVM_STATS_BASE_SHIFT)
+	#define KVM_STATS_BASE_MAX		KVM_STATS_BASE_POW2
+
+	struct kvm_stats_desc {
+		__u32 flags;
+		__s16 exponent;
+		__u16 size;
+		__u32 unused1;
+		__u32 unused2;
+		char name[0];
+	};
+
+The ``flags`` field contains the type and unit of the statistics data described
+by this descriptor. The following flags are supported:
+
+Bits 0-3 of ``flags`` encode the type:
+  * ``KVM_STATS_TYPE_CUMULATIVE``
+    The statistics data is cumulative. The value of data can only be increased.
+    Most of the counters used in KVM are of this type.
+    The corresponding ``count`` filed for this type is always 1.
+  * ``KVM_STATS_TYPE_INSTANT``
+    The statistics data is instantaneous. Its value can be increased or
+    decreased. This type is usually used as a measurement of some resources,
+    like the number of dirty pages, the number of large pages, etc.
+    The corresponding ``count`` field for this type is always 1.
+
+Bits 4-7 of ``flags`` encode the unit:
+  * ``KVM_STATS_UNIT_NONE``
+    There is no unit for the value of statistics data. This usually means that
+    the value is a simple counter of an event.
+  * ``KVM_STATS_UNIT_BYTES``
+    It indicates that the statistics data is used to measure memory size, in the
+    unit of Byte, KiByte, MiByte, GiByte, etc. The unit of the data is
+    determined by the ``exponent`` field in the descriptor. The
+    ``KVM_STATS_BASE_POW2`` flag is valid in this case. The unit of the data is
+    determined by ``pow(2, exponent)``. For example, if value is 10,
+    ``exponent`` is 20, which means the unit of statistics data is MiByte, we
+    can get the statistics data in the unit of Byte by
+    ``value * pow(2, exponent) = 10 * pow(2, 20) = 10 MiByte`` which is
+    10 * 1024 * 1024 Bytes.
+  * ``KVM_STATS_UNIT_SECONDS``
+    It indicates that the statistics data is used to measure time/latency, in
+    the unit of nanosecond, microsecond, millisecond and second. The unit of the
+    data is determined by the ``exponent`` field in the descriptor. The
+    ``KVM_STATS_BASE_POW10`` flag is valid in this case. The unit of the data
+    is determined by ``pow(10, exponent)``. For example, if value is 2000000,
+    ``exponent`` is -6, which means the unit of statistics data is microsecond,
+    we can get the statistics data in the unit of second by
+    ``value * pow(10, exponent) = 2000000 * pow(10, -6) = 2 seconds``.
+  * ``KVM_STATS_UNIT_CYCLES``
+    It indicates that the statistics data is used to measure CPU clock cycles.
+    The ``KVM_STATS_BASE_POW10`` flag is valid in this case. For example, if
+    value is 200, ``exponent`` is 4, we can get the number of CPU clock cycles
+    by ``value * pow(10, exponent) = 200 * pow(10, 4) = 2000000``.
+
+Bits 7-11 of ``flags`` encode the base:
+  * ``KVM_STATS_BASE_POW10``
+    The scale is based on power of 10. It is used for measurement of time and
+    CPU clock cycles.
+  * ``KVM_STATS_BASE_POW2``
+    The scale is based on power of 2. It is used for measurement of memory size.
+
+The ``exponent`` field is the scale of corresponding statistics data. For
+example, if the unit is ``KVM_STATS_UNIT_BYTES``, the base is
+``KVM_STATS_BASE_POW2``, the ``exponent`` is 10, then we know that the real
+unit of the statistics data is KBytes a.k.a pow(2, 10) = 1024 bytes.
+
+The ``size`` field is the number of values of this statistics data. It is in the
+unit of ``unsigned long`` for VM or ``__u64`` for VCPU.
+
+The ``unused1`` and ``unused2`` fields are reserved for future
+support for other types of statistics data, like log/linear histogram.
+
+The ``name`` field points to the name string of the statistics data. The name
+string starts at the end of ``struct kvm_stats_desc``.
+The maximum length (including trailing '\0') is indicated by ``name_size``
+in ``struct kvm_stats_header``.
+
+The Stats Data block contains an array of data values of type ``struct
+kvm_vm_stats_data`` or ``struct kvm_vcpu_stats_data``. It would be read by
+user space periodically to pull statistics data.
+The order of data value in Stats Data block is the same as the order of
+descriptors in Descriptors block.
+  * Statistics data for VM::
+
+	struct kvm_vm_stats_data {
+		unsigned long value[0];
+	};
+
+  * Statistics data for VCPU::
+
+	struct kvm_vcpu_stats_data {
+		__u64 value[0];
+	};
+
 5. The kvm_run structure
 ========================
 
@@ -6891,3 +7063,11 @@ This capability is always enabled.
 This capability indicates that the KVM virtual PTP service is
 supported in the host. A VMM can check whether the service is
 available to the guest on migration.
+
+8.33 KVM_CAP_STATS_BINARY_FD
+----------------------------
+
+:Architectures: all
+
+This capability indicates the feature that user space can create get a file
+descriptor for every VM and VCPU to read statistics data in binary format.
-- 
2.32.0.rc1.229.g3e70b5a671-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v7 4/4] KVM: selftests: Add selftest for KVM statistics data binary interface
  2021-06-03 21:14 [PATCH v7 0/4] KVM statistics data fd-based binary interface Jing Zhang
                   ` (2 preceding siblings ...)
  2021-06-03 21:14 ` [PATCH v7 3/4] KVM: stats: Add documentation for statistics data binary interface Jing Zhang
@ 2021-06-03 21:14 ` Jing Zhang
  2021-06-07 19:23   ` Krish Sadhukhan
  2021-06-10 16:46 ` [PATCH v7 0/4] KVM statistics data fd-based " Paolo Bonzini
  2021-06-11  6:57 ` Christian Borntraeger
  5 siblings, 1 reply; 32+ messages in thread
From: Jing Zhang @ 2021-06-03 21:14 UTC (permalink / raw)
  To: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Claudio Imbrenda, Sean Christopherson, Vitaly Kuznetsov,
	Jim Mattson, Peter Shier, Oliver Upton, David Rientjes,
	Emanuele Giuseppe Esposito, David Matlack, Ricardo Koller,
	Krish Sadhukhan
  Cc: Jing Zhang

Add selftest to check KVM stats descriptors validity.

Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Jing Zhang <jingzhangos@google.com>
---
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   3 +
 .../testing/selftests/kvm/include/kvm_util.h  |   3 +
 .../selftests/kvm/kvm_binary_stats_test.c     | 215 ++++++++++++++++++
 tools/testing/selftests/kvm/lib/kvm_util.c    |  12 +
 5 files changed, 234 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/kvm_binary_stats_test.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index bd83158e0e0b..d1c3ee7d3e41 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -43,3 +43,4 @@
 /memslot_modification_stress_test
 /set_memory_region_test
 /steal_time
+/kvm_binary_stats_test
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index e439d027939d..0cd46d6d1e15 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -76,6 +76,7 @@ TEST_GEN_PROGS_x86_64 += kvm_page_table_test
 TEST_GEN_PROGS_x86_64 += memslot_modification_stress_test
 TEST_GEN_PROGS_x86_64 += set_memory_region_test
 TEST_GEN_PROGS_x86_64 += steal_time
+TEST_GEN_PROGS_x86_64 += kvm_binary_stats_test
 
 TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
 TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list-sve
@@ -87,6 +88,7 @@ TEST_GEN_PROGS_aarch64 += kvm_create_max_vcpus
 TEST_GEN_PROGS_aarch64 += kvm_page_table_test
 TEST_GEN_PROGS_aarch64 += set_memory_region_test
 TEST_GEN_PROGS_aarch64 += steal_time
+TEST_GEN_PROGS_aarch64 += kvm_binary_stats_test
 
 TEST_GEN_PROGS_s390x = s390x/memop
 TEST_GEN_PROGS_s390x += s390x/resets
@@ -96,6 +98,7 @@ TEST_GEN_PROGS_s390x += dirty_log_test
 TEST_GEN_PROGS_s390x += kvm_create_max_vcpus
 TEST_GEN_PROGS_s390x += kvm_page_table_test
 TEST_GEN_PROGS_s390x += set_memory_region_test
+TEST_GEN_PROGS_s390x += kvm_binary_stats_test
 
 TEST_GEN_PROGS += $(TEST_GEN_PROGS_$(UNAME_M))
 LIBKVM += $(LIBKVM_$(UNAME_M))
diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index a8f022794ce3..96d15da3d72e 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -387,4 +387,7 @@ uint64_t get_ucall(struct kvm_vm *vm, uint32_t vcpu_id, struct ucall *uc);
 #define GUEST_ASSERT_4(_condition, arg1, arg2, arg3, arg4) \
 	__GUEST_ASSERT((_condition), 4, (arg1), (arg2), (arg3), (arg4))
 
+int vm_get_stats_fd(struct kvm_vm *vm);
+int vcpu_get_stats_fd(struct kvm_vm *vm, uint32_t vcpuid);
+
 #endif /* SELFTEST_KVM_UTIL_H */
diff --git a/tools/testing/selftests/kvm/kvm_binary_stats_test.c b/tools/testing/selftests/kvm/kvm_binary_stats_test.c
new file mode 100644
index 000000000000..081983110dc5
--- /dev/null
+++ b/tools/testing/selftests/kvm/kvm_binary_stats_test.c
@@ -0,0 +1,215 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * kvm_binary_stats_test
+ *
+ * Copyright (C) 2021, Google LLC.
+ *
+ * Test the fd-based interface for KVM statistics.
+ */
+
+#define _GNU_SOURCE /* for program_invocation_short_name */
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+
+#include "test_util.h"
+
+#include "kvm_util.h"
+#include "asm/kvm.h"
+#include "linux/kvm.h"
+
+void stats_test(int stats_fd, int size_stat)
+{
+	ssize_t ret;
+	int i;
+	size_t size_desc, size_data = 0;
+	struct kvm_stats_header header;
+	struct kvm_stats_desc *stats_desc, *pdesc;
+	void *stats_data;
+
+	/* Read kvm stats header */
+	ret = read(stats_fd, &header, sizeof(header));
+	TEST_ASSERT(ret == sizeof(header), "Read stats header");
+	size_desc = sizeof(*stats_desc) + header.name_size;
+
+	/* Check id string in header, that should start with "kvm" */
+	TEST_ASSERT(!strncmp(header.id, "kvm", 3) &&
+			strlen(header.id) < KVM_STATS_ID_MAXLEN,
+			"Invalid KVM stats type");
+
+	/* Sanity check for other fields in header */
+	if (header.count == 0) {
+		printf("No KVM stats defined!");
+		return;
+	}
+	/* Check overlap */
+	TEST_ASSERT(header.desc_offset > 0 && header.data_offset > 0
+			&& header.desc_offset >= sizeof(header)
+			&& header.data_offset >= sizeof(header),
+			"Invalid offset fields in header");
+	TEST_ASSERT(header.desc_offset > header.data_offset
+			|| (header.desc_offset + size_desc * header.count <=
+				header.data_offset),
+			"Descriptor block is overlapped with data block");
+
+	/* Allocate memory for stats descriptors */
+	stats_desc = calloc(header.count, size_desc);
+	TEST_ASSERT(stats_desc, "Allocate memory for stats descriptors");
+	/* Read kvm stats descriptors */
+	ret = pread(stats_fd, stats_desc,
+			size_desc * header.count, header.desc_offset);
+	TEST_ASSERT(ret == size_desc * header.count,
+			"Read KVM stats descriptors");
+
+	/* Sanity check for fields in descriptors */
+	for (i = 0; i < header.count; ++i) {
+		pdesc = (void *)stats_desc + i * size_desc;
+		/* Check type,unit,base boundaries */
+		TEST_ASSERT((pdesc->flags & KVM_STATS_TYPE_MASK)
+				<= KVM_STATS_TYPE_MAX, "Unknown KVM stats type");
+		TEST_ASSERT((pdesc->flags & KVM_STATS_UNIT_MASK)
+				<= KVM_STATS_UNIT_MAX, "Unknown KVM stats unit");
+		TEST_ASSERT((pdesc->flags & KVM_STATS_BASE_MASK)
+				<= KVM_STATS_BASE_MAX, "Unknown KVM stats base");
+		/* Check exponent for stats unit
+		 * Exponent for counter should be greater than or equal to 0
+		 * Exponent for unit bytes should be greater than or equal to 0
+		 * Exponent for unit seconds should be less than or equal to 0
+		 * Exponent for unit clock cycles should be greater than or
+		 * equal to 0
+		 */
+		switch (pdesc->flags & KVM_STATS_UNIT_MASK) {
+		case KVM_STATS_UNIT_NONE:
+		case KVM_STATS_UNIT_BYTES:
+		case KVM_STATS_UNIT_CYCLES:
+			TEST_ASSERT(pdesc->exponent >= 0,
+					"Unsupported KVM stats unit");
+			break;
+		case KVM_STATS_UNIT_SECONDS:
+			TEST_ASSERT(pdesc->exponent <= 0,
+					"Unsupported KVM stats unit");
+			break;
+		}
+		/* Check name string */
+		TEST_ASSERT(strlen(pdesc->name) < header.name_size,
+				"KVM stats name(%s) too long", pdesc->name);
+		/* Check size field, which should not be zero */
+		TEST_ASSERT(pdesc->size, "KVM descriptor(%s) with size of 0",
+				pdesc->name);
+		size_data += pdesc->size * size_stat;
+	}
+	/* Check overlap */
+	TEST_ASSERT(header.data_offset >= header.desc_offset
+			|| header.data_offset + size_data <= header.desc_offset,
+			"Data block is overlapped with Descriptor block");
+	/* Check validity of all stats data size */
+	TEST_ASSERT(size_data >= header.count * size_stat,
+			"Data size is not correct");
+
+	/* Allocate memory for stats data */
+	stats_data = malloc(size_data);
+	TEST_ASSERT(stats_data, "Allocate memory for stats data");
+	/* Read kvm stats data as a bulk */
+	ret = pread(stats_fd, stats_data, size_data, header.data_offset);
+	TEST_ASSERT(ret == size_data, "Read KVM stats data");
+	/* Read kvm stats data one by one */
+	size_data = 0;
+	for (i = 0; i < header.count; ++i) {
+		pdesc = (void *)stats_desc + i * size_desc;
+		ret = pread(stats_fd, stats_data, pdesc->size * size_stat,
+				header.data_offset + size_data);
+		TEST_ASSERT(ret == pdesc->size * size_stat,
+				"Read data of KVM stats: %s", pdesc->name);
+		size_data += pdesc->size * size_stat;
+	}
+
+	free(stats_data);
+	free(stats_desc);
+}
+
+
+void vm_stats_test(struct kvm_vm *vm)
+{
+	int stats_fd;
+	struct kvm_vm_stats_data *stats_data;
+
+	/* Get fd for VM stats */
+	stats_fd = vm_get_stats_fd(vm);
+	TEST_ASSERT(stats_fd >= 0, "Get VM stats fd");
+
+	stats_test(stats_fd, sizeof(stats_data->value[0]));
+	close(stats_fd);
+	TEST_ASSERT(fcntl(stats_fd, F_GETFD) == -1, "Stats fd not freed");
+}
+
+void vcpu_stats_test(struct kvm_vm *vm, int vcpu_id)
+{
+	int stats_fd;
+	struct kvm_vcpu_stats_data *stats_data;
+
+	/* Get fd for VCPU stats */
+	stats_fd = vcpu_get_stats_fd(vm, vcpu_id);
+	TEST_ASSERT(stats_fd >= 0, "Get VCPU stats fd");
+
+	stats_test(stats_fd, sizeof(stats_data->value[0]));
+	close(stats_fd);
+	TEST_ASSERT(fcntl(stats_fd, F_GETFD) == -1, "Stats fd not freed");
+}
+
+#define DEFAULT_NUM_VM		4
+#define DEFAULT_NUM_VCPU	4
+
+/*
+ * Usage: kvm_bin_form_stats [#vm] [#vcpu]
+ * The first parameter #vm set the number of VMs being created.
+ * The second parameter #vcpu set the number of VCPUs being created.
+ * By default, DEFAULT_NUM_VM VM and DEFAULT_NUM_VCPU VCPU for the VM would be
+ * created for testing.
+ */
+
+int main(int argc, char *argv[])
+{
+	int max_vm = DEFAULT_NUM_VM, max_vcpu = DEFAULT_NUM_VCPU, ret, i, j;
+	struct kvm_vm **vms;
+
+	/* Get the number of VMs and VCPUs that would be created for testing. */
+	if (argc > 1) {
+		max_vm = strtol(argv[1], NULL, 0);
+		if (max_vm <= 0)
+			max_vm = DEFAULT_NUM_VM;
+	}
+	if (argc > 2) {
+		max_vcpu = strtol(argv[2], NULL, 0);
+		if (max_vcpu <= 0)
+			max_vcpu = DEFAULT_NUM_VCPU;
+	}
+
+	/* Check the extension for binary stats */
+	ret = kvm_check_cap(KVM_CAP_STATS_BINARY_FD);
+	TEST_ASSERT(ret >= 0,
+			"Binary form statistics interface is not supported");
+
+	/* Create VMs and VCPUs */
+	vms = malloc(sizeof(vms[0]) * max_vm);
+	TEST_ASSERT(vms, "Allocate memory for storing VM pointers");
+	for (i = 0; i < max_vm; ++i) {
+		vms[i] = vm_create(VM_MODE_DEFAULT,
+				DEFAULT_GUEST_PHY_PAGES, O_RDWR);
+		for (j = 0; j < max_vcpu; ++j)
+			vm_vcpu_add(vms[i], j);
+	}
+
+	/* Check stats read for every VM and VCPU */
+	for (i = 0; i < max_vm; ++i) {
+		vm_stats_test(vms[i]);
+		for (j = 0; j < max_vcpu; ++j)
+			vcpu_stats_test(vms[i], j);
+	}
+
+	for (i = 0; i < max_vm; ++i)
+		kvm_vm_free(vms[i]);
+	free(vms);
+	return 0;
+}
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index fc83f6c5902d..10385b76fe11 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -2090,3 +2090,15 @@ unsigned int vm_calc_num_guest_pages(enum vm_guest_mode mode, size_t size)
 	n = DIV_ROUND_UP(size, vm_guest_mode_params[mode].page_size);
 	return vm_adjust_num_guest_pages(mode, n);
 }
+
+int vm_get_stats_fd(struct kvm_vm *vm)
+{
+	return ioctl(vm->fd, KVM_GET_STATS_FD, NULL);
+}
+
+int vcpu_get_stats_fd(struct kvm_vm *vm, uint32_t vcpuid)
+{
+	struct vcpu *vcpu = vcpu_find(vm, vcpuid);
+
+	return ioctl(vcpu->fd, KVM_GET_STATS_FD, NULL);
+}
-- 
2.32.0.rc1.229.g3e70b5a671-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 1/4] KVM: stats: Separate generic stats from architecture specific ones
  2021-06-03 21:14 ` [PATCH v7 1/4] KVM: stats: Separate generic stats from architecture specific ones Jing Zhang
@ 2021-06-07 19:22   ` Krish Sadhukhan
  2021-06-11  6:57   ` Christian Borntraeger
  1 sibling, 0 replies; 32+ messages in thread
From: Krish Sadhukhan @ 2021-06-07 19:22 UTC (permalink / raw)
  To: Jing Zhang, KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390,
	Linuxkselftest, Paolo Bonzini, Marc Zyngier, James Morse,
	Julien Thierry, Suzuki K Poulose, Will Deacon, Huacai Chen,
	Aleksandar Markovic, Thomas Bogendoerfer, Paul Mackerras,
	Christian Borntraeger, Janosch Frank, David Hildenbrand,
	Cornelia Huck, Claudio Imbrenda, Sean Christopherson,
	Vitaly Kuznetsov, Jim Mattson, Peter Shier, Oliver Upton,
	David Rientjes, Emanuele Giuseppe Esposito, David Matlack,
	Ricardo Koller


On 6/3/21 2:14 PM, Jing Zhang wrote:
> Put all generic statistics in a separate structure to ease
> statistics handling for the incoming new statistics API.
>
> No functional change intended.
>
> Reviewed-by: David Matlack <dmatlack@google.com>
> Reviewed-by: Ricardo Koller <ricarkol@google.com>
> Signed-off-by: Jing Zhang <jingzhangos@google.com>
> ---
>   arch/arm64/include/asm/kvm_host.h   |  9 ++-------
>   arch/arm64/kvm/guest.c              | 12 ++++++------
>   arch/mips/include/asm/kvm_host.h    |  9 ++-------
>   arch/mips/kvm/mips.c                | 12 ++++++------
>   arch/powerpc/include/asm/kvm_host.h |  9 ++-------
>   arch/powerpc/kvm/book3s.c           | 12 ++++++------
>   arch/powerpc/kvm/book3s_hv.c        | 12 ++++++------
>   arch/powerpc/kvm/book3s_pr.c        |  2 +-
>   arch/powerpc/kvm/book3s_pr_papr.c   |  2 +-
>   arch/powerpc/kvm/booke.c            | 14 +++++++-------
>   arch/s390/include/asm/kvm_host.h    |  9 ++-------
>   arch/s390/kvm/kvm-s390.c            | 12 ++++++------
>   arch/x86/include/asm/kvm_host.h     |  9 ++-------
>   arch/x86/kvm/x86.c                  | 14 +++++++-------
>   include/linux/kvm_host.h            |  9 +++++++--
>   include/linux/kvm_types.h           | 12 ++++++++++++
>   virt/kvm/kvm_main.c                 | 14 +++++++-------
>   17 files changed, 82 insertions(+), 90 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 7cd7d5c8c4bc..5a2c82f63baa 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -556,16 +556,11 @@ static inline bool __vcpu_write_sys_reg_to_cpu(u64 val, int reg)
>   }
>   
>   struct kvm_vm_stat {
> -	ulong remote_tlb_flush;
> +	struct kvm_vm_stat_generic generic;
>   };
>   
>   struct kvm_vcpu_stat {
> -	u64 halt_successful_poll;
> -	u64 halt_attempted_poll;
> -	u64 halt_poll_success_ns;
> -	u64 halt_poll_fail_ns;
> -	u64 halt_poll_invalid;
> -	u64 halt_wakeup;
> +	struct kvm_vcpu_stat_generic generic;
>   	u64 hvc_exit_stat;
>   	u64 wfe_exit_stat;
>   	u64 wfi_exit_stat;
> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> index 5cb4a1cd5603..4962331d01e6 100644
> --- a/arch/arm64/kvm/guest.c
> +++ b/arch/arm64/kvm/guest.c
> @@ -29,18 +29,18 @@
>   #include "trace.h"
>   
>   struct kvm_stats_debugfs_item debugfs_entries[] = {
> -	VCPU_STAT("halt_successful_poll", halt_successful_poll),
> -	VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
> -	VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
> -	VCPU_STAT("halt_wakeup", halt_wakeup),
> +	VCPU_STAT_GENERIC("halt_successful_poll", halt_successful_poll),
> +	VCPU_STAT_GENERIC("halt_attempted_poll", halt_attempted_poll),
> +	VCPU_STAT_GENERIC("halt_poll_invalid", halt_poll_invalid),
> +	VCPU_STAT_GENERIC("halt_wakeup", halt_wakeup),
>   	VCPU_STAT("hvc_exit_stat", hvc_exit_stat),
>   	VCPU_STAT("wfe_exit_stat", wfe_exit_stat),
>   	VCPU_STAT("wfi_exit_stat", wfi_exit_stat),
>   	VCPU_STAT("mmio_exit_user", mmio_exit_user),
>   	VCPU_STAT("mmio_exit_kernel", mmio_exit_kernel),
>   	VCPU_STAT("exits", exits),
> -	VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
> -	VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
> +	VCPU_STAT_GENERIC("halt_poll_success_ns", halt_poll_success_ns),
> +	VCPU_STAT_GENERIC("halt_poll_fail_ns", halt_poll_fail_ns),
>   	{ NULL }
>   };
>   
> diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
> index fca4547d580f..696f6b009377 100644
> --- a/arch/mips/include/asm/kvm_host.h
> +++ b/arch/mips/include/asm/kvm_host.h
> @@ -109,10 +109,11 @@ static inline bool kvm_is_error_hva(unsigned long addr)
>   }
>   
>   struct kvm_vm_stat {
> -	ulong remote_tlb_flush;
> +	struct kvm_vm_stat_generic generic;
>   };
>   
>   struct kvm_vcpu_stat {
> +	struct kvm_vcpu_stat_generic generic;
>   	u64 wait_exits;
>   	u64 cache_exits;
>   	u64 signal_exits;
> @@ -142,12 +143,6 @@ struct kvm_vcpu_stat {
>   #ifdef CONFIG_CPU_LOONGSON64
>   	u64 vz_cpucfg_exits;
>   #endif
> -	u64 halt_successful_poll;
> -	u64 halt_attempted_poll;
> -	u64 halt_poll_success_ns;
> -	u64 halt_poll_fail_ns;
> -	u64 halt_poll_invalid;
> -	u64 halt_wakeup;
>   };
>   
>   struct kvm_arch_memory_slot {
> diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> index 4d4af97dcc88..ff205b35719b 100644
> --- a/arch/mips/kvm/mips.c
> +++ b/arch/mips/kvm/mips.c
> @@ -68,12 +68,12 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
>   #ifdef CONFIG_CPU_LOONGSON64
>   	VCPU_STAT("vz_cpucfg", vz_cpucfg_exits),
>   #endif
> -	VCPU_STAT("halt_successful_poll", halt_successful_poll),
> -	VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
> -	VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
> -	VCPU_STAT("halt_wakeup", halt_wakeup),
> -	VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
> -	VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
> +	VCPU_STAT_GENERIC("halt_successful_poll", halt_successful_poll),
> +	VCPU_STAT_GENERIC("halt_attempted_poll", halt_attempted_poll),
> +	VCPU_STAT_GENERIC("halt_poll_invalid", halt_poll_invalid),
> +	VCPU_STAT_GENERIC("halt_wakeup", halt_wakeup),
> +	VCPU_STAT_GENERIC("halt_poll_success_ns", halt_poll_success_ns),
> +	VCPU_STAT_GENERIC("halt_poll_fail_ns", halt_poll_fail_ns),
>   	{NULL}
>   };
>   
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index 1e83359f286b..6e41064858a1 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -80,12 +80,13 @@ struct kvmppc_book3s_shadow_vcpu;
>   struct kvm_nested_guest;
>   
>   struct kvm_vm_stat {
> -	ulong remote_tlb_flush;
> +	struct kvm_vm_stat_generic generic;
>   	ulong num_2M_pages;
>   	ulong num_1G_pages;
>   };
>   
>   struct kvm_vcpu_stat {
> +	struct kvm_vcpu_stat_generic generic;
>   	u64 sum_exits;
>   	u64 mmio_exits;
>   	u64 signal_exits;
> @@ -101,14 +102,8 @@ struct kvm_vcpu_stat {
>   	u64 emulated_inst_exits;
>   	u64 dec_exits;
>   	u64 ext_intr_exits;
> -	u64 halt_poll_success_ns;
> -	u64 halt_poll_fail_ns;
>   	u64 halt_wait_ns;
> -	u64 halt_successful_poll;
> -	u64 halt_attempted_poll;
>   	u64 halt_successful_wait;
> -	u64 halt_poll_invalid;
> -	u64 halt_wakeup;
>   	u64 dbell_exits;
>   	u64 gdbell_exits;
>   	u64 ld;
> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> index 2b691f4d1f26..92cdb4175945 100644
> --- a/arch/powerpc/kvm/book3s.c
> +++ b/arch/powerpc/kvm/book3s.c
> @@ -47,14 +47,14 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
>   	VCPU_STAT("dec", dec_exits),
>   	VCPU_STAT("ext_intr", ext_intr_exits),
>   	VCPU_STAT("queue_intr", queue_intr),
> -	VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
> -	VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
> +	VCPU_STAT_GENERIC("halt_poll_success_ns", halt_poll_success_ns),
> +	VCPU_STAT_GENERIC("halt_poll_fail_ns", halt_poll_fail_ns),
>   	VCPU_STAT("halt_wait_ns", halt_wait_ns),
> -	VCPU_STAT("halt_successful_poll", halt_successful_poll),
> -	VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
> +	VCPU_STAT_GENERIC("halt_successful_poll", halt_successful_poll),
> +	VCPU_STAT_GENERIC("halt_attempted_poll", halt_attempted_poll),
>   	VCPU_STAT("halt_successful_wait", halt_successful_wait),
> -	VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
> -	VCPU_STAT("halt_wakeup", halt_wakeup),
> +	VCPU_STAT_GENERIC("halt_poll_invalid", halt_poll_invalid),
> +	VCPU_STAT_GENERIC("halt_wakeup", halt_wakeup),
>   	VCPU_STAT("pf_storage", pf_storage),
>   	VCPU_STAT("sp_storage", sp_storage),
>   	VCPU_STAT("pf_instruc", pf_instruc),
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 28a80d240b76..2a1cf29ba3fd 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -236,7 +236,7 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
>   
>   	waitp = kvm_arch_vcpu_get_wait(vcpu);
>   	if (rcuwait_wake_up(waitp))
> -		++vcpu->stat.halt_wakeup;
> +		++vcpu->stat.generic.halt_wakeup;
>   
>   	cpu = READ_ONCE(vcpu->arch.thread_cpu);
>   	if (cpu >= 0 && kvmppc_ipi_thread(cpu))
> @@ -3925,7 +3925,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
>   	cur = start_poll = ktime_get();
>   	if (vc->halt_poll_ns) {
>   		ktime_t stop = ktime_add_ns(start_poll, vc->halt_poll_ns);
> -		++vc->runner->stat.halt_attempted_poll;
> +		++vc->runner->stat.generic.halt_attempted_poll;
>   
>   		vc->vcore_state = VCORE_POLLING;
>   		spin_unlock(&vc->lock);
> @@ -3942,7 +3942,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
>   		vc->vcore_state = VCORE_INACTIVE;
>   
>   		if (!do_sleep) {
> -			++vc->runner->stat.halt_successful_poll;
> +			++vc->runner->stat.generic.halt_successful_poll;
>   			goto out;
>   		}
>   	}
> @@ -3954,7 +3954,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
>   		do_sleep = 0;
>   		/* If we polled, count this as a successful poll */
>   		if (vc->halt_poll_ns)
> -			++vc->runner->stat.halt_successful_poll;
> +			++vc->runner->stat.generic.halt_successful_poll;
>   		goto out;
>   	}
>   
> @@ -3981,13 +3981,13 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
>   			ktime_to_ns(cur) - ktime_to_ns(start_wait);
>   		/* Attribute failed poll time */
>   		if (vc->halt_poll_ns)
> -			vc->runner->stat.halt_poll_fail_ns +=
> +			vc->runner->stat.generic.halt_poll_fail_ns +=
>   				ktime_to_ns(start_wait) -
>   				ktime_to_ns(start_poll);
>   	} else {
>   		/* Attribute successful poll time */
>   		if (vc->halt_poll_ns)
> -			vc->runner->stat.halt_poll_success_ns +=
> +			vc->runner->stat.generic.halt_poll_success_ns +=
>   				ktime_to_ns(cur) -
>   				ktime_to_ns(start_poll);
>   	}
> diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
> index d7733b07f489..71bcb0140461 100644
> --- a/arch/powerpc/kvm/book3s_pr.c
> +++ b/arch/powerpc/kvm/book3s_pr.c
> @@ -493,7 +493,7 @@ static void kvmppc_set_msr_pr(struct kvm_vcpu *vcpu, u64 msr)
>   		if (!vcpu->arch.pending_exceptions) {
>   			kvm_vcpu_block(vcpu);
>   			kvm_clear_request(KVM_REQ_UNHALT, vcpu);
> -			vcpu->stat.halt_wakeup++;
> +			vcpu->stat.generic.halt_wakeup++;
>   
>   			/* Unset POW bit after we woke up */
>   			msr &= ~MSR_POW;
> diff --git a/arch/powerpc/kvm/book3s_pr_papr.c b/arch/powerpc/kvm/book3s_pr_papr.c
> index 031c8015864a..ac14239f3424 100644
> --- a/arch/powerpc/kvm/book3s_pr_papr.c
> +++ b/arch/powerpc/kvm/book3s_pr_papr.c
> @@ -378,7 +378,7 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
>   		kvmppc_set_msr_fast(vcpu, kvmppc_get_msr(vcpu) | MSR_EE);
>   		kvm_vcpu_block(vcpu);
>   		kvm_clear_request(KVM_REQ_UNHALT, vcpu);
> -		vcpu->stat.halt_wakeup++;
> +		vcpu->stat.generic.halt_wakeup++;
>   		return EMULATE_DONE;
>   	case H_LOGICAL_CI_LOAD:
>   		return kvmppc_h_pr_logical_ci_load(vcpu);
> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> index 7d5fe43f85c4..80d3b39aa7ac 100644
> --- a/arch/powerpc/kvm/booke.c
> +++ b/arch/powerpc/kvm/booke.c
> @@ -49,15 +49,15 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
>   	VCPU_STAT("inst_emu", emulated_inst_exits),
>   	VCPU_STAT("dec", dec_exits),
>   	VCPU_STAT("ext_intr", ext_intr_exits),
> -	VCPU_STAT("halt_successful_poll", halt_successful_poll),
> -	VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
> -	VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
> -	VCPU_STAT("halt_wakeup", halt_wakeup),
> +	VCPU_STAT_GENERIC("halt_successful_poll", halt_successful_poll),
> +	VCPU_STAT_GENERIC("halt_attempted_poll", halt_attempted_poll),
> +	VCPU_STAT_GENERIC("halt_poll_invalid", halt_poll_invalid),
> +	VCPU_STAT_GENERIC("halt_wakeup", halt_wakeup),
>   	VCPU_STAT("doorbell", dbell_exits),
>   	VCPU_STAT("guest doorbell", gdbell_exits),
> -	VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
> -	VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
> -	VM_STAT("remote_tlb_flush", remote_tlb_flush),
> +	VCPU_STAT_GENERIC("halt_poll_success_ns", halt_poll_success_ns),
> +	VCPU_STAT_GENERIC("halt_poll_fail_ns", halt_poll_fail_ns),
> +	VM_STAT_GENERIC("remote_tlb_flush", remote_tlb_flush),
>   	{ NULL }
>   };
>   
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index 8925f3969478..9b4473f76e56 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -361,6 +361,7 @@ struct sie_page {
>   };
>   
>   struct kvm_vcpu_stat {
> +	struct kvm_vcpu_stat_generic generic;
>   	u64 exit_userspace;
>   	u64 exit_null;
>   	u64 exit_external_request;
> @@ -370,13 +371,7 @@ struct kvm_vcpu_stat {
>   	u64 exit_validity;
>   	u64 exit_instruction;
>   	u64 exit_pei;
> -	u64 halt_successful_poll;
> -	u64 halt_attempted_poll;
> -	u64 halt_poll_invalid;
>   	u64 halt_no_poll_steal;
> -	u64 halt_wakeup;
> -	u64 halt_poll_success_ns;
> -	u64 halt_poll_fail_ns;
>   	u64 instruction_lctl;
>   	u64 instruction_lctlg;
>   	u64 instruction_stctl;
> @@ -755,12 +750,12 @@ struct kvm_vcpu_arch {
>   };
>   
>   struct kvm_vm_stat {
> +	struct kvm_vm_stat_generic generic;
>   	u64 inject_io;
>   	u64 inject_float_mchk;
>   	u64 inject_pfault_done;
>   	u64 inject_service_signal;
>   	u64 inject_virtio;
> -	u64 remote_tlb_flush;
>   };
>   
>   struct kvm_arch_memory_slot {
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 1296fc10f80c..e8bc7cd06794 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -72,13 +72,13 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
>   	VCPU_STAT("exit_program_interruption", exit_program_interruption),
>   	VCPU_STAT("exit_instr_and_program_int", exit_instr_and_program),
>   	VCPU_STAT("exit_operation_exception", exit_operation_exception),
> -	VCPU_STAT("halt_successful_poll", halt_successful_poll),
> -	VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
> -	VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
> +	VCPU_STAT_GENERIC("halt_successful_poll", halt_successful_poll),
> +	VCPU_STAT_GENERIC("halt_attempted_poll", halt_attempted_poll),
> +	VCPU_STAT_GENERIC("halt_poll_invalid", halt_poll_invalid),
>   	VCPU_STAT("halt_no_poll_steal", halt_no_poll_steal),
> -	VCPU_STAT("halt_wakeup", halt_wakeup),
> -	VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
> -	VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
> +	VCPU_STAT_GENERIC("halt_wakeup", halt_wakeup),
> +	VCPU_STAT_GENERIC("halt_poll_success_ns", halt_poll_success_ns),
> +	VCPU_STAT_GENERIC("halt_poll_fail_ns", halt_poll_fail_ns),
>   	VCPU_STAT("instruction_lctlg", instruction_lctlg),
>   	VCPU_STAT("instruction_lctl", instruction_lctl),
>   	VCPU_STAT("instruction_stctl", instruction_stctl),
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 55efbacfc244..db8eb3dead7e 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1127,6 +1127,7 @@ struct kvm_arch {
>   };
>   
>   struct kvm_vm_stat {
> +	struct kvm_vm_stat_generic generic;
>   	ulong mmu_shadow_zapped;
>   	ulong mmu_pte_write;
>   	ulong mmu_pde_zapped;
> @@ -1134,13 +1135,13 @@ struct kvm_vm_stat {
>   	ulong mmu_recycled;
>   	ulong mmu_cache_miss;
>   	ulong mmu_unsync;
> -	ulong remote_tlb_flush;
>   	ulong lpages;
>   	ulong nx_lpage_splits;
>   	ulong max_mmu_page_hash_collisions;
>   };
>   
>   struct kvm_vcpu_stat {
> +	struct kvm_vcpu_stat_generic generic;
>   	u64 pf_fixed;
>   	u64 pf_guest;
>   	u64 tlb_flush;
> @@ -1154,10 +1155,6 @@ struct kvm_vcpu_stat {
>   	u64 nmi_window_exits;
>   	u64 l1d_flush;
>   	u64 halt_exits;
> -	u64 halt_successful_poll;
> -	u64 halt_attempted_poll;
> -	u64 halt_poll_invalid;
> -	u64 halt_wakeup;
>   	u64 request_irq_exits;
>   	u64 irq_exits;
>   	u64 host_state_reload;
> @@ -1168,8 +1165,6 @@ struct kvm_vcpu_stat {
>   	u64 irq_injections;
>   	u64 nmi_injections;
>   	u64 req_event;
> -	u64 halt_poll_success_ns;
> -	u64 halt_poll_fail_ns;
>   	u64 nested_run;
>   	u64 directed_yield_attempted;
>   	u64 directed_yield_successful;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9b6bca616929..96d10253218a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -226,10 +226,10 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
>   	VCPU_STAT("irq_window", irq_window_exits),
>   	VCPU_STAT("nmi_window", nmi_window_exits),
>   	VCPU_STAT("halt_exits", halt_exits),
> -	VCPU_STAT("halt_successful_poll", halt_successful_poll),
> -	VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
> -	VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
> -	VCPU_STAT("halt_wakeup", halt_wakeup),
> +	VCPU_STAT_GENERIC("halt_successful_poll", halt_successful_poll),
> +	VCPU_STAT_GENERIC("halt_attempted_poll", halt_attempted_poll),
> +	VCPU_STAT_GENERIC("halt_poll_invalid", halt_poll_invalid),
> +	VCPU_STAT_GENERIC("halt_wakeup", halt_wakeup),
>   	VCPU_STAT("hypercalls", hypercalls),
>   	VCPU_STAT("request_irq", request_irq_exits),
>   	VCPU_STAT("irq_exits", irq_exits),
> @@ -241,8 +241,8 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
>   	VCPU_STAT("nmi_injections", nmi_injections),
>   	VCPU_STAT("req_event", req_event),
>   	VCPU_STAT("l1d_flush", l1d_flush),
> -	VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
> -	VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
> +	VCPU_STAT_GENERIC("halt_poll_success_ns", halt_poll_success_ns),
> +	VCPU_STAT_GENERIC("halt_poll_fail_ns", halt_poll_fail_ns),
>   	VCPU_STAT("nested_run", nested_run),
>   	VCPU_STAT("directed_yield_attempted", directed_yield_attempted),
>   	VCPU_STAT("directed_yield_successful", directed_yield_successful),
> @@ -253,7 +253,7 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
>   	VM_STAT("mmu_recycled", mmu_recycled),
>   	VM_STAT("mmu_cache_miss", mmu_cache_miss),
>   	VM_STAT("mmu_unsync", mmu_unsync),
> -	VM_STAT("remote_tlb_flush", remote_tlb_flush),
> +	VM_STAT_GENERIC("remote_tlb_flush", remote_tlb_flush),
>   	VM_STAT("largepages", lpages, .mode = 0444),
>   	VM_STAT("nx_largepages_splitted", nx_lpage_splits, .mode = 0444),
>   	VM_STAT("max_mmu_page_hash_collisions", max_mmu_page_hash_collisions),
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 2f34487e21f2..1870fa928762 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1243,10 +1243,15 @@ struct kvm_stats_debugfs_item {
>   #define KVM_DBGFS_GET_MODE(dbgfs_item)                                         \
>   	((dbgfs_item)->mode ? (dbgfs_item)->mode : 0644)
>   
> -#define VM_STAT(n, x, ...) 							\
> +#define VM_STAT(n, x, ...)						       \
>   	{ n, offsetof(struct kvm, stat.x), KVM_STAT_VM, ## __VA_ARGS__ }
> -#define VCPU_STAT(n, x, ...)							\
> +#define VCPU_STAT(n, x, ...)						       \
>   	{ n, offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU, ## __VA_ARGS__ }
> +#define VM_STAT_GENERIC(n, x, ...)					       \
> +	{ n, offsetof(struct kvm, stat.generic.x), KVM_STAT_VM, ## __VA_ARGS__ }
> +#define VCPU_STAT_GENERIC(n, x, ...)					       \
> +	{ n, offsetof(struct kvm_vcpu, stat.generic.x),			       \
> +	  KVM_STAT_VCPU, ## __VA_ARGS__ }
>   
>   extern struct kvm_stats_debugfs_item debugfs_entries[];
>   extern struct dentry *kvm_debugfs_dir;
> diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
> index a7580f69dda0..7c39489f9953 100644
> --- a/include/linux/kvm_types.h
> +++ b/include/linux/kvm_types.h
> @@ -76,5 +76,17 @@ struct kvm_mmu_memory_cache {
>   };
>   #endif
>   
> +struct kvm_vm_stat_generic {
> +	ulong remote_tlb_flush;
> +};
> +
> +struct kvm_vcpu_stat_generic {
> +	u64 halt_successful_poll;
> +	u64 halt_attempted_poll;
> +	u64 halt_poll_invalid;
> +	u64 halt_wakeup;
> +	u64 halt_poll_success_ns;
> +	u64 halt_poll_fail_ns;
> +};
>   
>   #endif /* __KVM_TYPES_H__ */
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 6b4feb92dc79..f6ad5b080994 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -330,7 +330,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
>   	 */
>   	if (!kvm_arch_flush_remote_tlb(kvm)
>   	    || kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
> -		++kvm->stat.remote_tlb_flush;
> +		++kvm->stat.generic.remote_tlb_flush;
>   	cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
>   }
>   EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);
> @@ -2940,9 +2940,9 @@ static inline void
>   update_halt_poll_stats(struct kvm_vcpu *vcpu, u64 poll_ns, bool waited)
>   {
>   	if (waited)
> -		vcpu->stat.halt_poll_fail_ns += poll_ns;
> +		vcpu->stat.generic.halt_poll_fail_ns += poll_ns;
>   	else
> -		vcpu->stat.halt_poll_success_ns += poll_ns;
> +		vcpu->stat.generic.halt_poll_success_ns += poll_ns;
>   }
>   
>   /*
> @@ -2960,16 +2960,16 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>   	if (vcpu->halt_poll_ns && !kvm_arch_no_poll(vcpu)) {
>   		ktime_t stop = ktime_add_ns(ktime_get(), vcpu->halt_poll_ns);
>   
> -		++vcpu->stat.halt_attempted_poll;
> +		++vcpu->stat.generic.halt_attempted_poll;
>   		do {
>   			/*
>   			 * This sets KVM_REQ_UNHALT if an interrupt
>   			 * arrives.
>   			 */
>   			if (kvm_vcpu_check_block(vcpu) < 0) {
> -				++vcpu->stat.halt_successful_poll;
> +				++vcpu->stat.generic.halt_successful_poll;
>   				if (!vcpu_valid_wakeup(vcpu))
> -					++vcpu->stat.halt_poll_invalid;
> +					++vcpu->stat.generic.halt_poll_invalid;
>   				goto out;
>   			}
>   			poll_end = cur = ktime_get();
> @@ -3027,7 +3027,7 @@ bool kvm_vcpu_wake_up(struct kvm_vcpu *vcpu)
>   	waitp = kvm_arch_vcpu_get_wait(vcpu);
>   	if (rcuwait_wake_up(waitp)) {
>   		WRITE_ONCE(vcpu->ready, true);
> -		++vcpu->stat.halt_wakeup;
> +		++vcpu->stat.generic.halt_wakeup;
>   		return true;
>   	}
>   

Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 2/4] KVM: stats: Add fd-based API to read binary stats data
  2021-06-03 21:14 ` [PATCH v7 2/4] KVM: stats: Add fd-based API to read binary stats data Jing Zhang
@ 2021-06-07 19:22   ` Krish Sadhukhan
  2021-06-07 22:10     ` Jing Zhang
  2021-06-10 16:23   ` Paolo Bonzini
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 32+ messages in thread
From: Krish Sadhukhan @ 2021-06-07 19:22 UTC (permalink / raw)
  To: Jing Zhang, KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390,
	Linuxkselftest, Paolo Bonzini, Marc Zyngier, James Morse,
	Julien Thierry, Suzuki K Poulose, Will Deacon, Huacai Chen,
	Aleksandar Markovic, Thomas Bogendoerfer, Paul Mackerras,
	Christian Borntraeger, Janosch Frank, David Hildenbrand,
	Cornelia Huck, Claudio Imbrenda, Sean Christopherson,
	Vitaly Kuznetsov, Jim Mattson, Peter Shier, Oliver Upton,
	David Rientjes, Emanuele Giuseppe Esposito, David Matlack,
	Ricardo Koller


On 6/3/21 2:14 PM, Jing Zhang wrote:
> Provides a file descriptor per VM to read VM stats info/data.
> Provides a file descriptor per vCPU to read vCPU stats info/data.
>
> Reviewed-by: David Matlack <dmatlack@google.com>
> Reviewed-by: Ricardo Koller <ricarkol@google.com>
> Signed-off-by: Jing Zhang <jingzhangos@google.com>
> ---
>   arch/arm64/kvm/guest.c    |  26 +++++++
>   arch/mips/kvm/mips.c      |  52 +++++++++++++
>   arch/powerpc/kvm/book3s.c |  52 +++++++++++++
>   arch/powerpc/kvm/booke.c  |  45 +++++++++++
>   arch/s390/kvm/kvm-s390.c  | 117 ++++++++++++++++++++++++++++
>   arch/x86/kvm/x86.c        |  53 +++++++++++++
>   include/linux/kvm_host.h  | 132 ++++++++++++++++++++++++++++++++
>   include/uapi/linux/kvm.h  |  50 ++++++++++++
>   virt/kvm/kvm_main.c       | 155 ++++++++++++++++++++++++++++++++++++++
>   9 files changed, 682 insertions(+)
>
> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> index 4962331d01e6..c68addd38cf8 100644
> --- a/arch/arm64/kvm/guest.c
> +++ b/arch/arm64/kvm/guest.c
> @@ -28,6 +28,32 @@
>   
>   #include "trace.h"
>   
> +struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC();
> +
> +struct _kvm_stats_header kvm_vm_stats_header = {
> +	.name_size = KVM_STATS_NAME_LEN,
> +	.count = ARRAY_SIZE(kvm_vm_stats_desc),
> +	.desc_offset = sizeof(struct kvm_stats_header),
> +	.data_offset = sizeof(struct kvm_stats_header) +
> +		sizeof(kvm_vm_stats_desc),
> +};
> +
> +struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
> +	STATS_DESC_COUNTER("hvc_exit_stat"),
> +	STATS_DESC_COUNTER("wfe_exit_stat"),
> +	STATS_DESC_COUNTER("wfi_exit_stat"),
> +	STATS_DESC_COUNTER("mmio_exit_user"),
> +	STATS_DESC_COUNTER("mmio_exit_kernel"),
> +	STATS_DESC_COUNTER("exits"));
> +
> +struct _kvm_stats_header kvm_vcpu_stats_header = {
> +	.name_size = KVM_STATS_NAME_LEN,
> +	.count = ARRAY_SIZE(kvm_vcpu_stats_desc),
> +	.desc_offset = sizeof(struct kvm_stats_header),
> +	.data_offset = sizeof(struct kvm_stats_header) +
> +		sizeof(kvm_vcpu_stats_desc),
> +};
> +
>   struct kvm_stats_debugfs_item debugfs_entries[] = {
>   	VCPU_STAT_GENERIC("halt_successful_poll", halt_successful_poll),
>   	VCPU_STAT_GENERIC("halt_attempted_poll", halt_attempted_poll),
> diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> index ff205b35719b..eb8fd4a96952 100644
> --- a/arch/mips/kvm/mips.c
> +++ b/arch/mips/kvm/mips.c
> @@ -38,6 +38,58 @@
>   #define VECTORSPACING 0x100	/* for EI/VI mode */
>   #endif
>   
> +struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC();
> +
> +struct _kvm_stats_header kvm_vm_stats_header = {
> +	.name_size = KVM_STATS_NAME_LEN,
> +	.count = ARRAY_SIZE(kvm_vm_stats_desc),
> +	.desc_offset = sizeof(struct kvm_stats_header),
> +	.data_offset = sizeof(struct kvm_stats_header) +
> +		sizeof(kvm_vm_stats_desc),
> +};
> +
> +struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
> +	STATS_DESC_COUNTER("wait_exits"),
> +	STATS_DESC_COUNTER("cache_exits"),
> +	STATS_DESC_COUNTER("signal_exits"),
> +	STATS_DESC_COUNTER("int_exits"),
> +	STATS_DESC_COUNTER("cop_unusable_exits"),
> +	STATS_DESC_COUNTER("tlbmod_exits"),
> +	STATS_DESC_COUNTER("tlbmiss_ld_exits"),
> +	STATS_DESC_COUNTER("tlbmiss_st_exits"),
> +	STATS_DESC_COUNTER("addrerr_st_exits"),
> +	STATS_DESC_COUNTER("addrerr_ld_exits"),
> +	STATS_DESC_COUNTER("syscall_exits"),
> +	STATS_DESC_COUNTER("resvd_inst_exits"),
> +	STATS_DESC_COUNTER("break_inst_exits"),
> +	STATS_DESC_COUNTER("trap_inst_exits"),
> +	STATS_DESC_COUNTER("msa_fpe_exits"),
> +	STATS_DESC_COUNTER("fpe_exits"),
> +	STATS_DESC_COUNTER("msa_disabled_exits"),
> +	STATS_DESC_COUNTER("flush_dcache_exits"),
> +#ifdef CONFIG_KVM_MIPS_VZ
> +	STATS_DESC_COUNTER("vz_gpsi_exits"),
> +	STATS_DESC_COUNTER("vz_gsfc_exits"),
> +	STATS_DESC_COUNTER("vz_hc_exits"),
> +	STATS_DESC_COUNTER("vz_grr_exits"),
> +	STATS_DESC_COUNTER("vz_gva_exits"),
> +	STATS_DESC_COUNTER("vz_ghfc_exits"),
> +	STATS_DESC_COUNTER("vz_gpa_exits"),
> +	STATS_DESC_COUNTER("vz_resvd_exits"),
> +#ifdef CONFIG_CPU_LOONGSON64
> +	STATS_DESC_COUNTER("vz_cpucfg_exits"),
> +#endif
> +#endif
> +	);
> +
> +struct _kvm_stats_header kvm_vcpu_stats_header = {
> +	.name_size = KVM_STATS_NAME_LEN,
> +	.count = ARRAY_SIZE(kvm_vcpu_stats_desc),
> +	.desc_offset = sizeof(struct kvm_stats_header),
> +	.data_offset = sizeof(struct kvm_stats_header) +
> +		sizeof(kvm_vcpu_stats_desc),
> +};
> +
>   struct kvm_stats_debugfs_item debugfs_entries[] = {
>   	VCPU_STAT("wait", wait_exits),
>   	VCPU_STAT("cache", cache_exits),
> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> index 92cdb4175945..53fb1da049a7 100644
> --- a/arch/powerpc/kvm/book3s.c
> +++ b/arch/powerpc/kvm/book3s.c
> @@ -38,6 +38,58 @@
>   
>   /* #define EXIT_DEBUG */
>   
> +struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC(
> +	STATS_DESC_ICOUNTER("num_2M_pages"),
> +	STATS_DESC_ICOUNTER("num_1G_pages"));
> +
> +struct _kvm_stats_header kvm_vm_stats_header = {
> +	.name_size = KVM_STATS_NAME_LEN,
> +	.count = ARRAY_SIZE(kvm_vm_stats_desc),
> +	.desc_offset = sizeof(struct kvm_stats_header),
> +	.data_offset = sizeof(struct kvm_stats_header) +
> +		sizeof(kvm_vm_stats_desc),
> +};
> +
> +struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
> +	STATS_DESC_COUNTER("sum_exits"),
> +	STATS_DESC_COUNTER("mmio_exits"),
> +	STATS_DESC_COUNTER("signal_exits"),
> +	STATS_DESC_COUNTER("light_exits"),
> +	STATS_DESC_COUNTER("itlb_real_miss_exits"),
> +	STATS_DESC_COUNTER("itlb_virt_miss_exits"),
> +	STATS_DESC_COUNTER("dtlb_real_miss_exits"),
> +	STATS_DESC_COUNTER("dtlb_virt_miss_exits"),
> +	STATS_DESC_COUNTER("syscall_exits"),
> +	STATS_DESC_COUNTER("isi_exits"),
> +	STATS_DESC_COUNTER("dsi_exits"),
> +	STATS_DESC_COUNTER("emulated_inst_exits"),
> +	STATS_DESC_COUNTER("dec_exits"),
> +	STATS_DESC_COUNTER("ext_intr_exits"),
> +	STATS_DESC_TIME_NSEC("halt_wait_ns"),
> +	STATS_DESC_COUNTER("halt_successful_wait"),
> +	STATS_DESC_COUNTER("dbell_exits"),
> +	STATS_DESC_COUNTER("gdbell_exits"),
> +	STATS_DESC_COUNTER("ld"),
> +	STATS_DESC_COUNTER("st"),
> +	STATS_DESC_COUNTER("pf_storage"),
> +	STATS_DESC_COUNTER("pf_instruc"),
> +	STATS_DESC_COUNTER("sp_storage"),
> +	STATS_DESC_COUNTER("sp_instruc"),
> +	STATS_DESC_COUNTER("queue_intr"),
> +	STATS_DESC_COUNTER("ld_slow"),
> +	STATS_DESC_COUNTER("st_slow"),
> +	STATS_DESC_COUNTER("pthru_all"),
> +	STATS_DESC_COUNTER("pthru_host"),
> +	STATS_DESC_COUNTER("pthru_bad_aff"));
> +
> +struct _kvm_stats_header kvm_vcpu_stats_header = {
> +	.name_size = KVM_STATS_NAME_LEN,
> +	.count = ARRAY_SIZE(kvm_vcpu_stats_desc),
> +	.desc_offset = sizeof(struct kvm_stats_header),
> +	.data_offset = sizeof(struct kvm_stats_header) +
> +		sizeof(kvm_vcpu_stats_desc),
> +};
> +
>   struct kvm_stats_debugfs_item debugfs_entries[] = {
>   	VCPU_STAT("exits", sum_exits),
>   	VCPU_STAT("mmio", mmio_exits),
> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> index 80d3b39aa7ac..02f1e147d224 100644
> --- a/arch/powerpc/kvm/booke.c
> +++ b/arch/powerpc/kvm/booke.c
> @@ -36,6 +36,51 @@
>   
>   unsigned long kvmppc_booke_handlers;
>   
> +struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC(
> +	STATS_DESC_ICOUNTER("num_2M_pages"),
> +	STATS_DESC_ICOUNTER("num_1G_pages"));
> +
> +struct _kvm_stats_header kvm_vm_stats_header = {
> +	.name_size = KVM_STATS_NAME_LEN,
> +	.count = ARRAY_SIZE(kvm_vm_stats_desc),
> +	.desc_offset = sizeof(struct kvm_stats_header),
> +	.data_offset = sizeof(struct kvm_stats_header) +
> +		sizeof(kvm_vm_stats_desc),
> +};
> +
> +struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
> +	STATS_DESC_COUNTER("sum_exits"),
> +	STATS_DESC_COUNTER("mmio_exits"),
> +	STATS_DESC_COUNTER("signal_exits"),
> +	STATS_DESC_COUNTER("light_exits"),
> +	STATS_DESC_COUNTER("itlb_real_miss_exits"),
> +	STATS_DESC_COUNTER("itlb_virt_miss_exits"),
> +	STATS_DESC_COUNTER("dtlb_real_miss_exits"),
> +	STATS_DESC_COUNTER("dtlb_virt_miss_exits"),
> +	STATS_DESC_COUNTER("syscall_exits"),
> +	STATS_DESC_COUNTER("isi_exits"),
> +	STATS_DESC_COUNTER("dsi_exits"),
> +	STATS_DESC_COUNTER("emulated_inst_exits"),
> +	STATS_DESC_COUNTER("dec_exits"),
> +	STATS_DESC_COUNTER("ext_intr_exits"),
> +	STATS_DESC_TIME_NSEC("halt_wait_ns"),
> +	STATS_DESC_COUNTER("halt_successful_wait"),
> +	STATS_DESC_COUNTER("dbell_exits"),
> +	STATS_DESC_COUNTER("gdbell_exits"),
> +	STATS_DESC_COUNTER("ld"),
> +	STATS_DESC_COUNTER("st"),
> +	STATS_DESC_COUNTER("pthru_all"),
> +	STATS_DESC_COUNTER("pthru_host"),
> +	STATS_DESC_COUNTER("pthru_bad_aff"));
> +
> +struct _kvm_stats_header kvm_vcpu_stats_header = {
> +	.name_size = KVM_STATS_NAME_LEN,
> +	.count = ARRAY_SIZE(kvm_vcpu_stats_desc),
> +	.desc_offset = sizeof(struct kvm_stats_header),
> +	.data_offset = sizeof(struct kvm_stats_header) +
> +		sizeof(kvm_vcpu_stats_desc),
> +};
> +
>   struct kvm_stats_debugfs_item debugfs_entries[] = {
>   	VCPU_STAT("mmio", mmio_exits),
>   	VCPU_STAT("sig", signal_exits),
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index e8bc7cd06794..4e2141bee3bc 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -58,6 +58,123 @@
>   #define VCPU_IRQS_MAX_BUF (sizeof(struct kvm_s390_irq) * \
>   			   (KVM_MAX_VCPUS + LOCAL_IRQS))
>   
> +struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC(
> +	STATS_DESC_COUNTER("inject_io"),
> +	STATS_DESC_COUNTER("inject_float_mchk"),
> +	STATS_DESC_COUNTER("inject_pfault_done"),
> +	STATS_DESC_COUNTER("inject_service_signal"),
> +	STATS_DESC_COUNTER("inject_virtio"));
> +
> +struct _kvm_stats_header kvm_vm_stats_header = {
> +	.name_size = KVM_STATS_NAME_LEN,
> +	.count = ARRAY_SIZE(kvm_vm_stats_desc),
> +	.desc_offset = sizeof(struct kvm_stats_header),
> +	.data_offset = sizeof(struct kvm_stats_header) +
> +		sizeof(kvm_vm_stats_desc),
> +};
> +
> +struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
> +	STATS_DESC_COUNTER("exit_userspace"),
> +	STATS_DESC_COUNTER("exit_null"),
> +	STATS_DESC_COUNTER("exit_external_request"),
> +	STATS_DESC_COUNTER("exit_io_request"),
> +	STATS_DESC_COUNTER("exit_external_interrupt"),
> +	STATS_DESC_COUNTER("exit_stop_request"),
> +	STATS_DESC_COUNTER("exit_validity"),
> +	STATS_DESC_COUNTER("exit_instruction"),
> +	STATS_DESC_COUNTER("exit_pei"),
> +	STATS_DESC_COUNTER("halt_no_poll_steal"),
> +	STATS_DESC_COUNTER("instruction_lctl"),
> +	STATS_DESC_COUNTER("instruction_lctlg"),
> +	STATS_DESC_COUNTER("instruction_stctl"),
> +	STATS_DESC_COUNTER("instruction_stctg"),
> +	STATS_DESC_COUNTER("exit_program_interruption"),
> +	STATS_DESC_COUNTER("exit_instr_and_program"),
> +	STATS_DESC_COUNTER("exit_operation_exception"),
> +	STATS_DESC_COUNTER("deliver_ckc"),
> +	STATS_DESC_COUNTER("deliver_cputm"),
> +	STATS_DESC_COUNTER("deliver_external_call"),
> +	STATS_DESC_COUNTER("deliver_emergency_signal"),
> +	STATS_DESC_COUNTER("deliver_service_signal"),
> +	STATS_DESC_COUNTER("deliver_virtio"),
> +	STATS_DESC_COUNTER("deliver_stop_signal"),
> +	STATS_DESC_COUNTER("deliver_prefix_signal"),
> +	STATS_DESC_COUNTER("deliver_restart_signal"),
> +	STATS_DESC_COUNTER("deliver_program"),
> +	STATS_DESC_COUNTER("deliver_io"),
> +	STATS_DESC_COUNTER("deliver_machine_check"),
> +	STATS_DESC_COUNTER("exit_wait_state"),
> +	STATS_DESC_COUNTER("inject_ckc"),
> +	STATS_DESC_COUNTER("inject_cputm"),
> +	STATS_DESC_COUNTER("inject_external_call"),
> +	STATS_DESC_COUNTER("inject_emergency_signal"),
> +	STATS_DESC_COUNTER("inject_mchk"),
> +	STATS_DESC_COUNTER("inject_pfault_init"),
> +	STATS_DESC_COUNTER("inject_program"),
> +	STATS_DESC_COUNTER("inject_restart"),
> +	STATS_DESC_COUNTER("inject_set_prefix"),
> +	STATS_DESC_COUNTER("inject_stop_signal"),
> +	STATS_DESC_COUNTER("instruction_epsw"),
> +	STATS_DESC_COUNTER("instruction_gs"),
> +	STATS_DESC_COUNTER("instruction_io_other"),
> +	STATS_DESC_COUNTER("instruction_lpsw"),
> +	STATS_DESC_COUNTER("instruction_lpswe"),
> +	STATS_DESC_COUNTER("instruction_pfmf"),
> +	STATS_DESC_COUNTER("instruction_ptff"),
> +	STATS_DESC_COUNTER("instruction_sck"),
> +	STATS_DESC_COUNTER("instruction_sckpf"),
> +	STATS_DESC_COUNTER("instruction_stidp"),
> +	STATS_DESC_COUNTER("instruction_spx"),
> +	STATS_DESC_COUNTER("instruction_stpx"),
> +	STATS_DESC_COUNTER("instruction_stap"),
> +	STATS_DESC_COUNTER("instruction_iske"),
> +	STATS_DESC_COUNTER("instruction_ri"),
> +	STATS_DESC_COUNTER("instruction_rrbe"),
> +	STATS_DESC_COUNTER("instruction_sske"),
> +	STATS_DESC_COUNTER("instruction_ipte_interlock"),
> +	STATS_DESC_COUNTER("instruction_stsi"),
> +	STATS_DESC_COUNTER("instruction_stfl"),
> +	STATS_DESC_COUNTER("instruction_tb"),
> +	STATS_DESC_COUNTER("instruction_tpi"),
> +	STATS_DESC_COUNTER("instruction_tprot"),
> +	STATS_DESC_COUNTER("instruction_tsch"),
> +	STATS_DESC_COUNTER("instruction_sie"),
> +	STATS_DESC_COUNTER("instruction_essa"),
> +	STATS_DESC_COUNTER("instruction_sthyi"),
> +	STATS_DESC_COUNTER("instruction_sigp_sense"),
> +	STATS_DESC_COUNTER("instruction_sigp_sense_running"),
> +	STATS_DESC_COUNTER("instruction_sigp_external_call"),
> +	STATS_DESC_COUNTER("instruction_sigp_emergency"),
> +	STATS_DESC_COUNTER("instruction_sigp_cond_emergency"),
> +	STATS_DESC_COUNTER("instruction_sigp_start"),
> +	STATS_DESC_COUNTER("instruction_sigp_stop"),
> +	STATS_DESC_COUNTER("instruction_sigp_stop_store_status"),
> +	STATS_DESC_COUNTER("instruction_sigp_store_status"),
> +	STATS_DESC_COUNTER("instruction_sigp_store_adtl_status"),
> +	STATS_DESC_COUNTER("instruction_sigp_arch"),
> +	STATS_DESC_COUNTER("instruction_sigp_prefix"),
> +	STATS_DESC_COUNTER("instruction_sigp_restart"),
> +	STATS_DESC_COUNTER("instruction_sigp_init_cpu_reset"),
> +	STATS_DESC_COUNTER("instruction_sigp_cpu_reset"),
> +	STATS_DESC_COUNTER("instruction_sigp_unknown"),
> +	STATS_DESC_COUNTER("diagnose_10"),
> +	STATS_DESC_COUNTER("diagnose_44"),
> +	STATS_DESC_COUNTER("diagnose_9c"),
> +	STATS_DESC_COUNTER("diagnose_9c_ignored"),
> +	STATS_DESC_COUNTER("diagnose_258"),
> +	STATS_DESC_COUNTER("diagnose_308"),
> +	STATS_DESC_COUNTER("diagnose_500"),
> +	STATS_DESC_COUNTER("diagnose_other"),
> +	STATS_DESC_COUNTER("pfault_sync"));
> +
> +struct _kvm_stats_header kvm_vcpu_stats_header = {
> +	.name_size = KVM_STATS_NAME_LEN,
> +	.count = ARRAY_SIZE(kvm_vcpu_stats_desc),
> +	.desc_offset = sizeof(struct kvm_stats_header),
> +	.data_offset = sizeof(struct kvm_stats_header) +
> +		sizeof(kvm_vcpu_stats_desc),
> +};
> +
>   struct kvm_stats_debugfs_item debugfs_entries[] = {
>   	VCPU_STAT("userspace_handled", exit_userspace),
>   	VCPU_STAT("exit_null", exit_null),
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 96d10253218a..8e900c482626 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -214,6 +214,59 @@ EXPORT_SYMBOL_GPL(host_xss);
>   u64 __read_mostly supported_xss;
>   EXPORT_SYMBOL_GPL(supported_xss);
>   
> +struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC(
> +	STATS_DESC_COUNTER("mmu_shadow_zapped"),
> +	STATS_DESC_COUNTER("mmu_pte_write"),
> +	STATS_DESC_COUNTER("mmu_pde_zapped"),
> +	STATS_DESC_COUNTER("mmu_flooded"),
> +	STATS_DESC_COUNTER("mmu_recycled"),
> +	STATS_DESC_COUNTER("mmu_cache_miss"),
> +	STATS_DESC_ICOUNTER("mmu_unsync"),
> +	STATS_DESC_ICOUNTER("largepages"),
> +	STATS_DESC_ICOUNTER("nx_largepages_splits"),
> +	STATS_DESC_ICOUNTER("max_mmu_page_hash_collisions"));
> +
> +struct _kvm_stats_header kvm_vm_stats_header = {
> +	.name_size = KVM_STATS_NAME_LEN,
> +	.count = ARRAY_SIZE(kvm_vm_stats_desc),
> +	.desc_offset = sizeof(struct kvm_stats_header),
> +	.data_offset = sizeof(struct kvm_stats_header) +
> +		sizeof(kvm_vm_stats_desc),
> +};
> +
> +struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
> +	STATS_DESC_COUNTER("pf_fixed"),
> +	STATS_DESC_COUNTER("pf_guest"),
> +	STATS_DESC_COUNTER("tlb_flush"),
> +	STATS_DESC_COUNTER("invlpg"),
> +	STATS_DESC_COUNTER("exits"),
> +	STATS_DESC_COUNTER("io_exits"),
> +	STATS_DESC_COUNTER("mmio_exits"),
> +	STATS_DESC_COUNTER("signal_exits"),
> +	STATS_DESC_COUNTER("irq_window_exits"),
> +	STATS_DESC_COUNTER("nmi_window_exits"),
> +	STATS_DESC_COUNTER("l1d_flush"),
> +	STATS_DESC_COUNTER("halt_exits"),
> +	STATS_DESC_COUNTER("request_irq_exits"),
> +	STATS_DESC_COUNTER("irq_exits"),
> +	STATS_DESC_COUNTER("host_state_reload"),
> +	STATS_DESC_COUNTER("fpu_reload"),
> +	STATS_DESC_COUNTER("insn_emulation"),
> +	STATS_DESC_COUNTER("insn_emulation_fail"),
> +	STATS_DESC_COUNTER("hypercalls"),
> +	STATS_DESC_COUNTER("irq_injections"),
> +	STATS_DESC_COUNTER("nmi_injections"),
> +	STATS_DESC_COUNTER("req_event"),
> +	STATS_DESC_COUNTER("nested_run"));
> +
> +struct _kvm_stats_header kvm_vcpu_stats_header = {
> +	.name_size = KVM_STATS_NAME_LEN,
> +	.count = ARRAY_SIZE(kvm_vcpu_stats_desc),
> +	.desc_offset = sizeof(struct kvm_stats_header),
> +	.data_offset = sizeof(struct kvm_stats_header) +
> +		sizeof(kvm_vcpu_stats_desc),
> +};
> +
>   struct kvm_stats_debugfs_item debugfs_entries[] = {
>   	VCPU_STAT("pf_fixed", pf_fixed),
>   	VCPU_STAT("pf_guest", pf_guest),
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 1870fa928762..873e21af36be 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1240,6 +1240,19 @@ struct kvm_stats_debugfs_item {
>   	int mode;
>   };
>   
> +struct _kvm_stats_header {
> +	__u32 name_size;
> +	__u32 count;
> +	__u32 desc_offset;
> +	__u32 data_offset;
> +};
> +
> +#define KVM_STATS_NAME_LEN	48
> +struct _kvm_stats_desc {
> +	struct kvm_stats_desc desc;
> +	char name[KVM_STATS_NAME_LEN];
> +};
> +
>   #define KVM_DBGFS_GET_MODE(dbgfs_item)                                         \
>   	((dbgfs_item)->mode ? (dbgfs_item)->mode : 0644)
>   
> @@ -1253,8 +1266,127 @@ struct kvm_stats_debugfs_item {
>   	{ n, offsetof(struct kvm_vcpu, stat.generic.x),			       \
>   	  KVM_STAT_VCPU, ## __VA_ARGS__ }
>   
> +#define STATS_DESC(stat, type, unit, scale, exp)			       \
> +	{								       \
> +		{							       \
> +			.flags = type | unit | scale,			       \
> +			.exponent = exp,				       \
> +			.size = 1					       \
> +		},							       \
> +		.name = stat,						       \
> +	}
> +#define STATS_DESC_CUMULATIVE(name, unit, scale, exponent)		       \
> +	STATS_DESC(name, KVM_STATS_TYPE_CUMULATIVE, unit, scale, exponent)
> +#define STATS_DESC_INSTANT(name, unit, scale, exponent)			       \
> +	STATS_DESC(name, KVM_STATS_TYPE_INSTANT, unit, scale, exponent)
> +
> +/* Cumulative counter */
> +#define STATS_DESC_COUNTER(name)					       \
> +	STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_NONE,		       \
> +		KVM_STATS_BASE_POW10, 0)
> +/* Instantaneous counter */
> +#define STATS_DESC_ICOUNTER(name)					       \
> +	STATS_DESC_INSTANT(name, KVM_STATS_UNIT_NONE,			       \
> +		KVM_STATS_BASE_POW10, 0)
> +
> +/* Cumulative clock cycles */
> +#define STATS_DESC_CYCLE(name)						       \
> +	STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_CYCLES,		       \
> +		KVM_STATS_BASE_POW10, 0)
> +/* Instantaneous clock cycles */
> +#define STATS_DESC_ICYCLE(name)						       \
> +	STATS_DESC_INSTANT(name, KVM_STATS_UNIT_CYCLES,			       \
> +		KVM_STATS_BASE_POW10, 0)
> +
> +/* Cumulative memory size in Byte */
> +#define STATS_DESC_SIZE_BYTE(name)					       \
> +	STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_BYTES,		       \
> +		KVM_STATS_BASE_POW2, 0)
> +/* Cumulative memory size in KiByte */
> +#define STATS_DESC_SIZE_KBYTE(name)					       \
> +	STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_BYTES,		       \
> +		KVM_STATS_BASE_POW2, 10)
> +/* Cumulative memory size in MiByte */
> +#define STATS_DESC_SIZE_MBYTE(name)					       \
> +	STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_BYTES,		       \
> +		KVM_STATS_BASE_POW2, 20)
> +/* Cumulative memory size in GiByte */
> +#define STATS_DESC_SIZE_GBYTE(name)					       \
> +	STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_BYTES,		       \
> +		KVM_STATS_BASE_POW2, 30)
> +
> +/* Instantaneous memory size in Byte */
> +#define STATS_DESC_ISIZE_BYTE(name)					       \
> +	STATS_DESC_INSTANT(name, KVM_STATS_UNIT_BYTES,			       \
> +		KVM_STATS_BASE_POW2, 0)
> +/* Instantaneous memory size in KiByte */
> +#define STATS_DESC_ISIZE_KBYTE(name)					       \
> +	STATS_DESC_INSTANT(name, KVM_STATS_UNIT_BYTES,			       \
> +		KVM_STATS_BASE_POW2, 10)
> +/* Instantaneous memory size in MiByte */
> +#define STATS_DESC_ISIZE_MBYTE(name)					       \
> +	STATS_DESC_INSTANT(name, KVM_STATS_UNIT_BYTES,			       \
> +		KVM_STATS_BASE_POW2, 20)
> +/* Instantaneous memory size in GiByte */
> +#define STATS_DESC_ISIZE_GBYTE(name)					       \
> +	STATS_DESC_INSTANT(name, KVM_STATS_UNIT_BYTES,			       \
> +		KVM_STATS_BASE_POW2, 30)
> +
> +/* Cumulative time in second */
> +#define STATS_DESC_TIME_SEC(name)					       \
> +	STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_SECONDS,		       \
> +		KVM_STATS_BASE_POW10, 0)
> +/* Cumulative time in millisecond */
> +#define STATS_DESC_TIME_MSEC(name)					       \
> +	STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_SECONDS,		       \
> +		KVM_STATS_BASE_POW10, -3)
> +/* Cumulative time in microsecond */
> +#define STATS_DESC_TIME_USEC(name)					       \
> +	STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_SECONDS,		       \
> +		KVM_STATS_BASE_POW10, -6)
> +/* Cumulative time in nanosecond */
> +#define STATS_DESC_TIME_NSEC(name)					       \
> +	STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_SECONDS,		       \
> +		KVM_STATS_BASE_POW10, -9)
> +
> +/* Instantaneous time in second */
> +#define STATS_DESC_ITIME_SEC(name)					       \
> +	STATS_DESC_INSTANT(name, KVM_STATS_UNIT_SECONDS,		       \
> +		KVM_STATS_BASE_POW10, 0)
> +/* Instantaneous time in millisecond */
> +#define STATS_DESC_ITIME_MSEC(name)					       \
> +	STATS_DESC_INSTANT(name, KVM_STATS_UNIT_SECONDS,		       \
> +		KVM_STATS_BASE_POW10, -3)
> +/* Instantaneous time in microsecond */
> +#define STATS_DESC_ITIME_USEC(name)					       \
> +	STATS_DESC_INSTANT(name, KVM_STATS_UNIT_SECONDS,		       \
> +		KVM_STATS_BASE_POW10, -6)
> +/* Instantaneous time in nanosecond */
> +#define STATS_DESC_ITIME_NSEC(name)					       \
> +	STATS_DESC_INSTANT(name, KVM_STATS_UNIT_SECONDS,		       \
> +		KVM_STATS_BASE_POW10, -9)
> +
> +#define DEFINE_VM_STATS_DESC(...) {					       \
> +	STATS_DESC_COUNTER("remote_tlb_flush"),				       \
> +	## __VA_ARGS__							       \
> +}
> +
> +#define DEFINE_VCPU_STATS_DESC(...) {					       \
> +	STATS_DESC_COUNTER("halt_successful_poll"),			       \
> +	STATS_DESC_COUNTER("halt_attempted_poll"),			       \
> +	STATS_DESC_COUNTER("halt_poll_invalid"),			       \
> +	STATS_DESC_COUNTER("halt_wakeup"),				       \
> +	STATS_DESC_TIME_NSEC("halt_poll_success_ns"),			       \
> +	STATS_DESC_TIME_NSEC("halt_poll_fail_ns"),			       \
> +	## __VA_ARGS__							       \
> +}
> +
>   extern struct kvm_stats_debugfs_item debugfs_entries[];
>   extern struct dentry *kvm_debugfs_dir;
> +extern struct _kvm_stats_header kvm_vm_stats_header;
> +extern struct _kvm_stats_header kvm_vcpu_stats_header;
> +extern struct _kvm_stats_desc kvm_vm_stats_desc[];
> +extern struct _kvm_stats_desc kvm_vcpu_stats_desc[];
>   
>   #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
>   static inline int mmu_notifier_retry(struct kvm *kvm, unsigned long mmu_seq)
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 3fd9a7e9d90c..dcfa0315e3f9 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1082,6 +1082,7 @@ struct kvm_ppc_resize_hpt {
>   #define KVM_CAP_SGX_ATTRIBUTE 196
>   #define KVM_CAP_VM_COPY_ENC_CONTEXT_FROM 197
>   #define KVM_CAP_PTP_KVM 198
> +#define KVM_CAP_STATS_BINARY_FD 199
>   
>   #ifdef KVM_CAP_IRQ_ROUTING
>   
> @@ -1898,4 +1899,53 @@ struct kvm_dirty_gfn {
>   #define KVM_BUS_LOCK_DETECTION_OFF             (1 << 0)
>   #define KVM_BUS_LOCK_DETECTION_EXIT            (1 << 1)
>   
> +#define KVM_STATS_ID_MAXLEN		64
> +
> +struct kvm_stats_header {
> +	char id[KVM_STATS_ID_MAXLEN];
> +	__u32 name_size;
> +	__u32 count;
> +	__u32 desc_offset;
> +	__u32 data_offset;
> +};
> +
> +#define KVM_STATS_TYPE_SHIFT		0
> +#define KVM_STATS_TYPE_MASK		(0xF << KVM_STATS_TYPE_SHIFT)
> +#define KVM_STATS_TYPE_CUMULATIVE	(0x0 << KVM_STATS_TYPE_SHIFT)
> +#define KVM_STATS_TYPE_INSTANT		(0x1 << KVM_STATS_TYPE_SHIFT)
> +#define KVM_STATS_TYPE_MAX		KVM_STATS_TYPE_INSTANT
> +
> +#define KVM_STATS_UNIT_SHIFT		4
> +#define KVM_STATS_UNIT_MASK		(0xF << KVM_STATS_UNIT_SHIFT)
> +#define KVM_STATS_UNIT_NONE		(0x0 << KVM_STATS_UNIT_SHIFT)
> +#define KVM_STATS_UNIT_BYTES		(0x1 << KVM_STATS_UNIT_SHIFT)
> +#define KVM_STATS_UNIT_SECONDS		(0x2 << KVM_STATS_UNIT_SHIFT)
> +#define KVM_STATS_UNIT_CYCLES		(0x3 << KVM_STATS_UNIT_SHIFT)
> +#define KVM_STATS_UNIT_MAX		KVM_STATS_UNIT_CYCLES
> +
> +#define KVM_STATS_BASE_SHIFT		8
> +#define KVM_STATS_BASE_MASK		(0xF << KVM_STATS_BASE_SHIFT)
> +#define KVM_STATS_BASE_POW10		(0x0 << KVM_STATS_BASE_SHIFT)
> +#define KVM_STATS_BASE_POW2		(0x1 << KVM_STATS_BASE_SHIFT)
> +#define KVM_STATS_BASE_MAX		KVM_STATS_BASE_POW2
> +
> +struct kvm_stats_desc {
> +	__u32 flags;
> +	__s16 exponent;
> +	__u16 size;
> +	__u32 unused1;
> +	__u32 unused2;
> +	char name[0];
> +};
> +
> +struct kvm_vm_stats_data {
> +	unsigned long value[0];
> +};
> +
> +struct kvm_vcpu_stats_data {
> +	__u64 value[0];
> +};
> +
> +#define KVM_GET_STATS_FD  _IOR(KVMIO,  0xcc, struct kvm_stats_header)
> +
>   #endif /* __LINUX_KVM_H */
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index f6ad5b080994..d84bb17bdea8 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3409,6 +3409,115 @@ static int kvm_vcpu_ioctl_set_sigmask(struct kvm_vcpu *vcpu, sigset_t *sigset)
>   	return 0;
>   }
>   
> +static ssize_t kvm_stats_read(char *id, struct _kvm_stats_header *header,
> +		struct _kvm_stats_desc *desc, void *stats, size_t size_stats,
> +		char __user *user_buffer, size_t size, loff_t *offset)
> +{
> +	ssize_t copylen, len, remain = size;
> +	size_t size_header, size_desc;
> +	loff_t pos = *offset;
> +	char __user *dest = user_buffer;
> +	void *src;
> +
> +	size_header = sizeof(*header);
> +	size_desc = header->count * sizeof(*desc);
> +
> +	len = KVM_STATS_ID_MAXLEN + size_header + size_desc + size_stats - pos;
> +	len = min(len, remain);
> +	if (len <= 0)
> +		return 0;
> +	remain = len;
> +
> +	/* Copy kvm stats header id string */
> +	copylen = KVM_STATS_ID_MAXLEN - pos;
> +	copylen = min(copylen, remain);
> +	if (copylen > 0) {
> +		src = id + pos;
> +		if (copy_to_user(dest, src, copylen))
> +			return -EFAULT;
> +		remain -= copylen;
> +		pos += copylen;
> +		dest += copylen;
> +	}
> +	/* Copy kvm stats header */
> +	copylen = KVM_STATS_ID_MAXLEN + size_header - pos;
> +	copylen = min(copylen, remain);
> +	if (copylen > 0) {
> +		src = header + pos - KVM_STATS_ID_MAXLEN;
> +		if (copy_to_user(dest, src, copylen))
> +			return -EFAULT;
> +		remain -= copylen;
> +		pos += copylen;
> +		dest += copylen;
> +	}
> +	/* Copy kvm stats descriptors */
> +	copylen = header->desc_offset + size_desc - pos;
> +	copylen = min(copylen, remain);
> +	if (copylen > 0) {
> +		src = desc + pos - header->desc_offset;
> +		if (copy_to_user(dest, src, copylen))
> +			return -EFAULT;
> +		remain -= copylen;
> +		pos += copylen;
> +		dest += copylen;
> +	}
> +	/* Copy kvm stats values */
> +	copylen = header->data_offset + size_stats - pos;
> +	copylen = min(copylen, remain);
> +	if (copylen > 0) {
> +		src = stats + pos - header->data_offset;
> +		if (copy_to_user(dest, src, copylen))
> +			return -EFAULT;
> +		remain -= copylen;
> +		pos += copylen;
> +		dest += copylen;
> +	}
> +
> +	*offset = pos;
> +	return len;
> +}
> +
> +static ssize_t kvm_vcpu_stats_read(struct file *file, char __user *user_buffer,
> +			      size_t size, loff_t *offset)
> +{
> +	char id[KVM_STATS_ID_MAXLEN];
> +	struct kvm_vcpu *vcpu = file->private_data;
> +
> +	snprintf(id, sizeof(id), "kvm-%d/vcpu-%d",
> +			task_pid_nr(current), vcpu->vcpu_id);
> +	return kvm_stats_read(id, &kvm_vcpu_stats_header,
> +			&kvm_vcpu_stats_desc[0], &vcpu->stat,
> +			sizeof(vcpu->stat), user_buffer, size, offset);
> +}
> +
> +static const struct file_operations kvm_vcpu_stats_fops = {
> +	.read = kvm_vcpu_stats_read,
> +	.llseek = noop_llseek,
> +};
> +
> +static int kvm_vcpu_ioctl_get_stats_fd(struct kvm_vcpu *vcpu)
> +{
> +	int fd;
> +	struct file *file;
> +	char name[15 + ITOA_MAX_LEN + 1];
> +
> +	snprintf(name, sizeof(name), "kvm-vcpu-stats:%d", vcpu->vcpu_id);
> +
> +	fd = get_unused_fd_flags(O_CLOEXEC);
> +	if (fd < 0)
> +		return fd;
> +
> +	file = anon_inode_getfile(name, &kvm_vcpu_stats_fops, vcpu, O_RDONLY);
> +	if (IS_ERR(file)) {
> +		put_unused_fd(fd);
> +		return PTR_ERR(file);
> +	}
> +	file->f_mode |= FMODE_PREAD;
> +	fd_install(fd, file);
> +
> +	return fd;
> +}
> +
>   static long kvm_vcpu_ioctl(struct file *filp,
>   			   unsigned int ioctl, unsigned long arg)
>   {
> @@ -3606,6 +3715,10 @@ static long kvm_vcpu_ioctl(struct file *filp,
>   		r = kvm_arch_vcpu_ioctl_set_fpu(vcpu, fpu);
>   		break;
>   	}
> +	case KVM_GET_STATS_FD: {
> +		r = kvm_vcpu_ioctl_get_stats_fd(vcpu);
> +		break;
> +	}
>   	default:
>   		r = kvm_arch_vcpu_ioctl(filp, ioctl, arg);
>   	}
> @@ -3864,6 +3977,8 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
>   #else
>   		return 0;
>   #endif
> +	case KVM_CAP_STATS_BINARY_FD:

Nit:   KVM_CAP_BINARY_STATS_FD

> +		return 1;
>   	default:
>   		break;
>   	}
> @@ -3967,6 +4082,43 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>   	}
>   }
>   
> +static ssize_t kvm_vm_stats_read(struct file *file, char __user *user_buffer,
> +			      size_t size, loff_t *offset)
> +{
> +	char id[KVM_STATS_ID_MAXLEN];
> +	struct kvm *kvm = file->private_data;
> +
> +	snprintf(id, sizeof(id), "kvm-%d", task_pid_nr(current));
> +	return kvm_stats_read(id, &kvm_vm_stats_header, &kvm_vm_stats_desc[0],
> +		&kvm->stat, sizeof(kvm->stat), user_buffer, size, offset);
> +}
> +
> +static const struct file_operations kvm_vm_stats_fops = {
> +	.read = kvm_vm_stats_read,
> +	.llseek = noop_llseek,
> +};
> +
> +static int kvm_vm_ioctl_get_stats_fd(struct kvm *kvm)
> +{
> +	int fd;
> +	struct file *file;
> +
> +	fd = get_unused_fd_flags(O_CLOEXEC);
> +	if (fd < 0)
> +		return fd;
> +
> +	file = anon_inode_getfile("kvm-vm-stats",
> +			&kvm_vm_stats_fops, kvm, O_RDONLY);
> +	if (IS_ERR(file)) {
> +		put_unused_fd(fd);
> +		return PTR_ERR(file);
> +	}
> +	file->f_mode |= FMODE_PREAD;
> +	fd_install(fd, file);
> +
> +	return fd;
> +}
> +
>   static long kvm_vm_ioctl(struct file *filp,
>   			   unsigned int ioctl, unsigned long arg)
>   {
> @@ -4149,6 +4301,9 @@ static long kvm_vm_ioctl(struct file *filp,
>   	case KVM_RESET_DIRTY_RINGS:
>   		r = kvm_vm_ioctl_reset_dirty_pages(kvm);
>   		break;
> +	case KVM_GET_STATS_FD:
> +		r = kvm_vm_ioctl_get_stats_fd(kvm);
> +		break;
>   	default:
>   		r = kvm_arch_vm_ioctl(filp, ioctl, arg);
>   	}
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 3/4] KVM: stats: Add documentation for statistics data binary interface
  2021-06-03 21:14 ` [PATCH v7 3/4] KVM: stats: Add documentation for statistics data binary interface Jing Zhang
@ 2021-06-07 19:22   ` Krish Sadhukhan
  2021-06-07 22:09     ` Jing Zhang
  2021-06-14  7:56   ` Fuad Tabba
  1 sibling, 1 reply; 32+ messages in thread
From: Krish Sadhukhan @ 2021-06-07 19:22 UTC (permalink / raw)
  To: Jing Zhang, KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390,
	Linuxkselftest, Paolo Bonzini, Marc Zyngier, James Morse,
	Julien Thierry, Suzuki K Poulose, Will Deacon, Huacai Chen,
	Aleksandar Markovic, Thomas Bogendoerfer, Paul Mackerras,
	Christian Borntraeger, Janosch Frank, David Hildenbrand,
	Cornelia Huck, Claudio Imbrenda, Sean Christopherson,
	Vitaly Kuznetsov, Jim Mattson, Peter Shier, Oliver Upton,
	David Rientjes, Emanuele Giuseppe Esposito, David Matlack,
	Ricardo Koller


On 6/3/21 2:14 PM, Jing Zhang wrote:
> Update KVM API documentation for binary statistics.
>
> Reviewed-by: David Matlack <dmatlack@google.com>
> Reviewed-by: Ricardo Koller <ricarkol@google.com>
> Signed-off-by: Jing Zhang <jingzhangos@google.com>
> ---
>   Documentation/virt/kvm/api.rst | 180 +++++++++++++++++++++++++++++++++
>   1 file changed, 180 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 7fcb2fd38f42..550bfbdf611b 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -5034,6 +5034,178 @@ see KVM_XEN_VCPU_SET_ATTR above.
>   The KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST type may not be used
>   with the KVM_XEN_VCPU_GET_ATTR ioctl.
>   
> +4.130 KVM_GET_STATS_FD
> +---------------------
> +
> +:Capability: KVM_CAP_STATS_BINARY_FD
> +:Architectures: all
> +:Type: vm ioctl, vcpu ioctl
> +:Parameters: none
> +:Returns: statistics file descriptor on success, < 0 on error
> +
> +Errors:
> +
> +  ======     ======================================================
> +  ENOMEM     if the fd could not be created due to lack of memory
> +  EMFILE     if the number of opened files exceeds the limit
> +  ======     ======================================================
> +
> +The file descriptor can be used to read VM/vCPU statistics data in binary
> +format. The file data is organized into three blocks as below:
> ++-------------+
> +|   Header    |
> ++-------------+
> +| Descriptors |
> ++-------------+
> +| Stats Data  |
> ++-------------+
> +
> +The Header block is always at the start of the file. It is only needed to be
> +read one time for the lifetime of the file descriptor.
> +It is in the form of ``struct kvm_stats_header`` as below::
> +
> +	#define KVM_STATS_ID_MAXLEN		64
> +
> +	struct kvm_stats_header {
> +		char id[KVM_STATS_ID_MAXLEN];
> +		__u32 name_size;
> +		__u32 count;
> +		__u32 desc_offset;
> +		__u32 data_offset;
> +	};
> +
> +The ``id`` field is identification for the corresponding KVM statistics. For
> +VM statistics, it is in the form of "kvm-{kvm pid}", like "kvm-12345". For
> +VCPU statistics, it is in the form of "kvm-{kvm pid}/vcpu-{vcpu id}", like
> +"kvm-12345/vcpu-12".

Currently, KVM debugfs shows VCPUs as "vcpuXX" where is XX is the id. 
Should we follow the same convention ?
> +
> +The ``name_size`` field is the size (byte) of the statistics name string
> +(including trailing '\0') appended to the end of every statistics descriptor.
> +
> +The ``count`` field is the number of statistics.
> +
> +The ``desc_offset`` field is the offset of the Descriptors block from the start
> +of the file indicated by the file descriptor.
> +
> +The ``data_offset`` field is the offset of the Stats Data block from the start
> +of the file indicated by the file descriptor.
> +
> +The Descriptors block is only needed to be read once for the lifetime of the
> +file descriptor. It is an array of ``struct kvm_stats_desc`` as shown in
> +below code block::
> +
> +	#define KVM_STATS_TYPE_SHIFT		0
> +	#define KVM_STATS_TYPE_MASK		(0xF << KVM_STATS_TYPE_SHIFT)
> +	#define KVM_STATS_TYPE_CUMULATIVE	(0x0 << KVM_STATS_TYPE_SHIFT)
> +	#define KVM_STATS_TYPE_INSTANT		(0x1 << KVM_STATS_TYPE_SHIFT)
> +	#define KVM_STATS_TYPE_MAX		KVM_STATS_TYPE_INSTANT
> +
> +	#define KVM_STATS_UNIT_SHIFT		4
> +	#define KVM_STATS_UNIT_MASK		(0xF << KVM_STATS_UNIT_SHIFT)
> +	#define KVM_STATS_UNIT_NONE		(0x0 << KVM_STATS_UNIT_SHIFT)
> +	#define KVM_STATS_UNIT_BYTES		(0x1 << KVM_STATS_UNIT_SHIFT)
> +	#define KVM_STATS_UNIT_SECONDS		(0x2 << KVM_STATS_UNIT_SHIFT)
> +	#define KVM_STATS_UNIT_CYCLES		(0x3 << KVM_STATS_UNIT_SHIFT)
> +	#define KVM_STATS_UNIT_MAX		KVM_STATS_UNIT_CYCLES
> +
> +	#define KVM_STATS_BASE_SHIFT		8
> +	#define KVM_STATS_BASE_MASK		(0xF << KVM_STATS_BASE_SHIFT)
> +	#define KVM_STATS_BASE_POW10		(0x0 << KVM_STATS_BASE_SHIFT)
> +	#define KVM_STATS_BASE_POW2		(0x1 << KVM_STATS_BASE_SHIFT)
> +	#define KVM_STATS_BASE_MAX		KVM_STATS_BASE_POW2
> +
> +	struct kvm_stats_desc {
> +		__u32 flags;
> +		__s16 exponent;
> +		__u16 size;
> +		__u32 unused1;
> +		__u32 unused2;
> +		char name[0];
> +	};
> +
> +The ``flags`` field contains the type and unit of the statistics data described
> +by this descriptor. The following flags are supported:
> +
> +Bits 0-3 of ``flags`` encode the type:
> +  * ``KVM_STATS_TYPE_CUMULATIVE``
> +    The statistics data is cumulative. The value of data can only be increased.
> +    Most of the counters used in KVM are of this type.
> +    The corresponding ``count`` filed for this type is always 1.
> +  * ``KVM_STATS_TYPE_INSTANT``
> +    The statistics data is instantaneous. Its value can be increased or
> +    decreased. This type is usually used as a measurement of some resources,
> +    like the number of dirty pages, the number of large pages, etc.
> +    The corresponding ``count`` field for this type is always 1.
> +
> +Bits 4-7 of ``flags`` encode the unit:
> +  * ``KVM_STATS_UNIT_NONE``
> +    There is no unit for the value of statistics data. This usually means that
> +    the value is a simple counter of an event.
> +  * ``KVM_STATS_UNIT_BYTES``
> +    It indicates that the statistics data is used to measure memory size, in the
> +    unit of Byte, KiByte, MiByte, GiByte, etc. The unit of the data is
> +    determined by the ``exponent`` field in the descriptor. The
> +    ``KVM_STATS_BASE_POW2`` flag is valid in this case. The unit of the data is
> +    determined by ``pow(2, exponent)``. For example, if value is 10,
> +    ``exponent`` is 20, which means the unit of statistics data is MiByte, we
> +    can get the statistics data in the unit of Byte by
> +    ``value * pow(2, exponent) = 10 * pow(2, 20) = 10 MiByte`` which is
> +    10 * 1024 * 1024 Bytes.
> +  * ``KVM_STATS_UNIT_SECONDS``
> +    It indicates that the statistics data is used to measure time/latency, in
> +    the unit of nanosecond, microsecond, millisecond and second. The unit of the
> +    data is determined by the ``exponent`` field in the descriptor. The
> +    ``KVM_STATS_BASE_POW10`` flag is valid in this case. The unit of the data
> +    is determined by ``pow(10, exponent)``. For example, if value is 2000000,
> +    ``exponent`` is -6, which means the unit of statistics data is microsecond,
> +    we can get the statistics data in the unit of second by
> +    ``value * pow(10, exponent) = 2000000 * pow(10, -6) = 2 seconds``.
> +  * ``KVM_STATS_UNIT_CYCLES``
> +    It indicates that the statistics data is used to measure CPU clock cycles.
> +    The ``KVM_STATS_BASE_POW10`` flag is valid in this case. For example, if
> +    value is 200, ``exponent`` is 4, we can get the number of CPU clock cycles
> +    by ``value * pow(10, exponent) = 200 * pow(10, 4) = 2000000``.
> +
> +Bits 7-11 of ``flags`` encode the base:
> +  * ``KVM_STATS_BASE_POW10``
> +    The scale is based on power of 10. It is used for measurement of time and
> +    CPU clock cycles.
> +  * ``KVM_STATS_BASE_POW2``
> +    The scale is based on power of 2. It is used for measurement of memory size.
> +
> +The ``exponent`` field is the scale of corresponding statistics data. For
> +example, if the unit is ``KVM_STATS_UNIT_BYTES``, the base is
> +``KVM_STATS_BASE_POW2``, the ``exponent`` is 10, then we know that the real
> +unit of the statistics data is KBytes a.k.a pow(2, 10) = 1024 bytes.
> +
> +The ``size`` field is the number of values of this statistics data. It is in the
> +unit of ``unsigned long`` for VM or ``__u64`` for VCPU.
> +
> +The ``unused1`` and ``unused2`` fields are reserved for future
> +support for other types of statistics data, like log/linear histogram.
> +
> +The ``name`` field points to the name string of the statistics data. The name
> +string starts at the end of ``struct kvm_stats_desc``.
> +The maximum length (including trailing '\0') is indicated by ``name_size``
> +in ``struct kvm_stats_header``.
> +
> +The Stats Data block contains an array of data values of type ``struct
> +kvm_vm_stats_data`` or ``struct kvm_vcpu_stats_data``. It would be read by
> +user space periodically to pull statistics data.
> +The order of data value in Stats Data block is the same as the order of
> +descriptors in Descriptors block.
> +  * Statistics data for VM::
> +
> +	struct kvm_vm_stats_data {
> +		unsigned long value[0];
> +	};
> +
> +  * Statistics data for VCPU::
> +
> +	struct kvm_vcpu_stats_data {
> +		__u64 value[0];
> +	};
> +
>   5. The kvm_run structure
>   ========================
>   
> @@ -6891,3 +7063,11 @@ This capability is always enabled.
>   This capability indicates that the KVM virtual PTP service is
>   supported in the host. A VMM can check whether the service is
>   available to the guest on migration.
> +
> +8.33 KVM_CAP_STATS_BINARY_FD
> +----------------------------
> +
> +:Architectures: all
> +
> +This capability indicates the feature that user space can create get a file
> +descriptor for every VM and VCPU to read statistics data in binary format.


Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 4/4] KVM: selftests: Add selftest for KVM statistics data binary interface
  2021-06-03 21:14 ` [PATCH v7 4/4] KVM: selftests: Add selftest for KVM " Jing Zhang
@ 2021-06-07 19:23   ` Krish Sadhukhan
  2021-06-07 22:12     ` Jing Zhang
  0 siblings, 1 reply; 32+ messages in thread
From: Krish Sadhukhan @ 2021-06-07 19:23 UTC (permalink / raw)
  To: Jing Zhang, KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390,
	Linuxkselftest, Paolo Bonzini, Marc Zyngier, James Morse,
	Julien Thierry, Suzuki K Poulose, Will Deacon, Huacai Chen,
	Aleksandar Markovic, Thomas Bogendoerfer, Paul Mackerras,
	Christian Borntraeger, Janosch Frank, David Hildenbrand,
	Cornelia Huck, Claudio Imbrenda, Sean Christopherson,
	Vitaly Kuznetsov, Jim Mattson, Peter Shier, Oliver Upton,
	David Rientjes, Emanuele Giuseppe Esposito, David Matlack,
	Ricardo Koller


On 6/3/21 2:14 PM, Jing Zhang wrote:
> Add selftest to check KVM stats descriptors validity.
>
> Reviewed-by: David Matlack <dmatlack@google.com>
> Reviewed-by: Ricardo Koller <ricarkol@google.com>
> Signed-off-by: Jing Zhang <jingzhangos@google.com>
> ---
>   tools/testing/selftests/kvm/.gitignore        |   1 +
>   tools/testing/selftests/kvm/Makefile          |   3 +
>   .../testing/selftests/kvm/include/kvm_util.h  |   3 +
>   .../selftests/kvm/kvm_binary_stats_test.c     | 215 ++++++++++++++++++
>   tools/testing/selftests/kvm/lib/kvm_util.c    |  12 +
>   5 files changed, 234 insertions(+)
>   create mode 100644 tools/testing/selftests/kvm/kvm_binary_stats_test.c
>
> diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
> index bd83158e0e0b..d1c3ee7d3e41 100644
> --- a/tools/testing/selftests/kvm/.gitignore
> +++ b/tools/testing/selftests/kvm/.gitignore
> @@ -43,3 +43,4 @@
>   /memslot_modification_stress_test
>   /set_memory_region_test
>   /steal_time
> +/kvm_binary_stats_test
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index e439d027939d..0cd46d6d1e15 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -76,6 +76,7 @@ TEST_GEN_PROGS_x86_64 += kvm_page_table_test
>   TEST_GEN_PROGS_x86_64 += memslot_modification_stress_test
>   TEST_GEN_PROGS_x86_64 += set_memory_region_test
>   TEST_GEN_PROGS_x86_64 += steal_time
> +TEST_GEN_PROGS_x86_64 += kvm_binary_stats_test
>   
>   TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
>   TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list-sve
> @@ -87,6 +88,7 @@ TEST_GEN_PROGS_aarch64 += kvm_create_max_vcpus
>   TEST_GEN_PROGS_aarch64 += kvm_page_table_test
>   TEST_GEN_PROGS_aarch64 += set_memory_region_test
>   TEST_GEN_PROGS_aarch64 += steal_time
> +TEST_GEN_PROGS_aarch64 += kvm_binary_stats_test
>   
>   TEST_GEN_PROGS_s390x = s390x/memop
>   TEST_GEN_PROGS_s390x += s390x/resets
> @@ -96,6 +98,7 @@ TEST_GEN_PROGS_s390x += dirty_log_test
>   TEST_GEN_PROGS_s390x += kvm_create_max_vcpus
>   TEST_GEN_PROGS_s390x += kvm_page_table_test
>   TEST_GEN_PROGS_s390x += set_memory_region_test
> +TEST_GEN_PROGS_s390x += kvm_binary_stats_test
>   
>   TEST_GEN_PROGS += $(TEST_GEN_PROGS_$(UNAME_M))
>   LIBKVM += $(LIBKVM_$(UNAME_M))
> diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
> index a8f022794ce3..96d15da3d72e 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> @@ -387,4 +387,7 @@ uint64_t get_ucall(struct kvm_vm *vm, uint32_t vcpu_id, struct ucall *uc);
>   #define GUEST_ASSERT_4(_condition, arg1, arg2, arg3, arg4) \
>   	__GUEST_ASSERT((_condition), 4, (arg1), (arg2), (arg3), (arg4))
>   
> +int vm_get_stats_fd(struct kvm_vm *vm);
> +int vcpu_get_stats_fd(struct kvm_vm *vm, uint32_t vcpuid);
> +
>   #endif /* SELFTEST_KVM_UTIL_H */
> diff --git a/tools/testing/selftests/kvm/kvm_binary_stats_test.c b/tools/testing/selftests/kvm/kvm_binary_stats_test.c
> new file mode 100644
> index 000000000000..081983110dc5
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/kvm_binary_stats_test.c
> @@ -0,0 +1,215 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * kvm_binary_stats_test
> + *
> + * Copyright (C) 2021, Google LLC.
> + *
> + * Test the fd-based interface for KVM statistics.
> + */
> +
> +#define _GNU_SOURCE /* for program_invocation_short_name */
> +#include <fcntl.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <errno.h>
> +
> +#include "test_util.h"
> +
> +#include "kvm_util.h"
> +#include "asm/kvm.h"
> +#include "linux/kvm.h"
> +
> +void stats_test(int stats_fd, int size_stat)
> +{
> +	ssize_t ret;
> +	int i;
> +	size_t size_desc, size_data = 0;
> +	struct kvm_stats_header header;
> +	struct kvm_stats_desc *stats_desc, *pdesc;
> +	void *stats_data;
> +
> +	/* Read kvm stats header */
> +	ret = read(stats_fd, &header, sizeof(header));
> +	TEST_ASSERT(ret == sizeof(header), "Read stats header");
> +	size_desc = sizeof(*stats_desc) + header.name_size;
> +
> +	/* Check id string in header, that should start with "kvm" */
> +	TEST_ASSERT(!strncmp(header.id, "kvm", 3) &&
> +			strlen(header.id) < KVM_STATS_ID_MAXLEN,
> +			"Invalid KVM stats type");
> +
> +	/* Sanity check for other fields in header */
> +	if (header.count == 0) {
> +		printf("No KVM stats defined!");
> +		return;
> +	}
> +	/* Check overlap */
> +	TEST_ASSERT(header.desc_offset > 0 && header.data_offset > 0
> +			&& header.desc_offset >= sizeof(header)
> +			&& header.data_offset >= sizeof(header),
> +			"Invalid offset fields in header");
> +	TEST_ASSERT(header.desc_offset > header.data_offset
> +			|| (header.desc_offset + size_desc * header.count <=
> +				header.data_offset),
> +			"Descriptor block is overlapped with data block");
> +
> +	/* Allocate memory for stats descriptors */
> +	stats_desc = calloc(header.count, size_desc);
> +	TEST_ASSERT(stats_desc, "Allocate memory for stats descriptors");
> +	/* Read kvm stats descriptors */
> +	ret = pread(stats_fd, stats_desc,
> +			size_desc * header.count, header.desc_offset);
> +	TEST_ASSERT(ret == size_desc * header.count,
> +			"Read KVM stats descriptors");
> +
> +	/* Sanity check for fields in descriptors */
> +	for (i = 0; i < header.count; ++i) {
> +		pdesc = (void *)stats_desc + i * size_desc;
> +		/* Check type,unit,base boundaries */
> +		TEST_ASSERT((pdesc->flags & KVM_STATS_TYPE_MASK)
> +				<= KVM_STATS_TYPE_MAX, "Unknown KVM stats type");
> +		TEST_ASSERT((pdesc->flags & KVM_STATS_UNIT_MASK)
> +				<= KVM_STATS_UNIT_MAX, "Unknown KVM stats unit");
> +		TEST_ASSERT((pdesc->flags & KVM_STATS_BASE_MASK)
> +				<= KVM_STATS_BASE_MAX, "Unknown KVM stats base");
> +		/* Check exponent for stats unit
> +		 * Exponent for counter should be greater than or equal to 0
> +		 * Exponent for unit bytes should be greater than or equal to 0
> +		 * Exponent for unit seconds should be less than or equal to 0
> +		 * Exponent for unit clock cycles should be greater than or
> +		 * equal to 0
> +		 */
> +		switch (pdesc->flags & KVM_STATS_UNIT_MASK) {
> +		case KVM_STATS_UNIT_NONE:
> +		case KVM_STATS_UNIT_BYTES:
> +		case KVM_STATS_UNIT_CYCLES:
> +			TEST_ASSERT(pdesc->exponent >= 0,
> +					"Unsupported KVM stats unit");
> +			break;
> +		case KVM_STATS_UNIT_SECONDS:
> +			TEST_ASSERT(pdesc->exponent <= 0,
> +					"Unsupported KVM stats unit");
> +			break;
> +		}
> +		/* Check name string */
> +		TEST_ASSERT(strlen(pdesc->name) < header.name_size,
> +				"KVM stats name(%s) too long", pdesc->name);
> +		/* Check size field, which should not be zero */
> +		TEST_ASSERT(pdesc->size, "KVM descriptor(%s) with size of 0",
> +				pdesc->name);
> +		size_data += pdesc->size * size_stat;
> +	}
> +	/* Check overlap */
> +	TEST_ASSERT(header.data_offset >= header.desc_offset
> +			|| header.data_offset + size_data <= header.desc_offset,
> +			"Data block is overlapped with Descriptor block");
> +	/* Check validity of all stats data size */
> +	TEST_ASSERT(size_data >= header.count * size_stat,
> +			"Data size is not correct");
> +
> +	/* Allocate memory for stats data */
> +	stats_data = malloc(size_data);
> +	TEST_ASSERT(stats_data, "Allocate memory for stats data");
> +	/* Read kvm stats data as a bulk */
> +	ret = pread(stats_fd, stats_data, size_data, header.data_offset);
> +	TEST_ASSERT(ret == size_data, "Read KVM stats data");
> +	/* Read kvm stats data one by one */
> +	size_data = 0;
> +	for (i = 0; i < header.count; ++i) {
> +		pdesc = (void *)stats_desc + i * size_desc;
> +		ret = pread(stats_fd, stats_data, pdesc->size * size_stat,
> +				header.data_offset + size_data);
> +		TEST_ASSERT(ret == pdesc->size * size_stat,
> +				"Read data of KVM stats: %s", pdesc->name);
> +		size_data += pdesc->size * size_stat;
> +	}
> +
> +	free(stats_data);
> +	free(stats_desc);
> +}
> +
> +
> +void vm_stats_test(struct kvm_vm *vm)
> +{
> +	int stats_fd;
> +	struct kvm_vm_stats_data *stats_data;
> +
> +	/* Get fd for VM stats */
> +	stats_fd = vm_get_stats_fd(vm);
> +	TEST_ASSERT(stats_fd >= 0, "Get VM stats fd");
> +
> +	stats_test(stats_fd, sizeof(stats_data->value[0]));
> +	close(stats_fd);
> +	TEST_ASSERT(fcntl(stats_fd, F_GETFD) == -1, "Stats fd not freed");
> +}
> +
> +void vcpu_stats_test(struct kvm_vm *vm, int vcpu_id)
> +{
> +	int stats_fd;
> +	struct kvm_vcpu_stats_data *stats_data;
> +
> +	/* Get fd for VCPU stats */
> +	stats_fd = vcpu_get_stats_fd(vm, vcpu_id);
> +	TEST_ASSERT(stats_fd >= 0, "Get VCPU stats fd");
> +
> +	stats_test(stats_fd, sizeof(stats_data->value[0]));
> +	close(stats_fd);
> +	TEST_ASSERT(fcntl(stats_fd, F_GETFD) == -1, "Stats fd not freed");
> +}
> +
> +#define DEFAULT_NUM_VM		4
> +#define DEFAULT_NUM_VCPU	4
> +
> +/*
> + * Usage: kvm_bin_form_stats [#vm] [#vcpu]
> + * The first parameter #vm set the number of VMs being created.
> + * The second parameter #vcpu set the number of VCPUs being created.
> + * By default, DEFAULT_NUM_VM VM and DEFAULT_NUM_VCPU VCPU for the VM would be
> + * created for testing.
> + */
> +
> +int main(int argc, char *argv[])
> +{
> +	int max_vm = DEFAULT_NUM_VM, max_vcpu = DEFAULT_NUM_VCPU, ret, i, j;
> +	struct kvm_vm **vms;
> +
> +	/* Get the number of VMs and VCPUs that would be created for testing. */
> +	if (argc > 1) {
> +		max_vm = strtol(argv[1], NULL, 0);
> +		if (max_vm <= 0)
> +			max_vm = DEFAULT_NUM_VM;
> +	}
> +	if (argc > 2) {
> +		max_vcpu = strtol(argv[2], NULL, 0);
> +		if (max_vcpu <= 0)
> +			max_vcpu = DEFAULT_NUM_VCPU;
> +	}
> +
> +	/* Check the extension for binary stats */
> +	ret = kvm_check_cap(KVM_CAP_STATS_BINARY_FD);
> +	TEST_ASSERT(ret >= 0,
> +			"Binary form statistics interface is not supported");
> +
> +	/* Create VMs and VCPUs */
> +	vms = malloc(sizeof(vms[0]) * max_vm);
> +	TEST_ASSERT(vms, "Allocate memory for storing VM pointers");
> +	for (i = 0; i < max_vm; ++i) {
> +		vms[i] = vm_create(VM_MODE_DEFAULT,
> +				DEFAULT_GUEST_PHY_PAGES, O_RDWR);
> +		for (j = 0; j < max_vcpu; ++j)
> +			vm_vcpu_add(vms[i], j);
> +	}
> +
> +	/* Check stats read for every VM and VCPU */
> +	for (i = 0; i < max_vm; ++i) {
> +		vm_stats_test(vms[i]);
> +		for (j = 0; j < max_vcpu; ++j)
> +			vcpu_stats_test(vms[i], j);
> +	}
> +
> +	for (i = 0; i < max_vm; ++i)
> +		kvm_vm_free(vms[i]);
> +	free(vms);
> +	return 0;
> +}
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index fc83f6c5902d..10385b76fe11 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -2090,3 +2090,15 @@ unsigned int vm_calc_num_guest_pages(enum vm_guest_mode mode, size_t size)
>   	n = DIV_ROUND_UP(size, vm_guest_mode_params[mode].page_size);
>   	return vm_adjust_num_guest_pages(mode, n);
>   }
> +
> +int vm_get_stats_fd(struct kvm_vm *vm)
> +{
> +	return ioctl(vm->fd, KVM_GET_STATS_FD, NULL);
> +}
> +
> +int vcpu_get_stats_fd(struct kvm_vm *vm, uint32_t vcpuid)
> +{
> +	struct vcpu *vcpu = vcpu_find(vm, vcpuid);
> +
> +	return ioctl(vcpu->fd, KVM_GET_STATS_FD, NULL);
> +}

We don't want to add a test case for testing the fd interface on a 
deleted VM and a deleted VCPU ?

Anyway, for the current content,

Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 3/4] KVM: stats: Add documentation for statistics data binary interface
  2021-06-07 19:22   ` Krish Sadhukhan
@ 2021-06-07 22:09     ` Jing Zhang
  0 siblings, 0 replies; 32+ messages in thread
From: Jing Zhang @ 2021-06-07 22:09 UTC (permalink / raw)
  To: Krish Sadhukhan
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Claudio Imbrenda, Sean Christopherson, Vitaly Kuznetsov,
	Jim Mattson, Peter Shier, Oliver Upton, David Rientjes,
	Emanuele Giuseppe Esposito, David Matlack, Ricardo Koller

On Mon, Jun 7, 2021 at 2:23 PM Krish Sadhukhan
<krish.sadhukhan@oracle.com> wrote:
>
>
> On 6/3/21 2:14 PM, Jing Zhang wrote:
> > Update KVM API documentation for binary statistics.
> >
> > Reviewed-by: David Matlack <dmatlack@google.com>
> > Reviewed-by: Ricardo Koller <ricarkol@google.com>
> > Signed-off-by: Jing Zhang <jingzhangos@google.com>
> > ---
> >   Documentation/virt/kvm/api.rst | 180 +++++++++++++++++++++++++++++++++
> >   1 file changed, 180 insertions(+)
> >
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index 7fcb2fd38f42..550bfbdf611b 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -5034,6 +5034,178 @@ see KVM_XEN_VCPU_SET_ATTR above.
> >   The KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST type may not be used
> >   with the KVM_XEN_VCPU_GET_ATTR ioctl.
> >
> > +4.130 KVM_GET_STATS_FD
> > +---------------------
> > +
> > +:Capability: KVM_CAP_STATS_BINARY_FD
> > +:Architectures: all
> > +:Type: vm ioctl, vcpu ioctl
> > +:Parameters: none
> > +:Returns: statistics file descriptor on success, < 0 on error
> > +
> > +Errors:
> > +
> > +  ======     ======================================================
> > +  ENOMEM     if the fd could not be created due to lack of memory
> > +  EMFILE     if the number of opened files exceeds the limit
> > +  ======     ======================================================
> > +
> > +The file descriptor can be used to read VM/vCPU statistics data in binary
> > +format. The file data is organized into three blocks as below:
> > ++-------------+
> > +|   Header    |
> > ++-------------+
> > +| Descriptors |
> > ++-------------+
> > +| Stats Data  |
> > ++-------------+
> > +
> > +The Header block is always at the start of the file. It is only needed to be
> > +read one time for the lifetime of the file descriptor.
> > +It is in the form of ``struct kvm_stats_header`` as below::
> > +
> > +     #define KVM_STATS_ID_MAXLEN             64
> > +
> > +     struct kvm_stats_header {
> > +             char id[KVM_STATS_ID_MAXLEN];
> > +             __u32 name_size;
> > +             __u32 count;
> > +             __u32 desc_offset;
> > +             __u32 data_offset;
> > +     };
> > +
> > +The ``id`` field is identification for the corresponding KVM statistics. For
> > +VM statistics, it is in the form of "kvm-{kvm pid}", like "kvm-12345". For
> > +VCPU statistics, it is in the form of "kvm-{kvm pid}/vcpu-{vcpu id}", like
> > +"kvm-12345/vcpu-12".
>
> Currently, KVM debugfs shows VCPUs as "vcpuXX" where is XX is the id.
> Should we follow the same convention ?
It looks more clear to be like vcpu-xx. Let's keep it this way. Thanks.
> > +
> > +The ``name_size`` field is the size (byte) of the statistics name string
> > +(including trailing '\0') appended to the end of every statistics descriptor.
> > +
> > +The ``count`` field is the number of statistics.
> > +
> > +The ``desc_offset`` field is the offset of the Descriptors block from the start
> > +of the file indicated by the file descriptor.
> > +
> > +The ``data_offset`` field is the offset of the Stats Data block from the start
> > +of the file indicated by the file descriptor.
> > +
> > +The Descriptors block is only needed to be read once for the lifetime of the
> > +file descriptor. It is an array of ``struct kvm_stats_desc`` as shown in
> > +below code block::
> > +
> > +     #define KVM_STATS_TYPE_SHIFT            0
> > +     #define KVM_STATS_TYPE_MASK             (0xF << KVM_STATS_TYPE_SHIFT)
> > +     #define KVM_STATS_TYPE_CUMULATIVE       (0x0 << KVM_STATS_TYPE_SHIFT)
> > +     #define KVM_STATS_TYPE_INSTANT          (0x1 << KVM_STATS_TYPE_SHIFT)
> > +     #define KVM_STATS_TYPE_MAX              KVM_STATS_TYPE_INSTANT
> > +
> > +     #define KVM_STATS_UNIT_SHIFT            4
> > +     #define KVM_STATS_UNIT_MASK             (0xF << KVM_STATS_UNIT_SHIFT)
> > +     #define KVM_STATS_UNIT_NONE             (0x0 << KVM_STATS_UNIT_SHIFT)
> > +     #define KVM_STATS_UNIT_BYTES            (0x1 << KVM_STATS_UNIT_SHIFT)
> > +     #define KVM_STATS_UNIT_SECONDS          (0x2 << KVM_STATS_UNIT_SHIFT)
> > +     #define KVM_STATS_UNIT_CYCLES           (0x3 << KVM_STATS_UNIT_SHIFT)
> > +     #define KVM_STATS_UNIT_MAX              KVM_STATS_UNIT_CYCLES
> > +
> > +     #define KVM_STATS_BASE_SHIFT            8
> > +     #define KVM_STATS_BASE_MASK             (0xF << KVM_STATS_BASE_SHIFT)
> > +     #define KVM_STATS_BASE_POW10            (0x0 << KVM_STATS_BASE_SHIFT)
> > +     #define KVM_STATS_BASE_POW2             (0x1 << KVM_STATS_BASE_SHIFT)
> > +     #define KVM_STATS_BASE_MAX              KVM_STATS_BASE_POW2
> > +
> > +     struct kvm_stats_desc {
> > +             __u32 flags;
> > +             __s16 exponent;
> > +             __u16 size;
> > +             __u32 unused1;
> > +             __u32 unused2;
> > +             char name[0];
> > +     };
> > +
> > +The ``flags`` field contains the type and unit of the statistics data described
> > +by this descriptor. The following flags are supported:
> > +
> > +Bits 0-3 of ``flags`` encode the type:
> > +  * ``KVM_STATS_TYPE_CUMULATIVE``
> > +    The statistics data is cumulative. The value of data can only be increased.
> > +    Most of the counters used in KVM are of this type.
> > +    The corresponding ``count`` filed for this type is always 1.
> > +  * ``KVM_STATS_TYPE_INSTANT``
> > +    The statistics data is instantaneous. Its value can be increased or
> > +    decreased. This type is usually used as a measurement of some resources,
> > +    like the number of dirty pages, the number of large pages, etc.
> > +    The corresponding ``count`` field for this type is always 1.
> > +
> > +Bits 4-7 of ``flags`` encode the unit:
> > +  * ``KVM_STATS_UNIT_NONE``
> > +    There is no unit for the value of statistics data. This usually means that
> > +    the value is a simple counter of an event.
> > +  * ``KVM_STATS_UNIT_BYTES``
> > +    It indicates that the statistics data is used to measure memory size, in the
> > +    unit of Byte, KiByte, MiByte, GiByte, etc. The unit of the data is
> > +    determined by the ``exponent`` field in the descriptor. The
> > +    ``KVM_STATS_BASE_POW2`` flag is valid in this case. The unit of the data is
> > +    determined by ``pow(2, exponent)``. For example, if value is 10,
> > +    ``exponent`` is 20, which means the unit of statistics data is MiByte, we
> > +    can get the statistics data in the unit of Byte by
> > +    ``value * pow(2, exponent) = 10 * pow(2, 20) = 10 MiByte`` which is
> > +    10 * 1024 * 1024 Bytes.
> > +  * ``KVM_STATS_UNIT_SECONDS``
> > +    It indicates that the statistics data is used to measure time/latency, in
> > +    the unit of nanosecond, microsecond, millisecond and second. The unit of the
> > +    data is determined by the ``exponent`` field in the descriptor. The
> > +    ``KVM_STATS_BASE_POW10`` flag is valid in this case. The unit of the data
> > +    is determined by ``pow(10, exponent)``. For example, if value is 2000000,
> > +    ``exponent`` is -6, which means the unit of statistics data is microsecond,
> > +    we can get the statistics data in the unit of second by
> > +    ``value * pow(10, exponent) = 2000000 * pow(10, -6) = 2 seconds``.
> > +  * ``KVM_STATS_UNIT_CYCLES``
> > +    It indicates that the statistics data is used to measure CPU clock cycles.
> > +    The ``KVM_STATS_BASE_POW10`` flag is valid in this case. For example, if
> > +    value is 200, ``exponent`` is 4, we can get the number of CPU clock cycles
> > +    by ``value * pow(10, exponent) = 200 * pow(10, 4) = 2000000``.
> > +
> > +Bits 7-11 of ``flags`` encode the base:
> > +  * ``KVM_STATS_BASE_POW10``
> > +    The scale is based on power of 10. It is used for measurement of time and
> > +    CPU clock cycles.
> > +  * ``KVM_STATS_BASE_POW2``
> > +    The scale is based on power of 2. It is used for measurement of memory size.
> > +
> > +The ``exponent`` field is the scale of corresponding statistics data. For
> > +example, if the unit is ``KVM_STATS_UNIT_BYTES``, the base is
> > +``KVM_STATS_BASE_POW2``, the ``exponent`` is 10, then we know that the real
> > +unit of the statistics data is KBytes a.k.a pow(2, 10) = 1024 bytes.
> > +
> > +The ``size`` field is the number of values of this statistics data. It is in the
> > +unit of ``unsigned long`` for VM or ``__u64`` for VCPU.
> > +
> > +The ``unused1`` and ``unused2`` fields are reserved for future
> > +support for other types of statistics data, like log/linear histogram.
> > +
> > +The ``name`` field points to the name string of the statistics data. The name
> > +string starts at the end of ``struct kvm_stats_desc``.
> > +The maximum length (including trailing '\0') is indicated by ``name_size``
> > +in ``struct kvm_stats_header``.
> > +
> > +The Stats Data block contains an array of data values of type ``struct
> > +kvm_vm_stats_data`` or ``struct kvm_vcpu_stats_data``. It would be read by
> > +user space periodically to pull statistics data.
> > +The order of data value in Stats Data block is the same as the order of
> > +descriptors in Descriptors block.
> > +  * Statistics data for VM::
> > +
> > +     struct kvm_vm_stats_data {
> > +             unsigned long value[0];
> > +     };
> > +
> > +  * Statistics data for VCPU::
> > +
> > +     struct kvm_vcpu_stats_data {
> > +             __u64 value[0];
> > +     };
> > +
> >   5. The kvm_run structure
> >   ========================
> >
> > @@ -6891,3 +7063,11 @@ This capability is always enabled.
> >   This capability indicates that the KVM virtual PTP service is
> >   supported in the host. A VMM can check whether the service is
> >   available to the guest on migration.
> > +
> > +8.33 KVM_CAP_STATS_BINARY_FD
> > +----------------------------
> > +
> > +:Architectures: all
> > +
> > +This capability indicates the feature that user space can create get a file
> > +descriptor for every VM and VCPU to read statistics data in binary format.
>
>
> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 2/4] KVM: stats: Add fd-based API to read binary stats data
  2021-06-07 19:22   ` Krish Sadhukhan
@ 2021-06-07 22:10     ` Jing Zhang
  0 siblings, 0 replies; 32+ messages in thread
From: Jing Zhang @ 2021-06-07 22:10 UTC (permalink / raw)
  To: Krish Sadhukhan
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Claudio Imbrenda, Sean Christopherson, Vitaly Kuznetsov,
	Jim Mattson, Peter Shier, Oliver Upton, David Rientjes,
	Emanuele Giuseppe Esposito, David Matlack, Ricardo Koller

On Mon, Jun 7, 2021 at 2:23 PM Krish Sadhukhan
<krish.sadhukhan@oracle.com> wrote:
>
>
> On 6/3/21 2:14 PM, Jing Zhang wrote:
> > Provides a file descriptor per VM to read VM stats info/data.
> > Provides a file descriptor per vCPU to read vCPU stats info/data.
> >
> > Reviewed-by: David Matlack <dmatlack@google.com>
> > Reviewed-by: Ricardo Koller <ricarkol@google.com>
> > Signed-off-by: Jing Zhang <jingzhangos@google.com>
> > ---
> >   arch/arm64/kvm/guest.c    |  26 +++++++
> >   arch/mips/kvm/mips.c      |  52 +++++++++++++
> >   arch/powerpc/kvm/book3s.c |  52 +++++++++++++
> >   arch/powerpc/kvm/booke.c  |  45 +++++++++++
> >   arch/s390/kvm/kvm-s390.c  | 117 ++++++++++++++++++++++++++++
> >   arch/x86/kvm/x86.c        |  53 +++++++++++++
> >   include/linux/kvm_host.h  | 132 ++++++++++++++++++++++++++++++++
> >   include/uapi/linux/kvm.h  |  50 ++++++++++++
> >   virt/kvm/kvm_main.c       | 155 ++++++++++++++++++++++++++++++++++++++
> >   9 files changed, 682 insertions(+)
> >
> > diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> > index 4962331d01e6..c68addd38cf8 100644
> > --- a/arch/arm64/kvm/guest.c
> > +++ b/arch/arm64/kvm/guest.c
> > @@ -28,6 +28,32 @@
> >
> >   #include "trace.h"
> >
> > +struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC();
> > +
> > +struct _kvm_stats_header kvm_vm_stats_header = {
> > +     .name_size = KVM_STATS_NAME_LEN,
> > +     .count = ARRAY_SIZE(kvm_vm_stats_desc),
> > +     .desc_offset = sizeof(struct kvm_stats_header),
> > +     .data_offset = sizeof(struct kvm_stats_header) +
> > +             sizeof(kvm_vm_stats_desc),
> > +};
> > +
> > +struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
> > +     STATS_DESC_COUNTER("hvc_exit_stat"),
> > +     STATS_DESC_COUNTER("wfe_exit_stat"),
> > +     STATS_DESC_COUNTER("wfi_exit_stat"),
> > +     STATS_DESC_COUNTER("mmio_exit_user"),
> > +     STATS_DESC_COUNTER("mmio_exit_kernel"),
> > +     STATS_DESC_COUNTER("exits"));
> > +
> > +struct _kvm_stats_header kvm_vcpu_stats_header = {
> > +     .name_size = KVM_STATS_NAME_LEN,
> > +     .count = ARRAY_SIZE(kvm_vcpu_stats_desc),
> > +     .desc_offset = sizeof(struct kvm_stats_header),
> > +     .data_offset = sizeof(struct kvm_stats_header) +
> > +             sizeof(kvm_vcpu_stats_desc),
> > +};
> > +
> >   struct kvm_stats_debugfs_item debugfs_entries[] = {
> >       VCPU_STAT_GENERIC("halt_successful_poll", halt_successful_poll),
> >       VCPU_STAT_GENERIC("halt_attempted_poll", halt_attempted_poll),
> > diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> > index ff205b35719b..eb8fd4a96952 100644
> > --- a/arch/mips/kvm/mips.c
> > +++ b/arch/mips/kvm/mips.c
> > @@ -38,6 +38,58 @@
> >   #define VECTORSPACING 0x100 /* for EI/VI mode */
> >   #endif
> >
> > +struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC();
> > +
> > +struct _kvm_stats_header kvm_vm_stats_header = {
> > +     .name_size = KVM_STATS_NAME_LEN,
> > +     .count = ARRAY_SIZE(kvm_vm_stats_desc),
> > +     .desc_offset = sizeof(struct kvm_stats_header),
> > +     .data_offset = sizeof(struct kvm_stats_header) +
> > +             sizeof(kvm_vm_stats_desc),
> > +};
> > +
> > +struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
> > +     STATS_DESC_COUNTER("wait_exits"),
> > +     STATS_DESC_COUNTER("cache_exits"),
> > +     STATS_DESC_COUNTER("signal_exits"),
> > +     STATS_DESC_COUNTER("int_exits"),
> > +     STATS_DESC_COUNTER("cop_unusable_exits"),
> > +     STATS_DESC_COUNTER("tlbmod_exits"),
> > +     STATS_DESC_COUNTER("tlbmiss_ld_exits"),
> > +     STATS_DESC_COUNTER("tlbmiss_st_exits"),
> > +     STATS_DESC_COUNTER("addrerr_st_exits"),
> > +     STATS_DESC_COUNTER("addrerr_ld_exits"),
> > +     STATS_DESC_COUNTER("syscall_exits"),
> > +     STATS_DESC_COUNTER("resvd_inst_exits"),
> > +     STATS_DESC_COUNTER("break_inst_exits"),
> > +     STATS_DESC_COUNTER("trap_inst_exits"),
> > +     STATS_DESC_COUNTER("msa_fpe_exits"),
> > +     STATS_DESC_COUNTER("fpe_exits"),
> > +     STATS_DESC_COUNTER("msa_disabled_exits"),
> > +     STATS_DESC_COUNTER("flush_dcache_exits"),
> > +#ifdef CONFIG_KVM_MIPS_VZ
> > +     STATS_DESC_COUNTER("vz_gpsi_exits"),
> > +     STATS_DESC_COUNTER("vz_gsfc_exits"),
> > +     STATS_DESC_COUNTER("vz_hc_exits"),
> > +     STATS_DESC_COUNTER("vz_grr_exits"),
> > +     STATS_DESC_COUNTER("vz_gva_exits"),
> > +     STATS_DESC_COUNTER("vz_ghfc_exits"),
> > +     STATS_DESC_COUNTER("vz_gpa_exits"),
> > +     STATS_DESC_COUNTER("vz_resvd_exits"),
> > +#ifdef CONFIG_CPU_LOONGSON64
> > +     STATS_DESC_COUNTER("vz_cpucfg_exits"),
> > +#endif
> > +#endif
> > +     );
> > +
> > +struct _kvm_stats_header kvm_vcpu_stats_header = {
> > +     .name_size = KVM_STATS_NAME_LEN,
> > +     .count = ARRAY_SIZE(kvm_vcpu_stats_desc),
> > +     .desc_offset = sizeof(struct kvm_stats_header),
> > +     .data_offset = sizeof(struct kvm_stats_header) +
> > +             sizeof(kvm_vcpu_stats_desc),
> > +};
> > +
> >   struct kvm_stats_debugfs_item debugfs_entries[] = {
> >       VCPU_STAT("wait", wait_exits),
> >       VCPU_STAT("cache", cache_exits),
> > diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> > index 92cdb4175945..53fb1da049a7 100644
> > --- a/arch/powerpc/kvm/book3s.c
> > +++ b/arch/powerpc/kvm/book3s.c
> > @@ -38,6 +38,58 @@
> >
> >   /* #define EXIT_DEBUG */
> >
> > +struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC(
> > +     STATS_DESC_ICOUNTER("num_2M_pages"),
> > +     STATS_DESC_ICOUNTER("num_1G_pages"));
> > +
> > +struct _kvm_stats_header kvm_vm_stats_header = {
> > +     .name_size = KVM_STATS_NAME_LEN,
> > +     .count = ARRAY_SIZE(kvm_vm_stats_desc),
> > +     .desc_offset = sizeof(struct kvm_stats_header),
> > +     .data_offset = sizeof(struct kvm_stats_header) +
> > +             sizeof(kvm_vm_stats_desc),
> > +};
> > +
> > +struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
> > +     STATS_DESC_COUNTER("sum_exits"),
> > +     STATS_DESC_COUNTER("mmio_exits"),
> > +     STATS_DESC_COUNTER("signal_exits"),
> > +     STATS_DESC_COUNTER("light_exits"),
> > +     STATS_DESC_COUNTER("itlb_real_miss_exits"),
> > +     STATS_DESC_COUNTER("itlb_virt_miss_exits"),
> > +     STATS_DESC_COUNTER("dtlb_real_miss_exits"),
> > +     STATS_DESC_COUNTER("dtlb_virt_miss_exits"),
> > +     STATS_DESC_COUNTER("syscall_exits"),
> > +     STATS_DESC_COUNTER("isi_exits"),
> > +     STATS_DESC_COUNTER("dsi_exits"),
> > +     STATS_DESC_COUNTER("emulated_inst_exits"),
> > +     STATS_DESC_COUNTER("dec_exits"),
> > +     STATS_DESC_COUNTER("ext_intr_exits"),
> > +     STATS_DESC_TIME_NSEC("halt_wait_ns"),
> > +     STATS_DESC_COUNTER("halt_successful_wait"),
> > +     STATS_DESC_COUNTER("dbell_exits"),
> > +     STATS_DESC_COUNTER("gdbell_exits"),
> > +     STATS_DESC_COUNTER("ld"),
> > +     STATS_DESC_COUNTER("st"),
> > +     STATS_DESC_COUNTER("pf_storage"),
> > +     STATS_DESC_COUNTER("pf_instruc"),
> > +     STATS_DESC_COUNTER("sp_storage"),
> > +     STATS_DESC_COUNTER("sp_instruc"),
> > +     STATS_DESC_COUNTER("queue_intr"),
> > +     STATS_DESC_COUNTER("ld_slow"),
> > +     STATS_DESC_COUNTER("st_slow"),
> > +     STATS_DESC_COUNTER("pthru_all"),
> > +     STATS_DESC_COUNTER("pthru_host"),
> > +     STATS_DESC_COUNTER("pthru_bad_aff"));
> > +
> > +struct _kvm_stats_header kvm_vcpu_stats_header = {
> > +     .name_size = KVM_STATS_NAME_LEN,
> > +     .count = ARRAY_SIZE(kvm_vcpu_stats_desc),
> > +     .desc_offset = sizeof(struct kvm_stats_header),
> > +     .data_offset = sizeof(struct kvm_stats_header) +
> > +             sizeof(kvm_vcpu_stats_desc),
> > +};
> > +
> >   struct kvm_stats_debugfs_item debugfs_entries[] = {
> >       VCPU_STAT("exits", sum_exits),
> >       VCPU_STAT("mmio", mmio_exits),
> > diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> > index 80d3b39aa7ac..02f1e147d224 100644
> > --- a/arch/powerpc/kvm/booke.c
> > +++ b/arch/powerpc/kvm/booke.c
> > @@ -36,6 +36,51 @@
> >
> >   unsigned long kvmppc_booke_handlers;
> >
> > +struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC(
> > +     STATS_DESC_ICOUNTER("num_2M_pages"),
> > +     STATS_DESC_ICOUNTER("num_1G_pages"));
> > +
> > +struct _kvm_stats_header kvm_vm_stats_header = {
> > +     .name_size = KVM_STATS_NAME_LEN,
> > +     .count = ARRAY_SIZE(kvm_vm_stats_desc),
> > +     .desc_offset = sizeof(struct kvm_stats_header),
> > +     .data_offset = sizeof(struct kvm_stats_header) +
> > +             sizeof(kvm_vm_stats_desc),
> > +};
> > +
> > +struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
> > +     STATS_DESC_COUNTER("sum_exits"),
> > +     STATS_DESC_COUNTER("mmio_exits"),
> > +     STATS_DESC_COUNTER("signal_exits"),
> > +     STATS_DESC_COUNTER("light_exits"),
> > +     STATS_DESC_COUNTER("itlb_real_miss_exits"),
> > +     STATS_DESC_COUNTER("itlb_virt_miss_exits"),
> > +     STATS_DESC_COUNTER("dtlb_real_miss_exits"),
> > +     STATS_DESC_COUNTER("dtlb_virt_miss_exits"),
> > +     STATS_DESC_COUNTER("syscall_exits"),
> > +     STATS_DESC_COUNTER("isi_exits"),
> > +     STATS_DESC_COUNTER("dsi_exits"),
> > +     STATS_DESC_COUNTER("emulated_inst_exits"),
> > +     STATS_DESC_COUNTER("dec_exits"),
> > +     STATS_DESC_COUNTER("ext_intr_exits"),
> > +     STATS_DESC_TIME_NSEC("halt_wait_ns"),
> > +     STATS_DESC_COUNTER("halt_successful_wait"),
> > +     STATS_DESC_COUNTER("dbell_exits"),
> > +     STATS_DESC_COUNTER("gdbell_exits"),
> > +     STATS_DESC_COUNTER("ld"),
> > +     STATS_DESC_COUNTER("st"),
> > +     STATS_DESC_COUNTER("pthru_all"),
> > +     STATS_DESC_COUNTER("pthru_host"),
> > +     STATS_DESC_COUNTER("pthru_bad_aff"));
> > +
> > +struct _kvm_stats_header kvm_vcpu_stats_header = {
> > +     .name_size = KVM_STATS_NAME_LEN,
> > +     .count = ARRAY_SIZE(kvm_vcpu_stats_desc),
> > +     .desc_offset = sizeof(struct kvm_stats_header),
> > +     .data_offset = sizeof(struct kvm_stats_header) +
> > +             sizeof(kvm_vcpu_stats_desc),
> > +};
> > +
> >   struct kvm_stats_debugfs_item debugfs_entries[] = {
> >       VCPU_STAT("mmio", mmio_exits),
> >       VCPU_STAT("sig", signal_exits),
> > diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> > index e8bc7cd06794..4e2141bee3bc 100644
> > --- a/arch/s390/kvm/kvm-s390.c
> > +++ b/arch/s390/kvm/kvm-s390.c
> > @@ -58,6 +58,123 @@
> >   #define VCPU_IRQS_MAX_BUF (sizeof(struct kvm_s390_irq) * \
> >                          (KVM_MAX_VCPUS + LOCAL_IRQS))
> >
> > +struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC(
> > +     STATS_DESC_COUNTER("inject_io"),
> > +     STATS_DESC_COUNTER("inject_float_mchk"),
> > +     STATS_DESC_COUNTER("inject_pfault_done"),
> > +     STATS_DESC_COUNTER("inject_service_signal"),
> > +     STATS_DESC_COUNTER("inject_virtio"));
> > +
> > +struct _kvm_stats_header kvm_vm_stats_header = {
> > +     .name_size = KVM_STATS_NAME_LEN,
> > +     .count = ARRAY_SIZE(kvm_vm_stats_desc),
> > +     .desc_offset = sizeof(struct kvm_stats_header),
> > +     .data_offset = sizeof(struct kvm_stats_header) +
> > +             sizeof(kvm_vm_stats_desc),
> > +};
> > +
> > +struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
> > +     STATS_DESC_COUNTER("exit_userspace"),
> > +     STATS_DESC_COUNTER("exit_null"),
> > +     STATS_DESC_COUNTER("exit_external_request"),
> > +     STATS_DESC_COUNTER("exit_io_request"),
> > +     STATS_DESC_COUNTER("exit_external_interrupt"),
> > +     STATS_DESC_COUNTER("exit_stop_request"),
> > +     STATS_DESC_COUNTER("exit_validity"),
> > +     STATS_DESC_COUNTER("exit_instruction"),
> > +     STATS_DESC_COUNTER("exit_pei"),
> > +     STATS_DESC_COUNTER("halt_no_poll_steal"),
> > +     STATS_DESC_COUNTER("instruction_lctl"),
> > +     STATS_DESC_COUNTER("instruction_lctlg"),
> > +     STATS_DESC_COUNTER("instruction_stctl"),
> > +     STATS_DESC_COUNTER("instruction_stctg"),
> > +     STATS_DESC_COUNTER("exit_program_interruption"),
> > +     STATS_DESC_COUNTER("exit_instr_and_program"),
> > +     STATS_DESC_COUNTER("exit_operation_exception"),
> > +     STATS_DESC_COUNTER("deliver_ckc"),
> > +     STATS_DESC_COUNTER("deliver_cputm"),
> > +     STATS_DESC_COUNTER("deliver_external_call"),
> > +     STATS_DESC_COUNTER("deliver_emergency_signal"),
> > +     STATS_DESC_COUNTER("deliver_service_signal"),
> > +     STATS_DESC_COUNTER("deliver_virtio"),
> > +     STATS_DESC_COUNTER("deliver_stop_signal"),
> > +     STATS_DESC_COUNTER("deliver_prefix_signal"),
> > +     STATS_DESC_COUNTER("deliver_restart_signal"),
> > +     STATS_DESC_COUNTER("deliver_program"),
> > +     STATS_DESC_COUNTER("deliver_io"),
> > +     STATS_DESC_COUNTER("deliver_machine_check"),
> > +     STATS_DESC_COUNTER("exit_wait_state"),
> > +     STATS_DESC_COUNTER("inject_ckc"),
> > +     STATS_DESC_COUNTER("inject_cputm"),
> > +     STATS_DESC_COUNTER("inject_external_call"),
> > +     STATS_DESC_COUNTER("inject_emergency_signal"),
> > +     STATS_DESC_COUNTER("inject_mchk"),
> > +     STATS_DESC_COUNTER("inject_pfault_init"),
> > +     STATS_DESC_COUNTER("inject_program"),
> > +     STATS_DESC_COUNTER("inject_restart"),
> > +     STATS_DESC_COUNTER("inject_set_prefix"),
> > +     STATS_DESC_COUNTER("inject_stop_signal"),
> > +     STATS_DESC_COUNTER("instruction_epsw"),
> > +     STATS_DESC_COUNTER("instruction_gs"),
> > +     STATS_DESC_COUNTER("instruction_io_other"),
> > +     STATS_DESC_COUNTER("instruction_lpsw"),
> > +     STATS_DESC_COUNTER("instruction_lpswe"),
> > +     STATS_DESC_COUNTER("instruction_pfmf"),
> > +     STATS_DESC_COUNTER("instruction_ptff"),
> > +     STATS_DESC_COUNTER("instruction_sck"),
> > +     STATS_DESC_COUNTER("instruction_sckpf"),
> > +     STATS_DESC_COUNTER("instruction_stidp"),
> > +     STATS_DESC_COUNTER("instruction_spx"),
> > +     STATS_DESC_COUNTER("instruction_stpx"),
> > +     STATS_DESC_COUNTER("instruction_stap"),
> > +     STATS_DESC_COUNTER("instruction_iske"),
> > +     STATS_DESC_COUNTER("instruction_ri"),
> > +     STATS_DESC_COUNTER("instruction_rrbe"),
> > +     STATS_DESC_COUNTER("instruction_sske"),
> > +     STATS_DESC_COUNTER("instruction_ipte_interlock"),
> > +     STATS_DESC_COUNTER("instruction_stsi"),
> > +     STATS_DESC_COUNTER("instruction_stfl"),
> > +     STATS_DESC_COUNTER("instruction_tb"),
> > +     STATS_DESC_COUNTER("instruction_tpi"),
> > +     STATS_DESC_COUNTER("instruction_tprot"),
> > +     STATS_DESC_COUNTER("instruction_tsch"),
> > +     STATS_DESC_COUNTER("instruction_sie"),
> > +     STATS_DESC_COUNTER("instruction_essa"),
> > +     STATS_DESC_COUNTER("instruction_sthyi"),
> > +     STATS_DESC_COUNTER("instruction_sigp_sense"),
> > +     STATS_DESC_COUNTER("instruction_sigp_sense_running"),
> > +     STATS_DESC_COUNTER("instruction_sigp_external_call"),
> > +     STATS_DESC_COUNTER("instruction_sigp_emergency"),
> > +     STATS_DESC_COUNTER("instruction_sigp_cond_emergency"),
> > +     STATS_DESC_COUNTER("instruction_sigp_start"),
> > +     STATS_DESC_COUNTER("instruction_sigp_stop"),
> > +     STATS_DESC_COUNTER("instruction_sigp_stop_store_status"),
> > +     STATS_DESC_COUNTER("instruction_sigp_store_status"),
> > +     STATS_DESC_COUNTER("instruction_sigp_store_adtl_status"),
> > +     STATS_DESC_COUNTER("instruction_sigp_arch"),
> > +     STATS_DESC_COUNTER("instruction_sigp_prefix"),
> > +     STATS_DESC_COUNTER("instruction_sigp_restart"),
> > +     STATS_DESC_COUNTER("instruction_sigp_init_cpu_reset"),
> > +     STATS_DESC_COUNTER("instruction_sigp_cpu_reset"),
> > +     STATS_DESC_COUNTER("instruction_sigp_unknown"),
> > +     STATS_DESC_COUNTER("diagnose_10"),
> > +     STATS_DESC_COUNTER("diagnose_44"),
> > +     STATS_DESC_COUNTER("diagnose_9c"),
> > +     STATS_DESC_COUNTER("diagnose_9c_ignored"),
> > +     STATS_DESC_COUNTER("diagnose_258"),
> > +     STATS_DESC_COUNTER("diagnose_308"),
> > +     STATS_DESC_COUNTER("diagnose_500"),
> > +     STATS_DESC_COUNTER("diagnose_other"),
> > +     STATS_DESC_COUNTER("pfault_sync"));
> > +
> > +struct _kvm_stats_header kvm_vcpu_stats_header = {
> > +     .name_size = KVM_STATS_NAME_LEN,
> > +     .count = ARRAY_SIZE(kvm_vcpu_stats_desc),
> > +     .desc_offset = sizeof(struct kvm_stats_header),
> > +     .data_offset = sizeof(struct kvm_stats_header) +
> > +             sizeof(kvm_vcpu_stats_desc),
> > +};
> > +
> >   struct kvm_stats_debugfs_item debugfs_entries[] = {
> >       VCPU_STAT("userspace_handled", exit_userspace),
> >       VCPU_STAT("exit_null", exit_null),
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 96d10253218a..8e900c482626 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -214,6 +214,59 @@ EXPORT_SYMBOL_GPL(host_xss);
> >   u64 __read_mostly supported_xss;
> >   EXPORT_SYMBOL_GPL(supported_xss);
> >
> > +struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC(
> > +     STATS_DESC_COUNTER("mmu_shadow_zapped"),
> > +     STATS_DESC_COUNTER("mmu_pte_write"),
> > +     STATS_DESC_COUNTER("mmu_pde_zapped"),
> > +     STATS_DESC_COUNTER("mmu_flooded"),
> > +     STATS_DESC_COUNTER("mmu_recycled"),
> > +     STATS_DESC_COUNTER("mmu_cache_miss"),
> > +     STATS_DESC_ICOUNTER("mmu_unsync"),
> > +     STATS_DESC_ICOUNTER("largepages"),
> > +     STATS_DESC_ICOUNTER("nx_largepages_splits"),
> > +     STATS_DESC_ICOUNTER("max_mmu_page_hash_collisions"));
> > +
> > +struct _kvm_stats_header kvm_vm_stats_header = {
> > +     .name_size = KVM_STATS_NAME_LEN,
> > +     .count = ARRAY_SIZE(kvm_vm_stats_desc),
> > +     .desc_offset = sizeof(struct kvm_stats_header),
> > +     .data_offset = sizeof(struct kvm_stats_header) +
> > +             sizeof(kvm_vm_stats_desc),
> > +};
> > +
> > +struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
> > +     STATS_DESC_COUNTER("pf_fixed"),
> > +     STATS_DESC_COUNTER("pf_guest"),
> > +     STATS_DESC_COUNTER("tlb_flush"),
> > +     STATS_DESC_COUNTER("invlpg"),
> > +     STATS_DESC_COUNTER("exits"),
> > +     STATS_DESC_COUNTER("io_exits"),
> > +     STATS_DESC_COUNTER("mmio_exits"),
> > +     STATS_DESC_COUNTER("signal_exits"),
> > +     STATS_DESC_COUNTER("irq_window_exits"),
> > +     STATS_DESC_COUNTER("nmi_window_exits"),
> > +     STATS_DESC_COUNTER("l1d_flush"),
> > +     STATS_DESC_COUNTER("halt_exits"),
> > +     STATS_DESC_COUNTER("request_irq_exits"),
> > +     STATS_DESC_COUNTER("irq_exits"),
> > +     STATS_DESC_COUNTER("host_state_reload"),
> > +     STATS_DESC_COUNTER("fpu_reload"),
> > +     STATS_DESC_COUNTER("insn_emulation"),
> > +     STATS_DESC_COUNTER("insn_emulation_fail"),
> > +     STATS_DESC_COUNTER("hypercalls"),
> > +     STATS_DESC_COUNTER("irq_injections"),
> > +     STATS_DESC_COUNTER("nmi_injections"),
> > +     STATS_DESC_COUNTER("req_event"),
> > +     STATS_DESC_COUNTER("nested_run"));
> > +
> > +struct _kvm_stats_header kvm_vcpu_stats_header = {
> > +     .name_size = KVM_STATS_NAME_LEN,
> > +     .count = ARRAY_SIZE(kvm_vcpu_stats_desc),
> > +     .desc_offset = sizeof(struct kvm_stats_header),
> > +     .data_offset = sizeof(struct kvm_stats_header) +
> > +             sizeof(kvm_vcpu_stats_desc),
> > +};
> > +
> >   struct kvm_stats_debugfs_item debugfs_entries[] = {
> >       VCPU_STAT("pf_fixed", pf_fixed),
> >       VCPU_STAT("pf_guest", pf_guest),
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 1870fa928762..873e21af36be 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -1240,6 +1240,19 @@ struct kvm_stats_debugfs_item {
> >       int mode;
> >   };
> >
> > +struct _kvm_stats_header {
> > +     __u32 name_size;
> > +     __u32 count;
> > +     __u32 desc_offset;
> > +     __u32 data_offset;
> > +};
> > +
> > +#define KVM_STATS_NAME_LEN   48
> > +struct _kvm_stats_desc {
> > +     struct kvm_stats_desc desc;
> > +     char name[KVM_STATS_NAME_LEN];
> > +};
> > +
> >   #define KVM_DBGFS_GET_MODE(dbgfs_item)                                         \
> >       ((dbgfs_item)->mode ? (dbgfs_item)->mode : 0644)
> >
> > @@ -1253,8 +1266,127 @@ struct kvm_stats_debugfs_item {
> >       { n, offsetof(struct kvm_vcpu, stat.generic.x),                        \
> >         KVM_STAT_VCPU, ## __VA_ARGS__ }
> >
> > +#define STATS_DESC(stat, type, unit, scale, exp)                            \
> > +     {                                                                      \
> > +             {                                                              \
> > +                     .flags = type | unit | scale,                          \
> > +                     .exponent = exp,                                       \
> > +                     .size = 1                                              \
> > +             },                                                             \
> > +             .name = stat,                                                  \
> > +     }
> > +#define STATS_DESC_CUMULATIVE(name, unit, scale, exponent)                  \
> > +     STATS_DESC(name, KVM_STATS_TYPE_CUMULATIVE, unit, scale, exponent)
> > +#define STATS_DESC_INSTANT(name, unit, scale, exponent)                             \
> > +     STATS_DESC(name, KVM_STATS_TYPE_INSTANT, unit, scale, exponent)
> > +
> > +/* Cumulative counter */
> > +#define STATS_DESC_COUNTER(name)                                            \
> > +     STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_NONE,                       \
> > +             KVM_STATS_BASE_POW10, 0)
> > +/* Instantaneous counter */
> > +#define STATS_DESC_ICOUNTER(name)                                           \
> > +     STATS_DESC_INSTANT(name, KVM_STATS_UNIT_NONE,                          \
> > +             KVM_STATS_BASE_POW10, 0)
> > +
> > +/* Cumulative clock cycles */
> > +#define STATS_DESC_CYCLE(name)                                                      \
> > +     STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_CYCLES,                     \
> > +             KVM_STATS_BASE_POW10, 0)
> > +/* Instantaneous clock cycles */
> > +#define STATS_DESC_ICYCLE(name)                                                     \
> > +     STATS_DESC_INSTANT(name, KVM_STATS_UNIT_CYCLES,                        \
> > +             KVM_STATS_BASE_POW10, 0)
> > +
> > +/* Cumulative memory size in Byte */
> > +#define STATS_DESC_SIZE_BYTE(name)                                          \
> > +     STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_BYTES,                      \
> > +             KVM_STATS_BASE_POW2, 0)
> > +/* Cumulative memory size in KiByte */
> > +#define STATS_DESC_SIZE_KBYTE(name)                                         \
> > +     STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_BYTES,                      \
> > +             KVM_STATS_BASE_POW2, 10)
> > +/* Cumulative memory size in MiByte */
> > +#define STATS_DESC_SIZE_MBYTE(name)                                         \
> > +     STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_BYTES,                      \
> > +             KVM_STATS_BASE_POW2, 20)
> > +/* Cumulative memory size in GiByte */
> > +#define STATS_DESC_SIZE_GBYTE(name)                                         \
> > +     STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_BYTES,                      \
> > +             KVM_STATS_BASE_POW2, 30)
> > +
> > +/* Instantaneous memory size in Byte */
> > +#define STATS_DESC_ISIZE_BYTE(name)                                         \
> > +     STATS_DESC_INSTANT(name, KVM_STATS_UNIT_BYTES,                         \
> > +             KVM_STATS_BASE_POW2, 0)
> > +/* Instantaneous memory size in KiByte */
> > +#define STATS_DESC_ISIZE_KBYTE(name)                                        \
> > +     STATS_DESC_INSTANT(name, KVM_STATS_UNIT_BYTES,                         \
> > +             KVM_STATS_BASE_POW2, 10)
> > +/* Instantaneous memory size in MiByte */
> > +#define STATS_DESC_ISIZE_MBYTE(name)                                        \
> > +     STATS_DESC_INSTANT(name, KVM_STATS_UNIT_BYTES,                         \
> > +             KVM_STATS_BASE_POW2, 20)
> > +/* Instantaneous memory size in GiByte */
> > +#define STATS_DESC_ISIZE_GBYTE(name)                                        \
> > +     STATS_DESC_INSTANT(name, KVM_STATS_UNIT_BYTES,                         \
> > +             KVM_STATS_BASE_POW2, 30)
> > +
> > +/* Cumulative time in second */
> > +#define STATS_DESC_TIME_SEC(name)                                           \
> > +     STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_SECONDS,                    \
> > +             KVM_STATS_BASE_POW10, 0)
> > +/* Cumulative time in millisecond */
> > +#define STATS_DESC_TIME_MSEC(name)                                          \
> > +     STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_SECONDS,                    \
> > +             KVM_STATS_BASE_POW10, -3)
> > +/* Cumulative time in microsecond */
> > +#define STATS_DESC_TIME_USEC(name)                                          \
> > +     STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_SECONDS,                    \
> > +             KVM_STATS_BASE_POW10, -6)
> > +/* Cumulative time in nanosecond */
> > +#define STATS_DESC_TIME_NSEC(name)                                          \
> > +     STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_SECONDS,                    \
> > +             KVM_STATS_BASE_POW10, -9)
> > +
> > +/* Instantaneous time in second */
> > +#define STATS_DESC_ITIME_SEC(name)                                          \
> > +     STATS_DESC_INSTANT(name, KVM_STATS_UNIT_SECONDS,                       \
> > +             KVM_STATS_BASE_POW10, 0)
> > +/* Instantaneous time in millisecond */
> > +#define STATS_DESC_ITIME_MSEC(name)                                         \
> > +     STATS_DESC_INSTANT(name, KVM_STATS_UNIT_SECONDS,                       \
> > +             KVM_STATS_BASE_POW10, -3)
> > +/* Instantaneous time in microsecond */
> > +#define STATS_DESC_ITIME_USEC(name)                                         \
> > +     STATS_DESC_INSTANT(name, KVM_STATS_UNIT_SECONDS,                       \
> > +             KVM_STATS_BASE_POW10, -6)
> > +/* Instantaneous time in nanosecond */
> > +#define STATS_DESC_ITIME_NSEC(name)                                         \
> > +     STATS_DESC_INSTANT(name, KVM_STATS_UNIT_SECONDS,                       \
> > +             KVM_STATS_BASE_POW10, -9)
> > +
> > +#define DEFINE_VM_STATS_DESC(...) {                                         \
> > +     STATS_DESC_COUNTER("remote_tlb_flush"),                                \
> > +     ## __VA_ARGS__                                                         \
> > +}
> > +
> > +#define DEFINE_VCPU_STATS_DESC(...) {                                               \
> > +     STATS_DESC_COUNTER("halt_successful_poll"),                            \
> > +     STATS_DESC_COUNTER("halt_attempted_poll"),                             \
> > +     STATS_DESC_COUNTER("halt_poll_invalid"),                               \
> > +     STATS_DESC_COUNTER("halt_wakeup"),                                     \
> > +     STATS_DESC_TIME_NSEC("halt_poll_success_ns"),                          \
> > +     STATS_DESC_TIME_NSEC("halt_poll_fail_ns"),                             \
> > +     ## __VA_ARGS__                                                         \
> > +}
> > +
> >   extern struct kvm_stats_debugfs_item debugfs_entries[];
> >   extern struct dentry *kvm_debugfs_dir;
> > +extern struct _kvm_stats_header kvm_vm_stats_header;
> > +extern struct _kvm_stats_header kvm_vcpu_stats_header;
> > +extern struct _kvm_stats_desc kvm_vm_stats_desc[];
> > +extern struct _kvm_stats_desc kvm_vcpu_stats_desc[];
> >
> >   #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
> >   static inline int mmu_notifier_retry(struct kvm *kvm, unsigned long mmu_seq)
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index 3fd9a7e9d90c..dcfa0315e3f9 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -1082,6 +1082,7 @@ struct kvm_ppc_resize_hpt {
> >   #define KVM_CAP_SGX_ATTRIBUTE 196
> >   #define KVM_CAP_VM_COPY_ENC_CONTEXT_FROM 197
> >   #define KVM_CAP_PTP_KVM 198
> > +#define KVM_CAP_STATS_BINARY_FD 199
> >
> >   #ifdef KVM_CAP_IRQ_ROUTING
> >
> > @@ -1898,4 +1899,53 @@ struct kvm_dirty_gfn {
> >   #define KVM_BUS_LOCK_DETECTION_OFF             (1 << 0)
> >   #define KVM_BUS_LOCK_DETECTION_EXIT            (1 << 1)
> >
> > +#define KVM_STATS_ID_MAXLEN          64
> > +
> > +struct kvm_stats_header {
> > +     char id[KVM_STATS_ID_MAXLEN];
> > +     __u32 name_size;
> > +     __u32 count;
> > +     __u32 desc_offset;
> > +     __u32 data_offset;
> > +};
> > +
> > +#define KVM_STATS_TYPE_SHIFT         0
> > +#define KVM_STATS_TYPE_MASK          (0xF << KVM_STATS_TYPE_SHIFT)
> > +#define KVM_STATS_TYPE_CUMULATIVE    (0x0 << KVM_STATS_TYPE_SHIFT)
> > +#define KVM_STATS_TYPE_INSTANT               (0x1 << KVM_STATS_TYPE_SHIFT)
> > +#define KVM_STATS_TYPE_MAX           KVM_STATS_TYPE_INSTANT
> > +
> > +#define KVM_STATS_UNIT_SHIFT         4
> > +#define KVM_STATS_UNIT_MASK          (0xF << KVM_STATS_UNIT_SHIFT)
> > +#define KVM_STATS_UNIT_NONE          (0x0 << KVM_STATS_UNIT_SHIFT)
> > +#define KVM_STATS_UNIT_BYTES         (0x1 << KVM_STATS_UNIT_SHIFT)
> > +#define KVM_STATS_UNIT_SECONDS               (0x2 << KVM_STATS_UNIT_SHIFT)
> > +#define KVM_STATS_UNIT_CYCLES                (0x3 << KVM_STATS_UNIT_SHIFT)
> > +#define KVM_STATS_UNIT_MAX           KVM_STATS_UNIT_CYCLES
> > +
> > +#define KVM_STATS_BASE_SHIFT         8
> > +#define KVM_STATS_BASE_MASK          (0xF << KVM_STATS_BASE_SHIFT)
> > +#define KVM_STATS_BASE_POW10         (0x0 << KVM_STATS_BASE_SHIFT)
> > +#define KVM_STATS_BASE_POW2          (0x1 << KVM_STATS_BASE_SHIFT)
> > +#define KVM_STATS_BASE_MAX           KVM_STATS_BASE_POW2
> > +
> > +struct kvm_stats_desc {
> > +     __u32 flags;
> > +     __s16 exponent;
> > +     __u16 size;
> > +     __u32 unused1;
> > +     __u32 unused2;
> > +     char name[0];
> > +};
> > +
> > +struct kvm_vm_stats_data {
> > +     unsigned long value[0];
> > +};
> > +
> > +struct kvm_vcpu_stats_data {
> > +     __u64 value[0];
> > +};
> > +
> > +#define KVM_GET_STATS_FD  _IOR(KVMIO,  0xcc, struct kvm_stats_header)
> > +
> >   #endif /* __LINUX_KVM_H */
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index f6ad5b080994..d84bb17bdea8 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -3409,6 +3409,115 @@ static int kvm_vcpu_ioctl_set_sigmask(struct kvm_vcpu *vcpu, sigset_t *sigset)
> >       return 0;
> >   }
> >
> > +static ssize_t kvm_stats_read(char *id, struct _kvm_stats_header *header,
> > +             struct _kvm_stats_desc *desc, void *stats, size_t size_stats,
> > +             char __user *user_buffer, size_t size, loff_t *offset)
> > +{
> > +     ssize_t copylen, len, remain = size;
> > +     size_t size_header, size_desc;
> > +     loff_t pos = *offset;
> > +     char __user *dest = user_buffer;
> > +     void *src;
> > +
> > +     size_header = sizeof(*header);
> > +     size_desc = header->count * sizeof(*desc);
> > +
> > +     len = KVM_STATS_ID_MAXLEN + size_header + size_desc + size_stats - pos;
> > +     len = min(len, remain);
> > +     if (len <= 0)
> > +             return 0;
> > +     remain = len;
> > +
> > +     /* Copy kvm stats header id string */
> > +     copylen = KVM_STATS_ID_MAXLEN - pos;
> > +     copylen = min(copylen, remain);
> > +     if (copylen > 0) {
> > +             src = id + pos;
> > +             if (copy_to_user(dest, src, copylen))
> > +                     return -EFAULT;
> > +             remain -= copylen;
> > +             pos += copylen;
> > +             dest += copylen;
> > +     }
> > +     /* Copy kvm stats header */
> > +     copylen = KVM_STATS_ID_MAXLEN + size_header - pos;
> > +     copylen = min(copylen, remain);
> > +     if (copylen > 0) {
> > +             src = header + pos - KVM_STATS_ID_MAXLEN;
> > +             if (copy_to_user(dest, src, copylen))
> > +                     return -EFAULT;
> > +             remain -= copylen;
> > +             pos += copylen;
> > +             dest += copylen;
> > +     }
> > +     /* Copy kvm stats descriptors */
> > +     copylen = header->desc_offset + size_desc - pos;
> > +     copylen = min(copylen, remain);
> > +     if (copylen > 0) {
> > +             src = desc + pos - header->desc_offset;
> > +             if (copy_to_user(dest, src, copylen))
> > +                     return -EFAULT;
> > +             remain -= copylen;
> > +             pos += copylen;
> > +             dest += copylen;
> > +     }
> > +     /* Copy kvm stats values */
> > +     copylen = header->data_offset + size_stats - pos;
> > +     copylen = min(copylen, remain);
> > +     if (copylen > 0) {
> > +             src = stats + pos - header->data_offset;
> > +             if (copy_to_user(dest, src, copylen))
> > +                     return -EFAULT;
> > +             remain -= copylen;
> > +             pos += copylen;
> > +             dest += copylen;
> > +     }
> > +
> > +     *offset = pos;
> > +     return len;
> > +}
> > +
> > +static ssize_t kvm_vcpu_stats_read(struct file *file, char __user *user_buffer,
> > +                           size_t size, loff_t *offset)
> > +{
> > +     char id[KVM_STATS_ID_MAXLEN];
> > +     struct kvm_vcpu *vcpu = file->private_data;
> > +
> > +     snprintf(id, sizeof(id), "kvm-%d/vcpu-%d",
> > +                     task_pid_nr(current), vcpu->vcpu_id);
> > +     return kvm_stats_read(id, &kvm_vcpu_stats_header,
> > +                     &kvm_vcpu_stats_desc[0], &vcpu->stat,
> > +                     sizeof(vcpu->stat), user_buffer, size, offset);
> > +}
> > +
> > +static const struct file_operations kvm_vcpu_stats_fops = {
> > +     .read = kvm_vcpu_stats_read,
> > +     .llseek = noop_llseek,
> > +};
> > +
> > +static int kvm_vcpu_ioctl_get_stats_fd(struct kvm_vcpu *vcpu)
> > +{
> > +     int fd;
> > +     struct file *file;
> > +     char name[15 + ITOA_MAX_LEN + 1];
> > +
> > +     snprintf(name, sizeof(name), "kvm-vcpu-stats:%d", vcpu->vcpu_id);
> > +
> > +     fd = get_unused_fd_flags(O_CLOEXEC);
> > +     if (fd < 0)
> > +             return fd;
> > +
> > +     file = anon_inode_getfile(name, &kvm_vcpu_stats_fops, vcpu, O_RDONLY);
> > +     if (IS_ERR(file)) {
> > +             put_unused_fd(fd);
> > +             return PTR_ERR(file);
> > +     }
> > +     file->f_mode |= FMODE_PREAD;
> > +     fd_install(fd, file);
> > +
> > +     return fd;
> > +}
> > +
> >   static long kvm_vcpu_ioctl(struct file *filp,
> >                          unsigned int ioctl, unsigned long arg)
> >   {
> > @@ -3606,6 +3715,10 @@ static long kvm_vcpu_ioctl(struct file *filp,
> >               r = kvm_arch_vcpu_ioctl_set_fpu(vcpu, fpu);
> >               break;
> >       }
> > +     case KVM_GET_STATS_FD: {
> > +             r = kvm_vcpu_ioctl_get_stats_fd(vcpu);
> > +             break;
> > +     }
> >       default:
> >               r = kvm_arch_vcpu_ioctl(filp, ioctl, arg);
> >       }
> > @@ -3864,6 +3977,8 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
> >   #else
> >               return 0;
> >   #endif
> > +     case KVM_CAP_STATS_BINARY_FD:
>
> Nit:   KVM_CAP_BINARY_STATS_FD
>
Thanks. Will do.
> > +             return 1;
> >       default:
> >               break;
> >       }
> > @@ -3967,6 +4082,43 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
> >       }
> >   }
> >
> > +static ssize_t kvm_vm_stats_read(struct file *file, char __user *user_buffer,
> > +                           size_t size, loff_t *offset)
> > +{
> > +     char id[KVM_STATS_ID_MAXLEN];
> > +     struct kvm *kvm = file->private_data;
> > +
> > +     snprintf(id, sizeof(id), "kvm-%d", task_pid_nr(current));
> > +     return kvm_stats_read(id, &kvm_vm_stats_header, &kvm_vm_stats_desc[0],
> > +             &kvm->stat, sizeof(kvm->stat), user_buffer, size, offset);
> > +}
> > +
> > +static const struct file_operations kvm_vm_stats_fops = {
> > +     .read = kvm_vm_stats_read,
> > +     .llseek = noop_llseek,
> > +};
> > +
> > +static int kvm_vm_ioctl_get_stats_fd(struct kvm *kvm)
> > +{
> > +     int fd;
> > +     struct file *file;
> > +
> > +     fd = get_unused_fd_flags(O_CLOEXEC);
> > +     if (fd < 0)
> > +             return fd;
> > +
> > +     file = anon_inode_getfile("kvm-vm-stats",
> > +                     &kvm_vm_stats_fops, kvm, O_RDONLY);
> > +     if (IS_ERR(file)) {
> > +             put_unused_fd(fd);
> > +             return PTR_ERR(file);
> > +     }
> > +     file->f_mode |= FMODE_PREAD;
> > +     fd_install(fd, file);
> > +
> > +     return fd;
> > +}
> > +
> >   static long kvm_vm_ioctl(struct file *filp,
> >                          unsigned int ioctl, unsigned long arg)
> >   {
> > @@ -4149,6 +4301,9 @@ static long kvm_vm_ioctl(struct file *filp,
> >       case KVM_RESET_DIRTY_RINGS:
> >               r = kvm_vm_ioctl_reset_dirty_pages(kvm);
> >               break;
> > +     case KVM_GET_STATS_FD:
> > +             r = kvm_vm_ioctl_get_stats_fd(kvm);
> > +             break;
> >       default:
> >               r = kvm_arch_vm_ioctl(filp, ioctl, arg);
> >       }
> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 4/4] KVM: selftests: Add selftest for KVM statistics data binary interface
  2021-06-07 19:23   ` Krish Sadhukhan
@ 2021-06-07 22:12     ` Jing Zhang
  0 siblings, 0 replies; 32+ messages in thread
From: Jing Zhang @ 2021-06-07 22:12 UTC (permalink / raw)
  To: Krish Sadhukhan
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Claudio Imbrenda, Sean Christopherson, Vitaly Kuznetsov,
	Jim Mattson, Peter Shier, Oliver Upton, David Rientjes,
	Emanuele Giuseppe Esposito, David Matlack, Ricardo Koller

On Mon, Jun 7, 2021 at 2:23 PM Krish Sadhukhan
<krish.sadhukhan@oracle.com> wrote:
>
>
> On 6/3/21 2:14 PM, Jing Zhang wrote:
> > Add selftest to check KVM stats descriptors validity.
> >
> > Reviewed-by: David Matlack <dmatlack@google.com>
> > Reviewed-by: Ricardo Koller <ricarkol@google.com>
> > Signed-off-by: Jing Zhang <jingzhangos@google.com>
> > ---
> >   tools/testing/selftests/kvm/.gitignore        |   1 +
> >   tools/testing/selftests/kvm/Makefile          |   3 +
> >   .../testing/selftests/kvm/include/kvm_util.h  |   3 +
> >   .../selftests/kvm/kvm_binary_stats_test.c     | 215 ++++++++++++++++++
> >   tools/testing/selftests/kvm/lib/kvm_util.c    |  12 +
> >   5 files changed, 234 insertions(+)
> >   create mode 100644 tools/testing/selftests/kvm/kvm_binary_stats_test.c
> >
> > diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
> > index bd83158e0e0b..d1c3ee7d3e41 100644
> > --- a/tools/testing/selftests/kvm/.gitignore
> > +++ b/tools/testing/selftests/kvm/.gitignore
> > @@ -43,3 +43,4 @@
> >   /memslot_modification_stress_test
> >   /set_memory_region_test
> >   /steal_time
> > +/kvm_binary_stats_test
> > diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> > index e439d027939d..0cd46d6d1e15 100644
> > --- a/tools/testing/selftests/kvm/Makefile
> > +++ b/tools/testing/selftests/kvm/Makefile
> > @@ -76,6 +76,7 @@ TEST_GEN_PROGS_x86_64 += kvm_page_table_test
> >   TEST_GEN_PROGS_x86_64 += memslot_modification_stress_test
> >   TEST_GEN_PROGS_x86_64 += set_memory_region_test
> >   TEST_GEN_PROGS_x86_64 += steal_time
> > +TEST_GEN_PROGS_x86_64 += kvm_binary_stats_test
> >
> >   TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
> >   TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list-sve
> > @@ -87,6 +88,7 @@ TEST_GEN_PROGS_aarch64 += kvm_create_max_vcpus
> >   TEST_GEN_PROGS_aarch64 += kvm_page_table_test
> >   TEST_GEN_PROGS_aarch64 += set_memory_region_test
> >   TEST_GEN_PROGS_aarch64 += steal_time
> > +TEST_GEN_PROGS_aarch64 += kvm_binary_stats_test
> >
> >   TEST_GEN_PROGS_s390x = s390x/memop
> >   TEST_GEN_PROGS_s390x += s390x/resets
> > @@ -96,6 +98,7 @@ TEST_GEN_PROGS_s390x += dirty_log_test
> >   TEST_GEN_PROGS_s390x += kvm_create_max_vcpus
> >   TEST_GEN_PROGS_s390x += kvm_page_table_test
> >   TEST_GEN_PROGS_s390x += set_memory_region_test
> > +TEST_GEN_PROGS_s390x += kvm_binary_stats_test
> >
> >   TEST_GEN_PROGS += $(TEST_GEN_PROGS_$(UNAME_M))
> >   LIBKVM += $(LIBKVM_$(UNAME_M))
> > diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
> > index a8f022794ce3..96d15da3d72e 100644
> > --- a/tools/testing/selftests/kvm/include/kvm_util.h
> > +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> > @@ -387,4 +387,7 @@ uint64_t get_ucall(struct kvm_vm *vm, uint32_t vcpu_id, struct ucall *uc);
> >   #define GUEST_ASSERT_4(_condition, arg1, arg2, arg3, arg4) \
> >       __GUEST_ASSERT((_condition), 4, (arg1), (arg2), (arg3), (arg4))
> >
> > +int vm_get_stats_fd(struct kvm_vm *vm);
> > +int vcpu_get_stats_fd(struct kvm_vm *vm, uint32_t vcpuid);
> > +
> >   #endif /* SELFTEST_KVM_UTIL_H */
> > diff --git a/tools/testing/selftests/kvm/kvm_binary_stats_test.c b/tools/testing/selftests/kvm/kvm_binary_stats_test.c
> > new file mode 100644
> > index 000000000000..081983110dc5
> > --- /dev/null
> > +++ b/tools/testing/selftests/kvm/kvm_binary_stats_test.c
> > @@ -0,0 +1,215 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * kvm_binary_stats_test
> > + *
> > + * Copyright (C) 2021, Google LLC.
> > + *
> > + * Test the fd-based interface for KVM statistics.
> > + */
> > +
> > +#define _GNU_SOURCE /* for program_invocation_short_name */
> > +#include <fcntl.h>
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <errno.h>
> > +
> > +#include "test_util.h"
> > +
> > +#include "kvm_util.h"
> > +#include "asm/kvm.h"
> > +#include "linux/kvm.h"
> > +
> > +void stats_test(int stats_fd, int size_stat)
> > +{
> > +     ssize_t ret;
> > +     int i;
> > +     size_t size_desc, size_data = 0;
> > +     struct kvm_stats_header header;
> > +     struct kvm_stats_desc *stats_desc, *pdesc;
> > +     void *stats_data;
> > +
> > +     /* Read kvm stats header */
> > +     ret = read(stats_fd, &header, sizeof(header));
> > +     TEST_ASSERT(ret == sizeof(header), "Read stats header");
> > +     size_desc = sizeof(*stats_desc) + header.name_size;
> > +
> > +     /* Check id string in header, that should start with "kvm" */
> > +     TEST_ASSERT(!strncmp(header.id, "kvm", 3) &&
> > +                     strlen(header.id) < KVM_STATS_ID_MAXLEN,
> > +                     "Invalid KVM stats type");
> > +
> > +     /* Sanity check for other fields in header */
> > +     if (header.count == 0) {
> > +             printf("No KVM stats defined!");
> > +             return;
> > +     }
> > +     /* Check overlap */
> > +     TEST_ASSERT(header.desc_offset > 0 && header.data_offset > 0
> > +                     && header.desc_offset >= sizeof(header)
> > +                     && header.data_offset >= sizeof(header),
> > +                     "Invalid offset fields in header");
> > +     TEST_ASSERT(header.desc_offset > header.data_offset
> > +                     || (header.desc_offset + size_desc * header.count <=
> > +                             header.data_offset),
> > +                     "Descriptor block is overlapped with data block");
> > +
> > +     /* Allocate memory for stats descriptors */
> > +     stats_desc = calloc(header.count, size_desc);
> > +     TEST_ASSERT(stats_desc, "Allocate memory for stats descriptors");
> > +     /* Read kvm stats descriptors */
> > +     ret = pread(stats_fd, stats_desc,
> > +                     size_desc * header.count, header.desc_offset);
> > +     TEST_ASSERT(ret == size_desc * header.count,
> > +                     "Read KVM stats descriptors");
> > +
> > +     /* Sanity check for fields in descriptors */
> > +     for (i = 0; i < header.count; ++i) {
> > +             pdesc = (void *)stats_desc + i * size_desc;
> > +             /* Check type,unit,base boundaries */
> > +             TEST_ASSERT((pdesc->flags & KVM_STATS_TYPE_MASK)
> > +                             <= KVM_STATS_TYPE_MAX, "Unknown KVM stats type");
> > +             TEST_ASSERT((pdesc->flags & KVM_STATS_UNIT_MASK)
> > +                             <= KVM_STATS_UNIT_MAX, "Unknown KVM stats unit");
> > +             TEST_ASSERT((pdesc->flags & KVM_STATS_BASE_MASK)
> > +                             <= KVM_STATS_BASE_MAX, "Unknown KVM stats base");
> > +             /* Check exponent for stats unit
> > +              * Exponent for counter should be greater than or equal to 0
> > +              * Exponent for unit bytes should be greater than or equal to 0
> > +              * Exponent for unit seconds should be less than or equal to 0
> > +              * Exponent for unit clock cycles should be greater than or
> > +              * equal to 0
> > +              */
> > +             switch (pdesc->flags & KVM_STATS_UNIT_MASK) {
> > +             case KVM_STATS_UNIT_NONE:
> > +             case KVM_STATS_UNIT_BYTES:
> > +             case KVM_STATS_UNIT_CYCLES:
> > +                     TEST_ASSERT(pdesc->exponent >= 0,
> > +                                     "Unsupported KVM stats unit");
> > +                     break;
> > +             case KVM_STATS_UNIT_SECONDS:
> > +                     TEST_ASSERT(pdesc->exponent <= 0,
> > +                                     "Unsupported KVM stats unit");
> > +                     break;
> > +             }
> > +             /* Check name string */
> > +             TEST_ASSERT(strlen(pdesc->name) < header.name_size,
> > +                             "KVM stats name(%s) too long", pdesc->name);
> > +             /* Check size field, which should not be zero */
> > +             TEST_ASSERT(pdesc->size, "KVM descriptor(%s) with size of 0",
> > +                             pdesc->name);
> > +             size_data += pdesc->size * size_stat;
> > +     }
> > +     /* Check overlap */
> > +     TEST_ASSERT(header.data_offset >= header.desc_offset
> > +                     || header.data_offset + size_data <= header.desc_offset,
> > +                     "Data block is overlapped with Descriptor block");
> > +     /* Check validity of all stats data size */
> > +     TEST_ASSERT(size_data >= header.count * size_stat,
> > +                     "Data size is not correct");
> > +
> > +     /* Allocate memory for stats data */
> > +     stats_data = malloc(size_data);
> > +     TEST_ASSERT(stats_data, "Allocate memory for stats data");
> > +     /* Read kvm stats data as a bulk */
> > +     ret = pread(stats_fd, stats_data, size_data, header.data_offset);
> > +     TEST_ASSERT(ret == size_data, "Read KVM stats data");
> > +     /* Read kvm stats data one by one */
> > +     size_data = 0;
> > +     for (i = 0; i < header.count; ++i) {
> > +             pdesc = (void *)stats_desc + i * size_desc;
> > +             ret = pread(stats_fd, stats_data, pdesc->size * size_stat,
> > +                             header.data_offset + size_data);
> > +             TEST_ASSERT(ret == pdesc->size * size_stat,
> > +                             "Read data of KVM stats: %s", pdesc->name);
> > +             size_data += pdesc->size * size_stat;
> > +     }
> > +
> > +     free(stats_data);
> > +     free(stats_desc);
> > +}
> > +
> > +
> > +void vm_stats_test(struct kvm_vm *vm)
> > +{
> > +     int stats_fd;
> > +     struct kvm_vm_stats_data *stats_data;
> > +
> > +     /* Get fd for VM stats */
> > +     stats_fd = vm_get_stats_fd(vm);
> > +     TEST_ASSERT(stats_fd >= 0, "Get VM stats fd");
> > +
> > +     stats_test(stats_fd, sizeof(stats_data->value[0]));
> > +     close(stats_fd);
> > +     TEST_ASSERT(fcntl(stats_fd, F_GETFD) == -1, "Stats fd not freed");
> > +}
> > +
> > +void vcpu_stats_test(struct kvm_vm *vm, int vcpu_id)
> > +{
> > +     int stats_fd;
> > +     struct kvm_vcpu_stats_data *stats_data;
> > +
> > +     /* Get fd for VCPU stats */
> > +     stats_fd = vcpu_get_stats_fd(vm, vcpu_id);
> > +     TEST_ASSERT(stats_fd >= 0, "Get VCPU stats fd");
> > +
> > +     stats_test(stats_fd, sizeof(stats_data->value[0]));
> > +     close(stats_fd);
> > +     TEST_ASSERT(fcntl(stats_fd, F_GETFD) == -1, "Stats fd not freed");
> > +}
> > +
> > +#define DEFAULT_NUM_VM               4
> > +#define DEFAULT_NUM_VCPU     4
> > +
> > +/*
> > + * Usage: kvm_bin_form_stats [#vm] [#vcpu]
> > + * The first parameter #vm set the number of VMs being created.
> > + * The second parameter #vcpu set the number of VCPUs being created.
> > + * By default, DEFAULT_NUM_VM VM and DEFAULT_NUM_VCPU VCPU for the VM would be
> > + * created for testing.
> > + */
> > +
> > +int main(int argc, char *argv[])
> > +{
> > +     int max_vm = DEFAULT_NUM_VM, max_vcpu = DEFAULT_NUM_VCPU, ret, i, j;
> > +     struct kvm_vm **vms;
> > +
> > +     /* Get the number of VMs and VCPUs that would be created for testing. */
> > +     if (argc > 1) {
> > +             max_vm = strtol(argv[1], NULL, 0);
> > +             if (max_vm <= 0)
> > +                     max_vm = DEFAULT_NUM_VM;
> > +     }
> > +     if (argc > 2) {
> > +             max_vcpu = strtol(argv[2], NULL, 0);
> > +             if (max_vcpu <= 0)
> > +                     max_vcpu = DEFAULT_NUM_VCPU;
> > +     }
> > +
> > +     /* Check the extension for binary stats */
> > +     ret = kvm_check_cap(KVM_CAP_STATS_BINARY_FD);
> > +     TEST_ASSERT(ret >= 0,
> > +                     "Binary form statistics interface is not supported");
> > +
> > +     /* Create VMs and VCPUs */
> > +     vms = malloc(sizeof(vms[0]) * max_vm);
> > +     TEST_ASSERT(vms, "Allocate memory for storing VM pointers");
> > +     for (i = 0; i < max_vm; ++i) {
> > +             vms[i] = vm_create(VM_MODE_DEFAULT,
> > +                             DEFAULT_GUEST_PHY_PAGES, O_RDWR);
> > +             for (j = 0; j < max_vcpu; ++j)
> > +                     vm_vcpu_add(vms[i], j);
> > +     }
> > +
> > +     /* Check stats read for every VM and VCPU */
> > +     for (i = 0; i < max_vm; ++i) {
> > +             vm_stats_test(vms[i]);
> > +             for (j = 0; j < max_vcpu; ++j)
> > +                     vcpu_stats_test(vms[i], j);
> > +     }
> > +
> > +     for (i = 0; i < max_vm; ++i)
> > +             kvm_vm_free(vms[i]);
> > +     free(vms);
> > +     return 0;
> > +}
> > diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> > index fc83f6c5902d..10385b76fe11 100644
> > --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> > +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> > @@ -2090,3 +2090,15 @@ unsigned int vm_calc_num_guest_pages(enum vm_guest_mode mode, size_t size)
> >       n = DIV_ROUND_UP(size, vm_guest_mode_params[mode].page_size);
> >       return vm_adjust_num_guest_pages(mode, n);
> >   }
> > +
> > +int vm_get_stats_fd(struct kvm_vm *vm)
> > +{
> > +     return ioctl(vm->fd, KVM_GET_STATS_FD, NULL);
> > +}
> > +
> > +int vcpu_get_stats_fd(struct kvm_vm *vm, uint32_t vcpuid)
> > +{
> > +     struct vcpu *vcpu = vcpu_find(vm, vcpuid);
> > +
> > +     return ioctl(vcpu->fd, KVM_GET_STATS_FD, NULL);
> > +}
>
> We don't want to add a test case for testing the fd interface on a
> deleted VM and a deleted VCPU ?
>
> Anyway, for the current content,
>
> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
>
Thanks Krish, those invalid fd tests are added at the end of
vm_stats_test and vcpu_stats_test.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 2/4] KVM: stats: Add fd-based API to read binary stats data
  2021-06-03 21:14 ` [PATCH v7 2/4] KVM: stats: Add fd-based API to read binary stats data Jing Zhang
  2021-06-07 19:22   ` Krish Sadhukhan
@ 2021-06-10 16:23   ` Paolo Bonzini
  2021-06-10 17:26     ` Jing Zhang
  2021-06-10 22:47     ` Jing Zhang
  2021-06-10 16:42   ` Paolo Bonzini
  2021-06-13 15:19   ` Fuad Tabba
  3 siblings, 2 replies; 32+ messages in thread
From: Paolo Bonzini @ 2021-06-10 16:23 UTC (permalink / raw)
  To: Jing Zhang, KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390,
	Linuxkselftest, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Claudio Imbrenda, Sean Christopherson, Vitaly Kuznetsov,
	Jim Mattson, Peter Shier, Oliver Upton, David Rientjes,
	Emanuele Giuseppe Esposito, David Matlack, Ricardo Koller,
	Krish Sadhukhan

On 03/06/21 23:14, Jing Zhang wrote:
> +#define DEFINE_VM_STATS_DESC(...) {					       \
> +	STATS_DESC_COUNTER("remote_tlb_flush"),				       \
> +	## __VA_ARGS__							       \
> +}
> +
> +#define DEFINE_VCPU_STATS_DESC(...) {					       \
> +	STATS_DESC_COUNTER("halt_successful_poll"),			       \
> +	STATS_DESC_COUNTER("halt_attempted_poll"),			       \
> +	STATS_DESC_COUNTER("halt_poll_invalid"),			       \
> +	STATS_DESC_COUNTER("halt_wakeup"),				       \
> +	STATS_DESC_TIME_NSEC("halt_poll_success_ns"),			       \
> +	STATS_DESC_TIME_NSEC("halt_poll_fail_ns"),			       \
> +	## __VA_ARGS__							       \

Let's instead put this (note it's without braces) in macros like these

#define KVM_GENERIC_VM_STATS()							\
	STATS_DESC_COUNTER("remote_tlb_flush"),

#define KVM_GENERIC_VCPU_STATS(...)						\
	STATS_DESC_COUNTER("halt_successful_poll"),				\
	STATS_DESC_COUNTER("halt_attempted_poll"),				\
	STATS_DESC_COUNTER("halt_poll_invalid"),				\
	STATS_DESC_COUNTER("halt_wakeup"),					\
	STATS_DESC_TIME_NSEC("halt_poll_success_ns"),				\
	STATS_DESC_TIME_NSEC("halt_poll_fail_ns"),

and it can be used in the arch files.  In fact it can even be added in patch 1 and
switched to STATS_DESC_* here.

Paolo


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 2/4] KVM: stats: Add fd-based API to read binary stats data
  2021-06-03 21:14 ` [PATCH v7 2/4] KVM: stats: Add fd-based API to read binary stats data Jing Zhang
  2021-06-07 19:22   ` Krish Sadhukhan
  2021-06-10 16:23   ` Paolo Bonzini
@ 2021-06-10 16:42   ` Paolo Bonzini
  2021-06-10 17:27     ` Jing Zhang
  2021-06-13 15:19   ` Fuad Tabba
  3 siblings, 1 reply; 32+ messages in thread
From: Paolo Bonzini @ 2021-06-10 16:42 UTC (permalink / raw)
  To: Jing Zhang, KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390,
	Linuxkselftest, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Claudio Imbrenda, Sean Christopherson, Vitaly Kuznetsov,
	Jim Mattson, Peter Shier, Oliver Upton, David Rientjes,
	Emanuele Giuseppe Esposito, David Matlack, Ricardo Koller,
	Krish Sadhukhan

On 03/06/21 23:14, Jing Zhang wrote:
> +struct _kvm_stats_header {
> +	__u32 name_size;
> +	__u32 count;
> +	__u32 desc_offset;
> +	__u32 data_offset;
> +};
> +

Keeping this struct in sync with kvm_stats_header is a bit messy.  If 
you move the id at the end of the header, however, you can use the same 
trick with the zero-sized array that you used for _kvm_stats_desc.

> +struct kvm_vm_stats_data {
> +	unsigned long value[0];
> +};
> +

I posted the patch to switch the VM statistics to 64-bit; you can rebase 
on top of it.

> +#define KVM_GET_STATS_FD  _IOR(KVMIO,  0xcc, struct kvm_stats_header)

This should be _IO(KVMIO, 0xcc) since it does not have an argument.

> +#define STATS_DESC(stat, type, unit, scale, exp)			       \
> +	{								       \
> +		{							       \
> +			.flags = type | unit | scale,			       \
> +			.exponent = exp,				       \
> +			.size = 1					       \
> +		},							       \
> +		.name = stat,						       \

Here you can use

	type | BUILD_BUG_ON_ZERO(type & ~KVM_STATS_TYPE_MASK) |
	unit | BUILD_BUG_ON_ZERO(unit & ~KVM_STATS_UNIT_MASK) |
	scale | BUILD_BUG_ON_ZERO(scale & ~KVM_STATS_SCALE_MASK) |

to get a little bit of type checking.

Paolo


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 0/4] KVM statistics data fd-based binary interface
  2021-06-03 21:14 [PATCH v7 0/4] KVM statistics data fd-based binary interface Jing Zhang
                   ` (3 preceding siblings ...)
  2021-06-03 21:14 ` [PATCH v7 4/4] KVM: selftests: Add selftest for KVM " Jing Zhang
@ 2021-06-10 16:46 ` Paolo Bonzini
  2021-06-10 17:30   ` Jing Zhang
  2021-06-11  6:57 ` Christian Borntraeger
  5 siblings, 1 reply; 32+ messages in thread
From: Paolo Bonzini @ 2021-06-10 16:46 UTC (permalink / raw)
  To: Jing Zhang, KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390,
	Linuxkselftest, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Claudio Imbrenda, Sean Christopherson, Vitaly Kuznetsov,
	Jim Mattson, Peter Shier, Oliver Upton, David Rientjes,
	Emanuele Giuseppe Esposito, David Matlack, Ricardo Koller,
	Krish Sadhukhan

On 03/06/21 23:14, Jing Zhang wrote:
> This patchset provides a file descriptor for every VM and VCPU to read
> KVM statistics data in binary format.
> It is meant to provide a lightweight, flexible, scalable and efficient
> lock-free solution for user space telemetry applications to pull the
> statistics data periodically for large scale systems. The pulling
> frequency could be as high as a few times per second.
> In this patchset, every statistics data are treated to have some
> attributes as below:
>    * architecture dependent or generic
>    * VM statistics data or VCPU statistics data
>    * type: cumulative, instantaneous,
>    * unit: none for simple counter, nanosecond, microsecond,
>      millisecond, second, Byte, KiByte, MiByte, GiByte. Clock Cycles
> Since no lock/synchronization is used, the consistency between all
> the statistics data is not guaranteed. That means not all statistics
> data are read out at the exact same time, since the statistics date
> are still being updated by KVM subsystems while they are read out.
> 
> ---
> 
> * v6 -> v7
>    - Improve file descriptor allocation function by Krish suggestion
>    - Use "generic stats" instead of "common stats" as Krish suggested
>    - Addressed some other nits from Krish and David Matlack
> 
> * v5 -> v6
>    - Use designated initializers for STATS_DESC
>    - Change KVM_STATS_SCALE... to KVM_STATS_BASE...
>    - Use a common function for kvm_[vm|vcpu]_stats_read
>    - Fix some documentation errors/missings
>    - Use TEST_ASSERT in selftest
>    - Use a common function for [vm|vcpu]_stats_test in selftest
> 
> * v4 -> v5
>    - Rebase to kvm/queue, commit a4345a7cecfb ("Merge tag
>      'kvmarm-fixes-5.13-1'")
>    - Change maximum stats name length to 48
>    - Replace VM_STATS_COMMON/VCPU_STATS_COMMON macros with stats
>      descriptor definition macros.
>    - Fixed some errors/warnings reported by checkpatch.pl
> 
> * v3 -> v4
>    - Rebase to kvm/queue, commit 9f242010c3b4 ("KVM: avoid "deadlock"
>      between install_new_memslots and MMU notifier")
>    - Use C-stype comments in the whole patch
>    - Fix wrong count for x86 VCPU stats descriptors
>    - Fix KVM stats data size counting and validity check in selftest
> 
> * v2 -> v3
>    - Rebase to kvm/queue, commit edf408f5257b ("KVM: avoid "deadlock"
>      between install_new_memslots and MMU notifier")
>    - Resolve some nitpicks about format
> 
> * v1 -> v2
>    - Use ARRAY_SIZE to count the number of stats descriptors
>    - Fix missing `size` field initialization in macro STATS_DESC
> 
> [1] https://lore.kernel.org/kvm/20210402224359.2297157-1-jingzhangos@google.com
> [2] https://lore.kernel.org/kvm/20210415151741.1607806-1-jingzhangos@google.com
> [3] https://lore.kernel.org/kvm/20210423181727.596466-1-jingzhangos@google.com
> [4] https://lore.kernel.org/kvm/20210429203740.1935629-1-jingzhangos@google.com
> [5] https://lore.kernel.org/kvm/20210517145314.157626-1-jingzhangos@google.com
> [6] https://lore.kernel.org/kvm/20210524151828.4113777-1-jingzhangos@google.com
> 
> ---
> 
> Jing Zhang (4):
>    KVM: stats: Separate generic stats from architecture specific ones
>    KVM: stats: Add fd-based API to read binary stats data
>    KVM: stats: Add documentation for statistics data binary interface
>    KVM: selftests: Add selftest for KVM statistics data binary interface
> 
>   Documentation/virt/kvm/api.rst                | 180 +++++++++++++++
>   arch/arm64/include/asm/kvm_host.h             |   9 +-
>   arch/arm64/kvm/guest.c                        |  38 +++-
>   arch/mips/include/asm/kvm_host.h              |   9 +-
>   arch/mips/kvm/mips.c                          |  64 +++++-
>   arch/powerpc/include/asm/kvm_host.h           |   9 +-
>   arch/powerpc/kvm/book3s.c                     |  64 +++++-
>   arch/powerpc/kvm/book3s_hv.c                  |  12 +-
>   arch/powerpc/kvm/book3s_pr.c                  |   2 +-
>   arch/powerpc/kvm/book3s_pr_papr.c             |   2 +-
>   arch/powerpc/kvm/booke.c                      |  59 ++++-
>   arch/s390/include/asm/kvm_host.h              |   9 +-
>   arch/s390/kvm/kvm-s390.c                      | 129 ++++++++++-
>   arch/x86/include/asm/kvm_host.h               |   9 +-
>   arch/x86/kvm/x86.c                            |  67 +++++-
>   include/linux/kvm_host.h                      | 141 +++++++++++-
>   include/linux/kvm_types.h                     |  12 +
>   include/uapi/linux/kvm.h                      |  50 ++++
>   tools/testing/selftests/kvm/.gitignore        |   1 +
>   tools/testing/selftests/kvm/Makefile          |   3 +
>   .../testing/selftests/kvm/include/kvm_util.h  |   3 +
>   .../selftests/kvm/kvm_binary_stats_test.c     | 215 ++++++++++++++++++
>   tools/testing/selftests/kvm/lib/kvm_util.c    |  12 +
>   virt/kvm/kvm_main.c                           | 169 +++++++++++++-
>   24 files changed, 1178 insertions(+), 90 deletions(-)
>   create mode 100644 tools/testing/selftests/kvm/kvm_binary_stats_test.c
> 
> 
> base-commit: a4345a7cecfb91ae78cd43d26b0c6a956420761a
> 

I had a few remarks, but it looks very nice overall.

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 2/4] KVM: stats: Add fd-based API to read binary stats data
  2021-06-10 16:23   ` Paolo Bonzini
@ 2021-06-10 17:26     ` Jing Zhang
  2021-06-10 22:47     ` Jing Zhang
  1 sibling, 0 replies; 32+ messages in thread
From: Jing Zhang @ 2021-06-10 17:26 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Claudio Imbrenda, Sean Christopherson, Vitaly Kuznetsov,
	Jim Mattson, Peter Shier, Oliver Upton, David Rientjes,
	Emanuele Giuseppe Esposito, David Matlack, Ricardo Koller,
	Krish Sadhukhan

Hi Paolo,

On Thu, Jun 10, 2021 at 11:23 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 03/06/21 23:14, Jing Zhang wrote:
> > +#define DEFINE_VM_STATS_DESC(...) {                                         \
> > +     STATS_DESC_COUNTER("remote_tlb_flush"),                                \
> > +     ## __VA_ARGS__                                                         \
> > +}
> > +
> > +#define DEFINE_VCPU_STATS_DESC(...) {                                               \
> > +     STATS_DESC_COUNTER("halt_successful_poll"),                            \
> > +     STATS_DESC_COUNTER("halt_attempted_poll"),                             \
> > +     STATS_DESC_COUNTER("halt_poll_invalid"),                               \
> > +     STATS_DESC_COUNTER("halt_wakeup"),                                     \
> > +     STATS_DESC_TIME_NSEC("halt_poll_success_ns"),                          \
> > +     STATS_DESC_TIME_NSEC("halt_poll_fail_ns"),                             \
> > +     ## __VA_ARGS__                                                         \
>
> Let's instead put this (note it's without braces) in macros like these
>
> #define KVM_GENERIC_VM_STATS()                                                  \
>         STATS_DESC_COUNTER("remote_tlb_flush"),
>
> #define KVM_GENERIC_VCPU_STATS(...)                                             \
>         STATS_DESC_COUNTER("halt_successful_poll"),                             \
>         STATS_DESC_COUNTER("halt_attempted_poll"),                              \
>         STATS_DESC_COUNTER("halt_poll_invalid"),                                \
>         STATS_DESC_COUNTER("halt_wakeup"),                                      \
>         STATS_DESC_TIME_NSEC("halt_poll_success_ns"),                           \
>         STATS_DESC_TIME_NSEC("halt_poll_fail_ns"),
>
> and it can be used in the arch files.  In fact it can even be added in patch 1 and
> switched to STATS_DESC_* here.
>
> Paolo
>
Sure, will do.

Thank,s
Jing

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 2/4] KVM: stats: Add fd-based API to read binary stats data
  2021-06-10 16:42   ` Paolo Bonzini
@ 2021-06-10 17:27     ` Jing Zhang
  0 siblings, 0 replies; 32+ messages in thread
From: Jing Zhang @ 2021-06-10 17:27 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Claudio Imbrenda, Sean Christopherson, Vitaly Kuznetsov,
	Jim Mattson, Peter Shier, Oliver Upton, David Rientjes,
	Emanuele Giuseppe Esposito, David Matlack, Ricardo Koller,
	Krish Sadhukhan

Hi Paolo,

On Thu, Jun 10, 2021 at 11:42 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 03/06/21 23:14, Jing Zhang wrote:
> > +struct _kvm_stats_header {
> > +     __u32 name_size;
> > +     __u32 count;
> > +     __u32 desc_offset;
> > +     __u32 data_offset;
> > +};
> > +
>
> Keeping this struct in sync with kvm_stats_header is a bit messy.  If
> you move the id at the end of the header, however, you can use the same
> trick with the zero-sized array that you used for _kvm_stats_desc.
>
Good point. Will do.
> > +struct kvm_vm_stats_data {
> > +     unsigned long value[0];
> > +};
> > +
>
> I posted the patch to switch the VM statistics to 64-bit; you can rebase
> on top of it.
Cool!
>
> > +#define KVM_GET_STATS_FD  _IOR(KVMIO,  0xcc, struct kvm_stats_header)
>
> This should be _IO(KVMIO, 0xcc) since it does not have an argument.
>
Will correct it.
> > +#define STATS_DESC(stat, type, unit, scale, exp)                            \
> > +     {                                                                      \
> > +             {                                                              \
> > +                     .flags = type | unit | scale,                          \
> > +                     .exponent = exp,                                       \
> > +                     .size = 1                                              \
> > +             },                                                             \
> > +             .name = stat,                                                  \
>
> Here you can use
>
>         type | BUILD_BUG_ON_ZERO(type & ~KVM_STATS_TYPE_MASK) |
>         unit | BUILD_BUG_ON_ZERO(unit & ~KVM_STATS_UNIT_MASK) |
>         scale | BUILD_BUG_ON_ZERO(scale & ~KVM_STATS_SCALE_MASK) |
>
> to get a little bit of type checking.
Sure, will do.
>
> Paolo
>

Thanks,
Jing

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 0/4] KVM statistics data fd-based binary interface
  2021-06-10 16:46 ` [PATCH v7 0/4] KVM statistics data fd-based " Paolo Bonzini
@ 2021-06-10 17:30   ` Jing Zhang
  0 siblings, 0 replies; 32+ messages in thread
From: Jing Zhang @ 2021-06-10 17:30 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Claudio Imbrenda, Sean Christopherson, Vitaly Kuznetsov,
	Jim Mattson, Peter Shier, Oliver Upton, David Rientjes,
	Emanuele Giuseppe Esposito, David Matlack, Ricardo Koller,
	Krish Sadhukhan

Hi Paolo,

On Thu, Jun 10, 2021 at 11:46 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 03/06/21 23:14, Jing Zhang wrote:
> > This patchset provides a file descriptor for every VM and VCPU to read
> > KVM statistics data in binary format.
> > It is meant to provide a lightweight, flexible, scalable and efficient
> > lock-free solution for user space telemetry applications to pull the
> > statistics data periodically for large scale systems. The pulling
> > frequency could be as high as a few times per second.
> > In this patchset, every statistics data are treated to have some
> > attributes as below:
> >    * architecture dependent or generic
> >    * VM statistics data or VCPU statistics data
> >    * type: cumulative, instantaneous,
> >    * unit: none for simple counter, nanosecond, microsecond,
> >      millisecond, second, Byte, KiByte, MiByte, GiByte. Clock Cycles
> > Since no lock/synchronization is used, the consistency between all
> > the statistics data is not guaranteed. That means not all statistics
> > data are read out at the exact same time, since the statistics date
> > are still being updated by KVM subsystems while they are read out.
> >
> > ---
> >
> > * v6 -> v7
> >    - Improve file descriptor allocation function by Krish suggestion
> >    - Use "generic stats" instead of "common stats" as Krish suggested
> >    - Addressed some other nits from Krish and David Matlack
> >
> > * v5 -> v6
> >    - Use designated initializers for STATS_DESC
> >    - Change KVM_STATS_SCALE... to KVM_STATS_BASE...
> >    - Use a common function for kvm_[vm|vcpu]_stats_read
> >    - Fix some documentation errors/missings
> >    - Use TEST_ASSERT in selftest
> >    - Use a common function for [vm|vcpu]_stats_test in selftest
> >
> > * v4 -> v5
> >    - Rebase to kvm/queue, commit a4345a7cecfb ("Merge tag
> >      'kvmarm-fixes-5.13-1'")
> >    - Change maximum stats name length to 48
> >    - Replace VM_STATS_COMMON/VCPU_STATS_COMMON macros with stats
> >      descriptor definition macros.
> >    - Fixed some errors/warnings reported by checkpatch.pl
> >
> > * v3 -> v4
> >    - Rebase to kvm/queue, commit 9f242010c3b4 ("KVM: avoid "deadlock"
> >      between install_new_memslots and MMU notifier")
> >    - Use C-stype comments in the whole patch
> >    - Fix wrong count for x86 VCPU stats descriptors
> >    - Fix KVM stats data size counting and validity check in selftest
> >
> > * v2 -> v3
> >    - Rebase to kvm/queue, commit edf408f5257b ("KVM: avoid "deadlock"
> >      between install_new_memslots and MMU notifier")
> >    - Resolve some nitpicks about format
> >
> > * v1 -> v2
> >    - Use ARRAY_SIZE to count the number of stats descriptors
> >    - Fix missing `size` field initialization in macro STATS_DESC
> >
> > [1] https://lore.kernel.org/kvm/20210402224359.2297157-1-jingzhangos@google.com
> > [2] https://lore.kernel.org/kvm/20210415151741.1607806-1-jingzhangos@google.com
> > [3] https://lore.kernel.org/kvm/20210423181727.596466-1-jingzhangos@google.com
> > [4] https://lore.kernel.org/kvm/20210429203740.1935629-1-jingzhangos@google.com
> > [5] https://lore.kernel.org/kvm/20210517145314.157626-1-jingzhangos@google.com
> > [6] https://lore.kernel.org/kvm/20210524151828.4113777-1-jingzhangos@google.com
> >
> > ---
> >
> > Jing Zhang (4):
> >    KVM: stats: Separate generic stats from architecture specific ones
> >    KVM: stats: Add fd-based API to read binary stats data
> >    KVM: stats: Add documentation for statistics data binary interface
> >    KVM: selftests: Add selftest for KVM statistics data binary interface
> >
> >   Documentation/virt/kvm/api.rst                | 180 +++++++++++++++
> >   arch/arm64/include/asm/kvm_host.h             |   9 +-
> >   arch/arm64/kvm/guest.c                        |  38 +++-
> >   arch/mips/include/asm/kvm_host.h              |   9 +-
> >   arch/mips/kvm/mips.c                          |  64 +++++-
> >   arch/powerpc/include/asm/kvm_host.h           |   9 +-
> >   arch/powerpc/kvm/book3s.c                     |  64 +++++-
> >   arch/powerpc/kvm/book3s_hv.c                  |  12 +-
> >   arch/powerpc/kvm/book3s_pr.c                  |   2 +-
> >   arch/powerpc/kvm/book3s_pr_papr.c             |   2 +-
> >   arch/powerpc/kvm/booke.c                      |  59 ++++-
> >   arch/s390/include/asm/kvm_host.h              |   9 +-
> >   arch/s390/kvm/kvm-s390.c                      | 129 ++++++++++-
> >   arch/x86/include/asm/kvm_host.h               |   9 +-
> >   arch/x86/kvm/x86.c                            |  67 +++++-
> >   include/linux/kvm_host.h                      | 141 +++++++++++-
> >   include/linux/kvm_types.h                     |  12 +
> >   include/uapi/linux/kvm.h                      |  50 ++++
> >   tools/testing/selftests/kvm/.gitignore        |   1 +
> >   tools/testing/selftests/kvm/Makefile          |   3 +
> >   .../testing/selftests/kvm/include/kvm_util.h  |   3 +
> >   .../selftests/kvm/kvm_binary_stats_test.c     | 215 ++++++++++++++++++
> >   tools/testing/selftests/kvm/lib/kvm_util.c    |  12 +
> >   virt/kvm/kvm_main.c                           | 169 +++++++++++++-
> >   24 files changed, 1178 insertions(+), 90 deletions(-)
> >   create mode 100644 tools/testing/selftests/kvm/kvm_binary_stats_test.c
> >
> >
> > base-commit: a4345a7cecfb91ae78cd43d26b0c6a956420761a
> >
>
> I had a few remarks, but it looks very nice overall.
>
> Thanks!
>
> Paolo
>
Happy to see per-VM stats use u64 too.
Will send out another patchset based on your remarks.

Thanks,
Jing

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 2/4] KVM: stats: Add fd-based API to read binary stats data
  2021-06-10 16:23   ` Paolo Bonzini
  2021-06-10 17:26     ` Jing Zhang
@ 2021-06-10 22:47     ` Jing Zhang
  2021-06-11 10:32       ` Paolo Bonzini
  1 sibling, 1 reply; 32+ messages in thread
From: Jing Zhang @ 2021-06-10 22:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Claudio Imbrenda, Sean Christopherson, Vitaly Kuznetsov,
	Jim Mattson, Peter Shier, Oliver Upton, David Rientjes,
	Emanuele Giuseppe Esposito, David Matlack, Ricardo Koller,
	Krish Sadhukhan

Hi Paolo,

On Thu, Jun 10, 2021 at 11:23 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 03/06/21 23:14, Jing Zhang wrote:
> > +#define DEFINE_VM_STATS_DESC(...) {                                         \
> > +     STATS_DESC_COUNTER("remote_tlb_flush"),                                \
> > +     ## __VA_ARGS__                                                         \
> > +}
> > +
> > +#define DEFINE_VCPU_STATS_DESC(...) {                                               \
> > +     STATS_DESC_COUNTER("halt_successful_poll"),                            \
> > +     STATS_DESC_COUNTER("halt_attempted_poll"),                             \
> > +     STATS_DESC_COUNTER("halt_poll_invalid"),                               \
> > +     STATS_DESC_COUNTER("halt_wakeup"),                                     \
> > +     STATS_DESC_TIME_NSEC("halt_poll_success_ns"),                          \
> > +     STATS_DESC_TIME_NSEC("halt_poll_fail_ns"),                             \
> > +     ## __VA_ARGS__                                                         \
>
> Let's instead put this (note it's without braces) in macros like these
>
> #define KVM_GENERIC_VM_STATS()                                                  \
>         STATS_DESC_COUNTER("remote_tlb_flush"),
>
> #define KVM_GENERIC_VCPU_STATS(...)                                             \
>         STATS_DESC_COUNTER("halt_successful_poll"),                             \
>         STATS_DESC_COUNTER("halt_attempted_poll"),                              \
>         STATS_DESC_COUNTER("halt_poll_invalid"),                                \
>         STATS_DESC_COUNTER("halt_wakeup"),                                      \
>         STATS_DESC_TIME_NSEC("halt_poll_success_ns"),                           \
>         STATS_DESC_TIME_NSEC("halt_poll_fail_ns"),
>
> and it can be used in the arch files.  In fact it can even be added in patch 1 and
> switched to STATS_DESC_* here.
>
> Paolo
>
I just remember that the reason I used braces is due to following
error from checkpatch.pl:
ERROR: Macros with complex values should be enclosed in parentheses

So, just keep it as it is?

Thanks,
Jing

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 1/4] KVM: stats: Separate generic stats from architecture specific ones
  2021-06-03 21:14 ` [PATCH v7 1/4] KVM: stats: Separate generic stats from architecture specific ones Jing Zhang
  2021-06-07 19:22   ` Krish Sadhukhan
@ 2021-06-11  6:57   ` Christian Borntraeger
  2021-06-11 10:51     ` Paolo Bonzini
  1 sibling, 1 reply; 32+ messages in thread
From: Christian Borntraeger @ 2021-06-11  6:57 UTC (permalink / raw)
  To: Jing Zhang, KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390,
	Linuxkselftest, Paolo Bonzini, Marc Zyngier, James Morse,
	Julien Thierry, Suzuki K Poulose, Will Deacon, Huacai Chen,
	Aleksandar Markovic, Thomas Bogendoerfer, Paul Mackerras,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Claudio Imbrenda, Sean Christopherson, Vitaly Kuznetsov,
	Jim Mattson, Peter Shier, Oliver Upton, David Rientjes,
	Emanuele Giuseppe Esposito, David Matlack, Ricardo Koller,
	Krish Sadhukhan



On 03.06.21 23:14, Jing Zhang wrote:
> Put all generic statistics in a separate structure to ease
> statistics handling for the incoming new statistics API.
> 
> No functional change intended.
> 
> Reviewed-by: David Matlack <dmatlack@google.com>
> Reviewed-by: Ricardo Koller <ricarkol@google.com>
> Signed-off-by: Jing Zhang <jingzhangos@google.com>
[...]
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index 8925f3969478..9b4473f76e56 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
[...]
> @@ -755,12 +750,12 @@ struct kvm_vcpu_arch {
>   };
>   
>   struct kvm_vm_stat {
> +	struct kvm_vm_stat_generic generic;

s390 does not have remote_tlb_flush. I guess this does not hurt?

>   	u64 inject_io;
>   	u64 inject_float_mchk;
>   	u64 inject_pfault_done;
>   	u64 inject_service_signal;
>   	u64 inject_virtio;
> -	u64 remote_tlb_flush;
>   };
[...]
> +struct kvm_vm_stat_generic {
> +	ulong remote_tlb_flush;
> +};

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 0/4] KVM statistics data fd-based binary interface
  2021-06-03 21:14 [PATCH v7 0/4] KVM statistics data fd-based binary interface Jing Zhang
                   ` (4 preceding siblings ...)
  2021-06-10 16:46 ` [PATCH v7 0/4] KVM statistics data fd-based " Paolo Bonzini
@ 2021-06-11  6:57 ` Christian Borntraeger
  2021-06-11 11:02   ` Paolo Bonzini
  5 siblings, 1 reply; 32+ messages in thread
From: Christian Borntraeger @ 2021-06-11  6:57 UTC (permalink / raw)
  To: Jing Zhang, KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390,
	Linuxkselftest, Paolo Bonzini, Marc Zyngier, James Morse,
	Julien Thierry, Suzuki K Poulose, Will Deacon, Huacai Chen,
	Aleksandar Markovic, Thomas Bogendoerfer, Paul Mackerras,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Claudio Imbrenda, Sean Christopherson, Vitaly Kuznetsov,
	Jim Mattson, Peter Shier, Oliver Upton, David Rientjes,
	Emanuele Giuseppe Esposito, David Matlack, Ricardo Koller,
	Krish Sadhukhan


On 03.06.21 23:14, Jing Zhang wrote:
> This patchset provides a file descriptor for every VM and VCPU to read
> KVM statistics data in binary format.
> It is meant to provide a lightweight, flexible, scalable and efficient
> lock-free solution for user space telemetry applications to pull the
> statistics data periodically for large scale systems. The pulling
> frequency could be as high as a few times per second.
> In this patchset, every statistics data are treated to have some
> attributes as below:
>    * architecture dependent or generic
>    * VM statistics data or VCPU statistics data

Are the debugfs things good enough, or do we want to also add the same
ioctl for the /dev/kvm to get the global counters as well, e.g. for
tools like kvm_stat?



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 2/4] KVM: stats: Add fd-based API to read binary stats data
  2021-06-10 22:47     ` Jing Zhang
@ 2021-06-11 10:32       ` Paolo Bonzini
  0 siblings, 0 replies; 32+ messages in thread
From: Paolo Bonzini @ 2021-06-11 10:32 UTC (permalink / raw)
  To: Jing Zhang
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Claudio Imbrenda, Sean Christopherson, Vitaly Kuznetsov,
	Jim Mattson, Peter Shier, Oliver Upton, David Rientjes,
	Emanuele Giuseppe Esposito, David Matlack, Ricardo Koller,
	Krish Sadhukhan

On 11/06/21 00:47, Jing Zhang wrote:
> Hi Paolo,
> 
> On Thu, Jun 10, 2021 at 11:23 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> On 03/06/21 23:14, Jing Zhang wrote:
>>> +#define DEFINE_VM_STATS_DESC(...) {                                         \
>>> +     STATS_DESC_COUNTER("remote_tlb_flush"),                                \
>>> +     ## __VA_ARGS__                                                         \
>>> +}
>>> +
>>> +#define DEFINE_VCPU_STATS_DESC(...) {                                               \
>>> +     STATS_DESC_COUNTER("halt_successful_poll"),                            \
>>> +     STATS_DESC_COUNTER("halt_attempted_poll"),                             \
>>> +     STATS_DESC_COUNTER("halt_poll_invalid"),                               \
>>> +     STATS_DESC_COUNTER("halt_wakeup"),                                     \
>>> +     STATS_DESC_TIME_NSEC("halt_poll_success_ns"),                          \
>>> +     STATS_DESC_TIME_NSEC("halt_poll_fail_ns"),                             \
>>> +     ## __VA_ARGS__                                                         \
>>
>> Let's instead put this (note it's without braces) in macros like these
>>
>> #define KVM_GENERIC_VM_STATS()                                                  \
>>          STATS_DESC_COUNTER("remote_tlb_flush"),
>>
>> #define KVM_GENERIC_VCPU_STATS(...)                                             \
>>          STATS_DESC_COUNTER("halt_successful_poll"),                             \
>>          STATS_DESC_COUNTER("halt_attempted_poll"),                              \
>>          STATS_DESC_COUNTER("halt_poll_invalid"),                                \
>>          STATS_DESC_COUNTER("halt_wakeup"),                                      \
>>          STATS_DESC_TIME_NSEC("halt_poll_success_ns"),                           \
>>          STATS_DESC_TIME_NSEC("halt_poll_fail_ns"),
>>
>> and it can be used in the arch files.  In fact it can even be added in patch 1 and
>> switched to STATS_DESC_* here.
>>
>> Paolo
>>
> I just remember that the reason I used braces is due to following
> error from checkpatch.pl:
> ERROR: Macros with complex values should be enclosed in parentheses

No, just ignore the bogus error from checkpatch.pl.  But, thanks for 
checking!

Paolo


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 1/4] KVM: stats: Separate generic stats from architecture specific ones
  2021-06-11  6:57   ` Christian Borntraeger
@ 2021-06-11 10:51     ` Paolo Bonzini
  2021-06-11 12:00       ` Christian Borntraeger
  0 siblings, 1 reply; 32+ messages in thread
From: Paolo Bonzini @ 2021-06-11 10:51 UTC (permalink / raw)
  To: Christian Borntraeger, Jing Zhang, KVM, KVMARM, LinuxMIPS,
	KVMPPC, LinuxS390, Linuxkselftest, Marc Zyngier, James Morse,
	Julien Thierry, Suzuki K Poulose, Will Deacon, Huacai Chen,
	Aleksandar Markovic, Thomas Bogendoerfer, Paul Mackerras,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Claudio Imbrenda, Sean Christopherson, Vitaly Kuznetsov,
	Jim Mattson, Peter Shier, Oliver Upton, David Rientjes,
	Emanuele Giuseppe Esposito, David Matlack, Ricardo Koller,
	Krish Sadhukhan

On 11/06/21 08:57, Christian Borntraeger wrote:
>> @@ -755,12 +750,12 @@ struct kvm_vcpu_arch {
>>   };
>>   struct kvm_vm_stat {
>> +    struct kvm_vm_stat_generic generic;
> 
> s390 does not have remote_tlb_flush. I guess this does not hurt?

It would have to be accounted in gmap_flush_tlb, but there is no struct 
kvm in there.  A slightly hackish possibility would be to include the 
gmap by value (instead of by pointer) in struct kvm, and then use 
container_of.

This reminds me that I have never asked you why the gmap code is not in 
arch/s390/kvm, and also that there is no code in QEMU that uses 
KVM_VM_S390_UCONTROL or KVM_S390_VCPU_FAULT.  It would be nice to have 
some testcases for that, and also for KVM_S390_VCPU_FAULT with regular 
virtual machines... or to remove the code if it's unused.

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 0/4] KVM statistics data fd-based binary interface
  2021-06-11  6:57 ` Christian Borntraeger
@ 2021-06-11 11:02   ` Paolo Bonzini
  0 siblings, 0 replies; 32+ messages in thread
From: Paolo Bonzini @ 2021-06-11 11:02 UTC (permalink / raw)
  To: Christian Borntraeger, Jing Zhang, KVM, KVMARM, LinuxMIPS,
	KVMPPC, LinuxS390, Linuxkselftest, Marc Zyngier, James Morse,
	Julien Thierry, Suzuki K Poulose, Will Deacon, Huacai Chen,
	Aleksandar Markovic, Thomas Bogendoerfer, Paul Mackerras,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Claudio Imbrenda, Sean Christopherson, Vitaly Kuznetsov,
	Jim Mattson, Peter Shier, Oliver Upton, David Rientjes,
	Emanuele Giuseppe Esposito, David Matlack, Ricardo Koller,
	Krish Sadhukhan

On 11/06/21 08:57, Christian Borntraeger wrote:
> 
> On 03.06.21 23:14, Jing Zhang wrote:
>> This patchset provides a file descriptor for every VM and VCPU to read
>> KVM statistics data in binary format.
>> It is meant to provide a lightweight, flexible, scalable and efficient
>> lock-free solution for user space telemetry applications to pull the
>> statistics data periodically for large scale systems. The pulling
>> frequency could be as high as a few times per second.
>> In this patchset, every statistics data are treated to have some
>> attributes as below:
>>    * architecture dependent or generic
>>    * VM statistics data or VCPU statistics data
> 
> Are the debugfs things good enough, or do we want to also add the same
> ioctl for the /dev/kvm to get the global counters as well, e.g. for
> tools like kvm_stat?

I think we should, yes.  We should also add the summarized VCPU 
statistics to the VM-level file descriptor.  However, it can be done in 
steps.

Paolo


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 1/4] KVM: stats: Separate generic stats from architecture specific ones
  2021-06-11 10:51     ` Paolo Bonzini
@ 2021-06-11 12:00       ` Christian Borntraeger
  2021-06-11 12:03         ` Paolo Bonzini
  0 siblings, 1 reply; 32+ messages in thread
From: Christian Borntraeger @ 2021-06-11 12:00 UTC (permalink / raw)
  To: Paolo Bonzini, Jing Zhang, KVM, KVMARM, LinuxMIPS, KVMPPC,
	LinuxS390, Linuxkselftest, Marc Zyngier, James Morse,
	Julien Thierry, Suzuki K Poulose, Will Deacon, Huacai Chen,
	Aleksandar Markovic, Thomas Bogendoerfer, Paul Mackerras,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Claudio Imbrenda, Sean Christopherson, Vitaly Kuznetsov,
	Jim Mattson, Peter Shier, Oliver Upton, David Rientjes,
	Emanuele Giuseppe Esposito, David Matlack, Ricardo Koller,
	Krish Sadhukhan



On 11.06.21 12:51, Paolo Bonzini wrote:
> On 11/06/21 08:57, Christian Borntraeger wrote:
>>> @@ -755,12 +750,12 @@ struct kvm_vcpu_arch {
>>>   };
>>>   struct kvm_vm_stat {
>>> +    struct kvm_vm_stat_generic generic;
>>
>> s390 does not have remote_tlb_flush. I guess this does not hurt?
> 
> It would have to be accounted in gmap_flush_tlb, but there is no struct kvm in there.  A slightly hackish possibility would be to include the gmap by value (instead of by pointer) in struct kvm, and then use container_of.

What is the semantics of remote_tlb_flush?
For the host:
We usually do not do remote TLB flushes in the "IPI with a flush executed on the remote CPU" sense.
We do have instructions that invalidates table entries and it will take care of remote TLBs as well (IPTE and IDTE and CRDTE).
This is nice, but on the other side an operating system MUST use these instruction if the page table might be in use by any CPU. If not, you can get a delayed access exception machine check if the hardware detect a TLB/page table incosistency.
Only if you can guarantee that nobody has this page table attached you can also use a normal store + tlb flush instruction.

For the guest (and I guess thats what we care about here?) TLB flushes are almost completely handled by hardware. There is only the set prefix instruction that is handled by KVM and this flushes the cpu local cache.
> 
> This reminds me that I have never asked you why the gmap code is not in arch/s390/kvm, 

Because we share the last level of the page tables with userspace so the KVM address space is somewhat tied to the user address space.
This is partly because Martin wanted to have control over this due to some oddities about our page tables and partly because of the rule from above. Using a IPTE of such a page table entry will take care of the TLB entries for both (user and guest) mappings in an atomic fashion when the page table changes.


and also that there is no code in QEMU that uses KVM_VM_S390_UCONTROL or KVM_S390_VCPU_FAULT.  It would be nice to have some testcases for that, and also for KVM_S390_VCPU_FAULT with regular virtual machines... or to remove the code if it's unused.

This is used by an internal firmware test tool that uses KVM to speed up simulation of hardware instructions.
Search for CECSIM to get an idea (the existing papers still talk about the same approach using z/VM).
I will check what we can do regarding regression tests.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 1/4] KVM: stats: Separate generic stats from architecture specific ones
  2021-06-11 12:00       ` Christian Borntraeger
@ 2021-06-11 12:03         ` Paolo Bonzini
  2021-06-11 12:08           ` Christian Borntraeger
  0 siblings, 1 reply; 32+ messages in thread
From: Paolo Bonzini @ 2021-06-11 12:03 UTC (permalink / raw)
  To: Christian Borntraeger, Jing Zhang, KVM, KVMARM, LinuxMIPS,
	KVMPPC, LinuxS390, Linuxkselftest, Marc Zyngier, James Morse,
	Julien Thierry, Suzuki K Poulose, Will Deacon, Huacai Chen,
	Aleksandar Markovic, Thomas Bogendoerfer, Paul Mackerras,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Claudio Imbrenda, Sean Christopherson, Vitaly Kuznetsov,
	Jim Mattson, Peter Shier, Oliver Upton, David Rientjes,
	Emanuele Giuseppe Esposito, David Matlack, Ricardo Koller,
	Krish Sadhukhan

On 11/06/21 14:00, Christian Borntraeger wrote:
> What is the semantics of remote_tlb_flush?

I always interpreted it as "number of times the KVM page table 
management code needed other CPUs to learn about new page tables". 
Whether the broadcast is done in software or hardware shouldn't matter; 
either way I suppose there is still some traffic on the bus involved.

ARM is also not updating the stat, I'll send a patch for that.

Paolo

> For the host:
> We usually do not do remote TLB flushes in the "IPI with a flush 
> executed on the remote CPU" sense.
> We do have instructions that invalidates table entries and it will take 
> care of remote TLBs as well (IPTE and IDTE and CRDTE).
> This is nice, but on the other side an operating system MUST use these 
> instruction if the page table might be in use by any CPU. If not, you 
> can get a delayed access exception machine check if the hardware detect 
> a TLB/page table incosistency.
> Only if you can guarantee that nobody has this page table attached you 
> can also use a normal store + tlb flush instruction.
> 
> For the guest (and I guess thats what we care about here?) TLB flushes 
> are almost completely handled by hardware. There is only the set prefix 
> instruction that is handled by KVM and this flushes the cpu local cache.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 1/4] KVM: stats: Separate generic stats from architecture specific ones
  2021-06-11 12:03         ` Paolo Bonzini
@ 2021-06-11 12:08           ` Christian Borntraeger
  2021-06-11 12:14             ` Paolo Bonzini
  0 siblings, 1 reply; 32+ messages in thread
From: Christian Borntraeger @ 2021-06-11 12:08 UTC (permalink / raw)
  To: Paolo Bonzini, Jing Zhang, KVM, KVMARM, LinuxMIPS, KVMPPC,
	LinuxS390, Linuxkselftest, Marc Zyngier, James Morse,
	Julien Thierry, Suzuki K Poulose, Will Deacon, Huacai Chen,
	Aleksandar Markovic, Thomas Bogendoerfer, Paul Mackerras,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Claudio Imbrenda, Sean Christopherson, Vitaly Kuznetsov,
	Jim Mattson, Peter Shier, Oliver Upton, David Rientjes,
	Emanuele Giuseppe Esposito, David Matlack, Ricardo Koller,
	Krish Sadhukhan



On 11.06.21 14:03, Paolo Bonzini wrote:
> On 11/06/21 14:00, Christian Borntraeger wrote:
>> What is the semantics of remote_tlb_flush?
> 
> I always interpreted it as "number of times the KVM page table management code needed other CPUs to learn about new page tables". Whether the broadcast is done in software or hardware shouldn't matter; either way I suppose there is still some traffic on the bus involved.


My point is that KVM page table management on s390x completely piggy-backs on the qemu address space page table management from common code for the last level.
And due to the way we handle page tables we also do not teach "other CPUs". We always teach the whole system with things like IPTE.

> 
> ARM is also not updating the stat, I'll send a patch for that.
> 
> Paolo
> 
>> For the host:
>> We usually do not do remote TLB flushes in the "IPI with a flush executed on the remote CPU" sense.
>> We do have instructions that invalidates table entries and it will take care of remote TLBs as well (IPTE and IDTE and CRDTE).
>> This is nice, but on the other side an operating system MUST use these instruction if the page table might be in use by any CPU. If not, you can get a delayed access exception machine check if the hardware detect a TLB/page table incosistency.
>> Only if you can guarantee that nobody has this page table attached you can also use a normal store + tlb flush instruction.
>>
>> For the guest (and I guess thats what we care about here?) TLB flushes are almost completely handled by hardware. There is only the set prefix instruction that is handled by KVM and this flushes the cpu local cache.
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 1/4] KVM: stats: Separate generic stats from architecture specific ones
  2021-06-11 12:08           ` Christian Borntraeger
@ 2021-06-11 12:14             ` Paolo Bonzini
  0 siblings, 0 replies; 32+ messages in thread
From: Paolo Bonzini @ 2021-06-11 12:14 UTC (permalink / raw)
  To: Christian Borntraeger, Jing Zhang, KVM, KVMARM, LinuxMIPS,
	KVMPPC, LinuxS390, Linuxkselftest, Marc Zyngier, James Morse,
	Julien Thierry, Suzuki K Poulose, Will Deacon, Huacai Chen,
	Aleksandar Markovic, Thomas Bogendoerfer, Paul Mackerras,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Claudio Imbrenda, Sean Christopherson, Vitaly Kuznetsov,
	Jim Mattson, Peter Shier, Oliver Upton, David Rientjes,
	Emanuele Giuseppe Esposito, David Matlack, Ricardo Koller,
	Krish Sadhukhan

On 11/06/21 14:08, Christian Borntraeger wrote:
>>
>> I always interpreted it as "number of times the KVM page table 
>> management code needed other CPUs to learn about new page tables". 
>> Whether the broadcast is done in software or hardware shouldn't 
>> matter; either way I suppose there is still some traffic on the bus 
>> involved.
> 
> 
> My point is that KVM page table management on s390x completely 
> piggy-backs on the qemu address space page table management from common 
> code for the last level.
> And due to the way we handle page tables we also do not teach "other 
> CPUs". We always teach the whole system with things like IPTE.

But that just means that you'll have fewer KVM-exclusive and thus nicer 
numbers than x86 or ARM. :)  It still makes sense to count 
gmap_flush_tlb calls.

Paolo


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 2/4] KVM: stats: Add fd-based API to read binary stats data
  2021-06-03 21:14 ` [PATCH v7 2/4] KVM: stats: Add fd-based API to read binary stats data Jing Zhang
                     ` (2 preceding siblings ...)
  2021-06-10 16:42   ` Paolo Bonzini
@ 2021-06-13 15:19   ` Fuad Tabba
  2021-06-13 16:42     ` Jing Zhang
  3 siblings, 1 reply; 32+ messages in thread
From: Fuad Tabba @ 2021-06-13 15:19 UTC (permalink / raw)
  To: Jing Zhang
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Claudio Imbrenda, Sean Christopherson, Vitaly Kuznetsov,
	Jim Mattson, Peter Shier, Oliver Upton, David Rientjes,
	Emanuele Giuseppe Esposito, David Matlack, Ricardo Koller,
	Krish Sadhukhan

Hi Jing,

> diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> index ff205b35719b..eb8fd4a96952 100644
> --- a/arch/mips/kvm/mips.c
> +++ b/arch/mips/kvm/mips.c
> @@ -38,6 +38,58 @@
>  #define VECTORSPACING 0x100    /* for EI/VI mode */
>  #endif
>
> +struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC();
> +
> +struct _kvm_stats_header kvm_vm_stats_header = {
> +       .name_size = KVM_STATS_NAME_LEN,
> +       .count = ARRAY_SIZE(kvm_vm_stats_desc),
> +       .desc_offset = sizeof(struct kvm_stats_header),
> +       .data_offset = sizeof(struct kvm_stats_header) +
> +               sizeof(kvm_vm_stats_desc),
> +};
> +
> +struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
> +       STATS_DESC_COUNTER("wait_exits"),
> +       STATS_DESC_COUNTER("cache_exits"),
> +       STATS_DESC_COUNTER("signal_exits"),
> +       STATS_DESC_COUNTER("int_exits"),
> +       STATS_DESC_COUNTER("cop_unusable_exits"),
> +       STATS_DESC_COUNTER("tlbmod_exits"),
> +       STATS_DESC_COUNTER("tlbmiss_ld_exits"),
> +       STATS_DESC_COUNTER("tlbmiss_st_exits"),
> +       STATS_DESC_COUNTER("addrerr_st_exits"),
> +       STATS_DESC_COUNTER("addrerr_ld_exits"),
> +       STATS_DESC_COUNTER("syscall_exits"),
> +       STATS_DESC_COUNTER("resvd_inst_exits"),
> +       STATS_DESC_COUNTER("break_inst_exits"),
> +       STATS_DESC_COUNTER("trap_inst_exits"),
> +       STATS_DESC_COUNTER("msa_fpe_exits"),
> +       STATS_DESC_COUNTER("fpe_exits"),
> +       STATS_DESC_COUNTER("msa_disabled_exits"),
> +       STATS_DESC_COUNTER("flush_dcache_exits"),
> +#ifdef CONFIG_KVM_MIPS_VZ
> +       STATS_DESC_COUNTER("vz_gpsi_exits"),
> +       STATS_DESC_COUNTER("vz_gsfc_exits"),
> +       STATS_DESC_COUNTER("vz_hc_exits"),
> +       STATS_DESC_COUNTER("vz_grr_exits"),
> +       STATS_DESC_COUNTER("vz_gva_exits"),
> +       STATS_DESC_COUNTER("vz_ghfc_exits"),
> +       STATS_DESC_COUNTER("vz_gpa_exits"),
> +       STATS_DESC_COUNTER("vz_resvd_exits"),
> +#ifdef CONFIG_CPU_LOONGSON64
> +       STATS_DESC_COUNTER("vz_cpucfg_exits"),
> +#endif
> +#endif
> +       );

I'm not sure if this is because I've applied your patches on a
different base than the one you're using (I'm using 5.13-rc6), but it
seems that CONFIG_KVM_MIPS_VZ isn't there anymore. Moreover, looking
at your earlier patch (v7 1/4), although truncated, it doesn't seem
that your version of arch/mips/include/asm/kvm_host.h has that config
either. Again, could be a user-error on my part, but might be worth
checking.

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 96d10253218a..8e900c482626 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -214,6 +214,59 @@ EXPORT_SYMBOL_GPL(host_xss);
>  u64 __read_mostly supported_xss;
>  EXPORT_SYMBOL_GPL(supported_xss);
>
> +struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC(
> +       STATS_DESC_COUNTER("mmu_shadow_zapped"),
> +       STATS_DESC_COUNTER("mmu_pte_write"),
> +       STATS_DESC_COUNTER("mmu_pde_zapped"),
> +       STATS_DESC_COUNTER("mmu_flooded"),
> +       STATS_DESC_COUNTER("mmu_recycled"),
> +       STATS_DESC_COUNTER("mmu_cache_miss"),
> +       STATS_DESC_ICOUNTER("mmu_unsync"),
> +       STATS_DESC_ICOUNTER("largepages"),
> +       STATS_DESC_ICOUNTER("nx_largepages_splits"),
> +       STATS_DESC_ICOUNTER("max_mmu_page_hash_collisions"));
> +
> +struct _kvm_stats_header kvm_vm_stats_header = {
> +       .name_size = KVM_STATS_NAME_LEN,
> +       .count = ARRAY_SIZE(kvm_vm_stats_desc),
> +       .desc_offset = sizeof(struct kvm_stats_header),
> +       .data_offset = sizeof(struct kvm_stats_header) +
> +               sizeof(kvm_vm_stats_desc),
> +};
> +
> +struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
> +       STATS_DESC_COUNTER("pf_fixed"),
> +       STATS_DESC_COUNTER("pf_guest"),
> +       STATS_DESC_COUNTER("tlb_flush"),
> +       STATS_DESC_COUNTER("invlpg"),
> +       STATS_DESC_COUNTER("exits"),
> +       STATS_DESC_COUNTER("io_exits"),
> +       STATS_DESC_COUNTER("mmio_exits"),
> +       STATS_DESC_COUNTER("signal_exits"),
> +       STATS_DESC_COUNTER("irq_window_exits"),
> +       STATS_DESC_COUNTER("nmi_window_exits"),
> +       STATS_DESC_COUNTER("l1d_flush"),
> +       STATS_DESC_COUNTER("halt_exits"),
> +       STATS_DESC_COUNTER("request_irq_exits"),
> +       STATS_DESC_COUNTER("irq_exits"),
> +       STATS_DESC_COUNTER("host_state_reload"),
> +       STATS_DESC_COUNTER("fpu_reload"),
> +       STATS_DESC_COUNTER("insn_emulation"),
> +       STATS_DESC_COUNTER("insn_emulation_fail"),
> +       STATS_DESC_COUNTER("hypercalls"),
> +       STATS_DESC_COUNTER("irq_injections"),
> +       STATS_DESC_COUNTER("nmi_injections"),
> +       STATS_DESC_COUNTER("req_event"),
> +       STATS_DESC_COUNTER("nested_run"));

Have you deliberately left out directed_yield_attempted and
directed_yield_successful (from struct kvm_vcpu_stat in
arch/x86/include/asm/kvm_host.h), or should they be here as well?

Thanks,
/fuad

> +struct _kvm_stats_header kvm_vcpu_stats_header = {
> +       .name_size = KVM_STATS_NAME_LEN,
> +       .count = ARRAY_SIZE(kvm_vcpu_stats_desc),
> +       .desc_offset = sizeof(struct kvm_stats_header),
> +       .data_offset = sizeof(struct kvm_stats_header) +
> +               sizeof(kvm_vcpu_stats_desc),
> +};
> +
>  struct kvm_stats_debugfs_item debugfs_entries[] = {
>         VCPU_STAT("pf_fixed", pf_fixed),
>         VCPU_STAT("pf_guest", pf_guest),
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 1870fa928762..873e21af36be 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1240,6 +1240,19 @@ struct kvm_stats_debugfs_item {
>         int mode;
>  };
>
> +struct _kvm_stats_header {
> +       __u32 name_size;
> +       __u32 count;
> +       __u32 desc_offset;
> +       __u32 data_offset;
> +};
> +
> +#define KVM_STATS_NAME_LEN     48
> +struct _kvm_stats_desc {
> +       struct kvm_stats_desc desc;
> +       char name[KVM_STATS_NAME_LEN];
> +};
> +
>  #define KVM_DBGFS_GET_MODE(dbgfs_item)                                         \
>         ((dbgfs_item)->mode ? (dbgfs_item)->mode : 0644)
>
> @@ -1253,8 +1266,127 @@ struct kvm_stats_debugfs_item {
>         { n, offsetof(struct kvm_vcpu, stat.generic.x),                        \
>           KVM_STAT_VCPU, ## __VA_ARGS__ }
>
> +#define STATS_DESC(stat, type, unit, scale, exp)                              \
> +       {                                                                      \
> +               {                                                              \
> +                       .flags = type | unit | scale,                          \
> +                       .exponent = exp,                                       \
> +                       .size = 1                                              \
> +               },                                                             \
> +               .name = stat,                                                  \
> +       }
> +#define STATS_DESC_CUMULATIVE(name, unit, scale, exponent)                    \
> +       STATS_DESC(name, KVM_STATS_TYPE_CUMULATIVE, unit, scale, exponent)
> +#define STATS_DESC_INSTANT(name, unit, scale, exponent)                               \
> +       STATS_DESC(name, KVM_STATS_TYPE_INSTANT, unit, scale, exponent)
> +
> +/* Cumulative counter */
> +#define STATS_DESC_COUNTER(name)                                              \
> +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_NONE,                       \
> +               KVM_STATS_BASE_POW10, 0)
> +/* Instantaneous counter */
> +#define STATS_DESC_ICOUNTER(name)                                             \
> +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_NONE,                          \
> +               KVM_STATS_BASE_POW10, 0)
> +
> +/* Cumulative clock cycles */
> +#define STATS_DESC_CYCLE(name)                                                \
> +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_CYCLES,                     \
> +               KVM_STATS_BASE_POW10, 0)
> +/* Instantaneous clock cycles */
> +#define STATS_DESC_ICYCLE(name)                                                       \
> +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_CYCLES,                        \
> +               KVM_STATS_BASE_POW10, 0)
> +
> +/* Cumulative memory size in Byte */
> +#define STATS_DESC_SIZE_BYTE(name)                                            \
> +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_BYTES,                      \
> +               KVM_STATS_BASE_POW2, 0)
> +/* Cumulative memory size in KiByte */
> +#define STATS_DESC_SIZE_KBYTE(name)                                           \
> +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_BYTES,                      \
> +               KVM_STATS_BASE_POW2, 10)
> +/* Cumulative memory size in MiByte */
> +#define STATS_DESC_SIZE_MBYTE(name)                                           \
> +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_BYTES,                      \
> +               KVM_STATS_BASE_POW2, 20)
> +/* Cumulative memory size in GiByte */
> +#define STATS_DESC_SIZE_GBYTE(name)                                           \
> +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_BYTES,                      \
> +               KVM_STATS_BASE_POW2, 30)
> +
> +/* Instantaneous memory size in Byte */
> +#define STATS_DESC_ISIZE_BYTE(name)                                           \
> +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_BYTES,                         \
> +               KVM_STATS_BASE_POW2, 0)
> +/* Instantaneous memory size in KiByte */
> +#define STATS_DESC_ISIZE_KBYTE(name)                                          \
> +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_BYTES,                         \
> +               KVM_STATS_BASE_POW2, 10)
> +/* Instantaneous memory size in MiByte */
> +#define STATS_DESC_ISIZE_MBYTE(name)                                          \
> +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_BYTES,                         \
> +               KVM_STATS_BASE_POW2, 20)
> +/* Instantaneous memory size in GiByte */
> +#define STATS_DESC_ISIZE_GBYTE(name)                                          \
> +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_BYTES,                         \
> +               KVM_STATS_BASE_POW2, 30)
> +
> +/* Cumulative time in second */
> +#define STATS_DESC_TIME_SEC(name)                                             \
> +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_SECONDS,                    \
> +               KVM_STATS_BASE_POW10, 0)
> +/* Cumulative time in millisecond */
> +#define STATS_DESC_TIME_MSEC(name)                                            \
> +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_SECONDS,                    \
> +               KVM_STATS_BASE_POW10, -3)
> +/* Cumulative time in microsecond */
> +#define STATS_DESC_TIME_USEC(name)                                            \
> +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_SECONDS,                    \
> +               KVM_STATS_BASE_POW10, -6)
> +/* Cumulative time in nanosecond */
> +#define STATS_DESC_TIME_NSEC(name)                                            \
> +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_SECONDS,                    \
> +               KVM_STATS_BASE_POW10, -9)
> +
> +/* Instantaneous time in second */
> +#define STATS_DESC_ITIME_SEC(name)                                            \
> +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_SECONDS,                       \
> +               KVM_STATS_BASE_POW10, 0)
> +/* Instantaneous time in millisecond */
> +#define STATS_DESC_ITIME_MSEC(name)                                           \
> +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_SECONDS,                       \
> +               KVM_STATS_BASE_POW10, -3)
> +/* Instantaneous time in microsecond */
> +#define STATS_DESC_ITIME_USEC(name)                                           \
> +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_SECONDS,                       \
> +               KVM_STATS_BASE_POW10, -6)
> +/* Instantaneous time in nanosecond */
> +#define STATS_DESC_ITIME_NSEC(name)                                           \
> +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_SECONDS,                       \
> +               KVM_STATS_BASE_POW10, -9)
> +
> +#define DEFINE_VM_STATS_DESC(...) {                                           \
> +       STATS_DESC_COUNTER("remote_tlb_flush"),                                \
> +       ## __VA_ARGS__                                                         \
> +}
> +
> +#define DEFINE_VCPU_STATS_DESC(...) {                                         \
> +       STATS_DESC_COUNTER("halt_successful_poll"),                            \
> +       STATS_DESC_COUNTER("halt_attempted_poll"),                             \
> +       STATS_DESC_COUNTER("halt_poll_invalid"),                               \
> +       STATS_DESC_COUNTER("halt_wakeup"),                                     \
> +       STATS_DESC_TIME_NSEC("halt_poll_success_ns"),                          \
> +       STATS_DESC_TIME_NSEC("halt_poll_fail_ns"),                             \
> +       ## __VA_ARGS__                                                         \
> +}
> +
>  extern struct kvm_stats_debugfs_item debugfs_entries[];
>  extern struct dentry *kvm_debugfs_dir;
> +extern struct _kvm_stats_header kvm_vm_stats_header;
> +extern struct _kvm_stats_header kvm_vcpu_stats_header;
> +extern struct _kvm_stats_desc kvm_vm_stats_desc[];
> +extern struct _kvm_stats_desc kvm_vcpu_stats_desc[];
>
>  #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
>  static inline int mmu_notifier_retry(struct kvm *kvm, unsigned long mmu_seq)
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 3fd9a7e9d90c..dcfa0315e3f9 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1082,6 +1082,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_SGX_ATTRIBUTE 196
>  #define KVM_CAP_VM_COPY_ENC_CONTEXT_FROM 197
>  #define KVM_CAP_PTP_KVM 198
> +#define KVM_CAP_STATS_BINARY_FD 199
>
>  #ifdef KVM_CAP_IRQ_ROUTING
>
> @@ -1898,4 +1899,53 @@ struct kvm_dirty_gfn {
>  #define KVM_BUS_LOCK_DETECTION_OFF             (1 << 0)
>  #define KVM_BUS_LOCK_DETECTION_EXIT            (1 << 1)
>
> +#define KVM_STATS_ID_MAXLEN            64
> +
> +struct kvm_stats_header {
> +       char id[KVM_STATS_ID_MAXLEN];
> +       __u32 name_size;
> +       __u32 count;
> +       __u32 desc_offset;
> +       __u32 data_offset;
> +};
> +
> +#define KVM_STATS_TYPE_SHIFT           0
> +#define KVM_STATS_TYPE_MASK            (0xF << KVM_STATS_TYPE_SHIFT)
> +#define KVM_STATS_TYPE_CUMULATIVE      (0x0 << KVM_STATS_TYPE_SHIFT)
> +#define KVM_STATS_TYPE_INSTANT         (0x1 << KVM_STATS_TYPE_SHIFT)
> +#define KVM_STATS_TYPE_MAX             KVM_STATS_TYPE_INSTANT
> +
> +#define KVM_STATS_UNIT_SHIFT           4
> +#define KVM_STATS_UNIT_MASK            (0xF << KVM_STATS_UNIT_SHIFT)
> +#define KVM_STATS_UNIT_NONE            (0x0 << KVM_STATS_UNIT_SHIFT)
> +#define KVM_STATS_UNIT_BYTES           (0x1 << KVM_STATS_UNIT_SHIFT)
> +#define KVM_STATS_UNIT_SECONDS         (0x2 << KVM_STATS_UNIT_SHIFT)
> +#define KVM_STATS_UNIT_CYCLES          (0x3 << KVM_STATS_UNIT_SHIFT)
> +#define KVM_STATS_UNIT_MAX             KVM_STATS_UNIT_CYCLES
> +
> +#define KVM_STATS_BASE_SHIFT           8
> +#define KVM_STATS_BASE_MASK            (0xF << KVM_STATS_BASE_SHIFT)
> +#define KVM_STATS_BASE_POW10           (0x0 << KVM_STATS_BASE_SHIFT)
> +#define KVM_STATS_BASE_POW2            (0x1 << KVM_STATS_BASE_SHIFT)
> +#define KVM_STATS_BASE_MAX             KVM_STATS_BASE_POW2
> +
> +struct kvm_stats_desc {
> +       __u32 flags;
> +       __s16 exponent;
> +       __u16 size;
> +       __u32 unused1;
> +       __u32 unused2;
> +       char name[0];
> +};
> +
> +struct kvm_vm_stats_data {
> +       unsigned long value[0];
> +};
> +
> +struct kvm_vcpu_stats_data {
> +       __u64 value[0];
> +};
> +
> +#define KVM_GET_STATS_FD  _IOR(KVMIO,  0xcc, struct kvm_stats_header)
> +
>  #endif /* __LINUX_KVM_H */
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index f6ad5b080994..d84bb17bdea8 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3409,6 +3409,115 @@ static int kvm_vcpu_ioctl_set_sigmask(struct kvm_vcpu *vcpu, sigset_t *sigset)
>         return 0;
>  }
>
> +static ssize_t kvm_stats_read(char *id, struct _kvm_stats_header *header,
> +               struct _kvm_stats_desc *desc, void *stats, size_t size_stats,
> +               char __user *user_buffer, size_t size, loff_t *offset)
> +{
> +       ssize_t copylen, len, remain = size;
> +       size_t size_header, size_desc;
> +       loff_t pos = *offset;
> +       char __user *dest = user_buffer;
> +       void *src;
> +
> +       size_header = sizeof(*header);
> +       size_desc = header->count * sizeof(*desc);
> +
> +       len = KVM_STATS_ID_MAXLEN + size_header + size_desc + size_stats - pos;
> +       len = min(len, remain);
> +       if (len <= 0)
> +               return 0;
> +       remain = len;
> +
> +       /* Copy kvm stats header id string */
> +       copylen = KVM_STATS_ID_MAXLEN - pos;
> +       copylen = min(copylen, remain);
> +       if (copylen > 0) {
> +               src = id + pos;
> +               if (copy_to_user(dest, src, copylen))
> +                       return -EFAULT;
> +               remain -= copylen;
> +               pos += copylen;
> +               dest += copylen;
> +       }
> +       /* Copy kvm stats header */
> +       copylen = KVM_STATS_ID_MAXLEN + size_header - pos;
> +       copylen = min(copylen, remain);
> +       if (copylen > 0) {
> +               src = header + pos - KVM_STATS_ID_MAXLEN;
> +               if (copy_to_user(dest, src, copylen))
> +                       return -EFAULT;
> +               remain -= copylen;
> +               pos += copylen;
> +               dest += copylen;
> +       }
> +       /* Copy kvm stats descriptors */
> +       copylen = header->desc_offset + size_desc - pos;
> +       copylen = min(copylen, remain);
> +       if (copylen > 0) {
> +               src = desc + pos - header->desc_offset;
> +               if (copy_to_user(dest, src, copylen))
> +                       return -EFAULT;
> +               remain -= copylen;
> +               pos += copylen;
> +               dest += copylen;
> +       }
> +       /* Copy kvm stats values */
> +       copylen = header->data_offset + size_stats - pos;
> +       copylen = min(copylen, remain);
> +       if (copylen > 0) {
> +               src = stats + pos - header->data_offset;
> +               if (copy_to_user(dest, src, copylen))
> +                       return -EFAULT;
> +               remain -= copylen;
> +               pos += copylen;
> +               dest += copylen;
> +       }
> +
> +       *offset = pos;
> +       return len;
> +}
> +
> +static ssize_t kvm_vcpu_stats_read(struct file *file, char __user *user_buffer,
> +                             size_t size, loff_t *offset)
> +{
> +       char id[KVM_STATS_ID_MAXLEN];
> +       struct kvm_vcpu *vcpu = file->private_data;
> +
> +       snprintf(id, sizeof(id), "kvm-%d/vcpu-%d",
> +                       task_pid_nr(current), vcpu->vcpu_id);
> +       return kvm_stats_read(id, &kvm_vcpu_stats_header,
> +                       &kvm_vcpu_stats_desc[0], &vcpu->stat,
> +                       sizeof(vcpu->stat), user_buffer, size, offset);
> +}
> +
> +static const struct file_operations kvm_vcpu_stats_fops = {
> +       .read = kvm_vcpu_stats_read,
> +       .llseek = noop_llseek,
> +};
> +
> +static int kvm_vcpu_ioctl_get_stats_fd(struct kvm_vcpu *vcpu)
> +{
> +       int fd;
> +       struct file *file;
> +       char name[15 + ITOA_MAX_LEN + 1];
> +
> +       snprintf(name, sizeof(name), "kvm-vcpu-stats:%d", vcpu->vcpu_id);
> +
> +       fd = get_unused_fd_flags(O_CLOEXEC);
> +       if (fd < 0)
> +               return fd;
> +
> +       file = anon_inode_getfile(name, &kvm_vcpu_stats_fops, vcpu, O_RDONLY);
> +       if (IS_ERR(file)) {
> +               put_unused_fd(fd);
> +               return PTR_ERR(file);
> +       }
> +       file->f_mode |= FMODE_PREAD;
> +       fd_install(fd, file);
> +
> +       return fd;
> +}
> +
>  static long kvm_vcpu_ioctl(struct file *filp,
>                            unsigned int ioctl, unsigned long arg)
>  {
> @@ -3606,6 +3715,10 @@ static long kvm_vcpu_ioctl(struct file *filp,
>                 r = kvm_arch_vcpu_ioctl_set_fpu(vcpu, fpu);
>                 break;
>         }
> +       case KVM_GET_STATS_FD: {
> +               r = kvm_vcpu_ioctl_get_stats_fd(vcpu);
> +               break;
> +       }
>         default:
>                 r = kvm_arch_vcpu_ioctl(filp, ioctl, arg);
>         }
> @@ -3864,6 +3977,8 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
>  #else
>                 return 0;
>  #endif
> +       case KVM_CAP_STATS_BINARY_FD:
> +               return 1;
>         default:
>                 break;
>         }
> @@ -3967,6 +4082,43 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>         }
>  }
>
> +static ssize_t kvm_vm_stats_read(struct file *file, char __user *user_buffer,
> +                             size_t size, loff_t *offset)
> +{
> +       char id[KVM_STATS_ID_MAXLEN];
> +       struct kvm *kvm = file->private_data;
> +
> +       snprintf(id, sizeof(id), "kvm-%d", task_pid_nr(current));
> +       return kvm_stats_read(id, &kvm_vm_stats_header, &kvm_vm_stats_desc[0],
> +               &kvm->stat, sizeof(kvm->stat), user_buffer, size, offset);
> +}
> +
> +static const struct file_operations kvm_vm_stats_fops = {
> +       .read = kvm_vm_stats_read,
> +       .llseek = noop_llseek,
> +};
> +
> +static int kvm_vm_ioctl_get_stats_fd(struct kvm *kvm)
> +{
> +       int fd;
> +       struct file *file;
> +
> +       fd = get_unused_fd_flags(O_CLOEXEC);
> +       if (fd < 0)
> +               return fd;
> +
> +       file = anon_inode_getfile("kvm-vm-stats",
> +                       &kvm_vm_stats_fops, kvm, O_RDONLY);
> +       if (IS_ERR(file)) {
> +               put_unused_fd(fd);
> +               return PTR_ERR(file);
> +       }
> +       file->f_mode |= FMODE_PREAD;
> +       fd_install(fd, file);
> +
> +       return fd;
> +}
> +
>  static long kvm_vm_ioctl(struct file *filp,
>                            unsigned int ioctl, unsigned long arg)
>  {
> @@ -4149,6 +4301,9 @@ static long kvm_vm_ioctl(struct file *filp,
>         case KVM_RESET_DIRTY_RINGS:
>                 r = kvm_vm_ioctl_reset_dirty_pages(kvm);
>                 break;
> +       case KVM_GET_STATS_FD:
> +               r = kvm_vm_ioctl_get_stats_fd(kvm);
> +               break;
>         default:
>                 r = kvm_arch_vm_ioctl(filp, ioctl, arg);
>         }
> --
> 2.32.0.rc1.229.g3e70b5a671-goog
>
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 2/4] KVM: stats: Add fd-based API to read binary stats data
  2021-06-13 15:19   ` Fuad Tabba
@ 2021-06-13 16:42     ` Jing Zhang
  0 siblings, 0 replies; 32+ messages in thread
From: Jing Zhang @ 2021-06-13 16:42 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Claudio Imbrenda, Sean Christopherson, Vitaly Kuznetsov,
	Jim Mattson, Peter Shier, Oliver Upton, David Rientjes,
	Emanuele Giuseppe Esposito, David Matlack, Ricardo Koller,
	Krish Sadhukhan

Hi Faud,

On Sun, Jun 13, 2021 at 10:20 AM Fuad Tabba <tabba@google.com> wrote:
>
> Hi Jing,
>
> > diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> > index ff205b35719b..eb8fd4a96952 100644
> > --- a/arch/mips/kvm/mips.c
> > +++ b/arch/mips/kvm/mips.c
> > @@ -38,6 +38,58 @@
> >  #define VECTORSPACING 0x100    /* for EI/VI mode */
> >  #endif
> >
> > +struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC();
> > +
> > +struct _kvm_stats_header kvm_vm_stats_header = {
> > +       .name_size = KVM_STATS_NAME_LEN,
> > +       .count = ARRAY_SIZE(kvm_vm_stats_desc),
> > +       .desc_offset = sizeof(struct kvm_stats_header),
> > +       .data_offset = sizeof(struct kvm_stats_header) +
> > +               sizeof(kvm_vm_stats_desc),
> > +};
> > +
> > +struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
> > +       STATS_DESC_COUNTER("wait_exits"),
> > +       STATS_DESC_COUNTER("cache_exits"),
> > +       STATS_DESC_COUNTER("signal_exits"),
> > +       STATS_DESC_COUNTER("int_exits"),
> > +       STATS_DESC_COUNTER("cop_unusable_exits"),
> > +       STATS_DESC_COUNTER("tlbmod_exits"),
> > +       STATS_DESC_COUNTER("tlbmiss_ld_exits"),
> > +       STATS_DESC_COUNTER("tlbmiss_st_exits"),
> > +       STATS_DESC_COUNTER("addrerr_st_exits"),
> > +       STATS_DESC_COUNTER("addrerr_ld_exits"),
> > +       STATS_DESC_COUNTER("syscall_exits"),
> > +       STATS_DESC_COUNTER("resvd_inst_exits"),
> > +       STATS_DESC_COUNTER("break_inst_exits"),
> > +       STATS_DESC_COUNTER("trap_inst_exits"),
> > +       STATS_DESC_COUNTER("msa_fpe_exits"),
> > +       STATS_DESC_COUNTER("fpe_exits"),
> > +       STATS_DESC_COUNTER("msa_disabled_exits"),
> > +       STATS_DESC_COUNTER("flush_dcache_exits"),
> > +#ifdef CONFIG_KVM_MIPS_VZ
> > +       STATS_DESC_COUNTER("vz_gpsi_exits"),
> > +       STATS_DESC_COUNTER("vz_gsfc_exits"),
> > +       STATS_DESC_COUNTER("vz_hc_exits"),
> > +       STATS_DESC_COUNTER("vz_grr_exits"),
> > +       STATS_DESC_COUNTER("vz_gva_exits"),
> > +       STATS_DESC_COUNTER("vz_ghfc_exits"),
> > +       STATS_DESC_COUNTER("vz_gpa_exits"),
> > +       STATS_DESC_COUNTER("vz_resvd_exits"),
> > +#ifdef CONFIG_CPU_LOONGSON64
> > +       STATS_DESC_COUNTER("vz_cpucfg_exits"),
> > +#endif
> > +#endif
> > +       );
>
> I'm not sure if this is because I've applied your patches on a
> different base than the one you're using (I'm using 5.13-rc6), but it
> seems that CONFIG_KVM_MIPS_VZ isn't there anymore. Moreover, looking
> at your earlier patch (v7 1/4), although truncated, it doesn't seem
> that your version of arch/mips/include/asm/kvm_host.h has that config
> either. Again, could be a user-error on my part, but might be worth
> checking.
>
Good eyes! The removal of CONFIG_KVM_MIPS_VZ came in after my early
version of patchset. My later auto rebase didn't catch that change.
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 96d10253218a..8e900c482626 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -214,6 +214,59 @@ EXPORT_SYMBOL_GPL(host_xss);
> >  u64 __read_mostly supported_xss;
> >  EXPORT_SYMBOL_GPL(supported_xss);
> >
> > +struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC(
> > +       STATS_DESC_COUNTER("mmu_shadow_zapped"),
> > +       STATS_DESC_COUNTER("mmu_pte_write"),
> > +       STATS_DESC_COUNTER("mmu_pde_zapped"),
> > +       STATS_DESC_COUNTER("mmu_flooded"),
> > +       STATS_DESC_COUNTER("mmu_recycled"),
> > +       STATS_DESC_COUNTER("mmu_cache_miss"),
> > +       STATS_DESC_ICOUNTER("mmu_unsync"),
> > +       STATS_DESC_ICOUNTER("largepages"),
> > +       STATS_DESC_ICOUNTER("nx_largepages_splits"),
> > +       STATS_DESC_ICOUNTER("max_mmu_page_hash_collisions"));
> > +
> > +struct _kvm_stats_header kvm_vm_stats_header = {
> > +       .name_size = KVM_STATS_NAME_LEN,
> > +       .count = ARRAY_SIZE(kvm_vm_stats_desc),
> > +       .desc_offset = sizeof(struct kvm_stats_header),
> > +       .data_offset = sizeof(struct kvm_stats_header) +
> > +               sizeof(kvm_vm_stats_desc),
> > +};
> > +
> > +struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
> > +       STATS_DESC_COUNTER("pf_fixed"),
> > +       STATS_DESC_COUNTER("pf_guest"),
> > +       STATS_DESC_COUNTER("tlb_flush"),
> > +       STATS_DESC_COUNTER("invlpg"),
> > +       STATS_DESC_COUNTER("exits"),
> > +       STATS_DESC_COUNTER("io_exits"),
> > +       STATS_DESC_COUNTER("mmio_exits"),
> > +       STATS_DESC_COUNTER("signal_exits"),
> > +       STATS_DESC_COUNTER("irq_window_exits"),
> > +       STATS_DESC_COUNTER("nmi_window_exits"),
> > +       STATS_DESC_COUNTER("l1d_flush"),
> > +       STATS_DESC_COUNTER("halt_exits"),
> > +       STATS_DESC_COUNTER("request_irq_exits"),
> > +       STATS_DESC_COUNTER("irq_exits"),
> > +       STATS_DESC_COUNTER("host_state_reload"),
> > +       STATS_DESC_COUNTER("fpu_reload"),
> > +       STATS_DESC_COUNTER("insn_emulation"),
> > +       STATS_DESC_COUNTER("insn_emulation_fail"),
> > +       STATS_DESC_COUNTER("hypercalls"),
> > +       STATS_DESC_COUNTER("irq_injections"),
> > +       STATS_DESC_COUNTER("nmi_injections"),
> > +       STATS_DESC_COUNTER("req_event"),
> > +       STATS_DESC_COUNTER("nested_run"));
>
> Have you deliberately left out directed_yield_attempted and
> directed_yield_successful (from struct kvm_vcpu_stat in
> arch/x86/include/asm/kvm_host.h), or should they be here as well?
>
> Thanks,
> /fuad
>
Thanks. Same problem caused from rebase. Also another one "guest_mode"
is missing too.
Will double check similar issues and add static assertion to make sure the
number of stats descriptors matches the number of stats defined in vm and
vcpu stats structures.
> > +struct _kvm_stats_header kvm_vcpu_stats_header = {
> > +       .name_size = KVM_STATS_NAME_LEN,
> > +       .count = ARRAY_SIZE(kvm_vcpu_stats_desc),
> > +       .desc_offset = sizeof(struct kvm_stats_header),
> > +       .data_offset = sizeof(struct kvm_stats_header) +
> > +               sizeof(kvm_vcpu_stats_desc),
> > +};
> > +
> >  struct kvm_stats_debugfs_item debugfs_entries[] = {
> >         VCPU_STAT("pf_fixed", pf_fixed),
> >         VCPU_STAT("pf_guest", pf_guest),
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 1870fa928762..873e21af36be 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -1240,6 +1240,19 @@ struct kvm_stats_debugfs_item {
> >         int mode;
> >  };
> >
> > +struct _kvm_stats_header {
> > +       __u32 name_size;
> > +       __u32 count;
> > +       __u32 desc_offset;
> > +       __u32 data_offset;
> > +};
> > +
> > +#define KVM_STATS_NAME_LEN     48
> > +struct _kvm_stats_desc {
> > +       struct kvm_stats_desc desc;
> > +       char name[KVM_STATS_NAME_LEN];
> > +};
> > +
> >  #define KVM_DBGFS_GET_MODE(dbgfs_item)                                         \
> >         ((dbgfs_item)->mode ? (dbgfs_item)->mode : 0644)
> >
> > @@ -1253,8 +1266,127 @@ struct kvm_stats_debugfs_item {
> >         { n, offsetof(struct kvm_vcpu, stat.generic.x),                        \
> >           KVM_STAT_VCPU, ## __VA_ARGS__ }
> >
> > +#define STATS_DESC(stat, type, unit, scale, exp)                              \
> > +       {                                                                      \
> > +               {                                                              \
> > +                       .flags = type | unit | scale,                          \
> > +                       .exponent = exp,                                       \
> > +                       .size = 1                                              \
> > +               },                                                             \
> > +               .name = stat,                                                  \
> > +       }
> > +#define STATS_DESC_CUMULATIVE(name, unit, scale, exponent)                    \
> > +       STATS_DESC(name, KVM_STATS_TYPE_CUMULATIVE, unit, scale, exponent)
> > +#define STATS_DESC_INSTANT(name, unit, scale, exponent)                               \
> > +       STATS_DESC(name, KVM_STATS_TYPE_INSTANT, unit, scale, exponent)
> > +
> > +/* Cumulative counter */
> > +#define STATS_DESC_COUNTER(name)                                              \
> > +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_NONE,                       \
> > +               KVM_STATS_BASE_POW10, 0)
> > +/* Instantaneous counter */
> > +#define STATS_DESC_ICOUNTER(name)                                             \
> > +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_NONE,                          \
> > +               KVM_STATS_BASE_POW10, 0)
> > +
> > +/* Cumulative clock cycles */
> > +#define STATS_DESC_CYCLE(name)                                                \
> > +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_CYCLES,                     \
> > +               KVM_STATS_BASE_POW10, 0)
> > +/* Instantaneous clock cycles */
> > +#define STATS_DESC_ICYCLE(name)                                                       \
> > +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_CYCLES,                        \
> > +               KVM_STATS_BASE_POW10, 0)
> > +
> > +/* Cumulative memory size in Byte */
> > +#define STATS_DESC_SIZE_BYTE(name)                                            \
> > +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_BYTES,                      \
> > +               KVM_STATS_BASE_POW2, 0)
> > +/* Cumulative memory size in KiByte */
> > +#define STATS_DESC_SIZE_KBYTE(name)                                           \
> > +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_BYTES,                      \
> > +               KVM_STATS_BASE_POW2, 10)
> > +/* Cumulative memory size in MiByte */
> > +#define STATS_DESC_SIZE_MBYTE(name)                                           \
> > +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_BYTES,                      \
> > +               KVM_STATS_BASE_POW2, 20)
> > +/* Cumulative memory size in GiByte */
> > +#define STATS_DESC_SIZE_GBYTE(name)                                           \
> > +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_BYTES,                      \
> > +               KVM_STATS_BASE_POW2, 30)
> > +
> > +/* Instantaneous memory size in Byte */
> > +#define STATS_DESC_ISIZE_BYTE(name)                                           \
> > +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_BYTES,                         \
> > +               KVM_STATS_BASE_POW2, 0)
> > +/* Instantaneous memory size in KiByte */
> > +#define STATS_DESC_ISIZE_KBYTE(name)                                          \
> > +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_BYTES,                         \
> > +               KVM_STATS_BASE_POW2, 10)
> > +/* Instantaneous memory size in MiByte */
> > +#define STATS_DESC_ISIZE_MBYTE(name)                                          \
> > +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_BYTES,                         \
> > +               KVM_STATS_BASE_POW2, 20)
> > +/* Instantaneous memory size in GiByte */
> > +#define STATS_DESC_ISIZE_GBYTE(name)                                          \
> > +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_BYTES,                         \
> > +               KVM_STATS_BASE_POW2, 30)
> > +
> > +/* Cumulative time in second */
> > +#define STATS_DESC_TIME_SEC(name)                                             \
> > +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_SECONDS,                    \
> > +               KVM_STATS_BASE_POW10, 0)
> > +/* Cumulative time in millisecond */
> > +#define STATS_DESC_TIME_MSEC(name)                                            \
> > +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_SECONDS,                    \
> > +               KVM_STATS_BASE_POW10, -3)
> > +/* Cumulative time in microsecond */
> > +#define STATS_DESC_TIME_USEC(name)                                            \
> > +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_SECONDS,                    \
> > +               KVM_STATS_BASE_POW10, -6)
> > +/* Cumulative time in nanosecond */
> > +#define STATS_DESC_TIME_NSEC(name)                                            \
> > +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_SECONDS,                    \
> > +               KVM_STATS_BASE_POW10, -9)
> > +
> > +/* Instantaneous time in second */
> > +#define STATS_DESC_ITIME_SEC(name)                                            \
> > +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_SECONDS,                       \
> > +               KVM_STATS_BASE_POW10, 0)
> > +/* Instantaneous time in millisecond */
> > +#define STATS_DESC_ITIME_MSEC(name)                                           \
> > +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_SECONDS,                       \
> > +               KVM_STATS_BASE_POW10, -3)
> > +/* Instantaneous time in microsecond */
> > +#define STATS_DESC_ITIME_USEC(name)                                           \
> > +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_SECONDS,                       \
> > +               KVM_STATS_BASE_POW10, -6)
> > +/* Instantaneous time in nanosecond */
> > +#define STATS_DESC_ITIME_NSEC(name)                                           \
> > +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_SECONDS,                       \
> > +               KVM_STATS_BASE_POW10, -9)
> > +
> > +#define DEFINE_VM_STATS_DESC(...) {                                           \
> > +       STATS_DESC_COUNTER("remote_tlb_flush"),                                \
> > +       ## __VA_ARGS__                                                         \
> > +}
> > +
> > +#define DEFINE_VCPU_STATS_DESC(...) {                                         \
> > +       STATS_DESC_COUNTER("halt_successful_poll"),                            \
> > +       STATS_DESC_COUNTER("halt_attempted_poll"),                             \
> > +       STATS_DESC_COUNTER("halt_poll_invalid"),                               \
> > +       STATS_DESC_COUNTER("halt_wakeup"),                                     \
> > +       STATS_DESC_TIME_NSEC("halt_poll_success_ns"),                          \
> > +       STATS_DESC_TIME_NSEC("halt_poll_fail_ns"),                             \
> > +       ## __VA_ARGS__                                                         \
> > +}
> > +
> >  extern struct kvm_stats_debugfs_item debugfs_entries[];
> >  extern struct dentry *kvm_debugfs_dir;
> > +extern struct _kvm_stats_header kvm_vm_stats_header;
> > +extern struct _kvm_stats_header kvm_vcpu_stats_header;
> > +extern struct _kvm_stats_desc kvm_vm_stats_desc[];
> > +extern struct _kvm_stats_desc kvm_vcpu_stats_desc[];
> >
> >  #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
> >  static inline int mmu_notifier_retry(struct kvm *kvm, unsigned long mmu_seq)
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index 3fd9a7e9d90c..dcfa0315e3f9 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -1082,6 +1082,7 @@ struct kvm_ppc_resize_hpt {
> >  #define KVM_CAP_SGX_ATTRIBUTE 196
> >  #define KVM_CAP_VM_COPY_ENC_CONTEXT_FROM 197
> >  #define KVM_CAP_PTP_KVM 198
> > +#define KVM_CAP_STATS_BINARY_FD 199
> >
> >  #ifdef KVM_CAP_IRQ_ROUTING
> >
> > @@ -1898,4 +1899,53 @@ struct kvm_dirty_gfn {
> >  #define KVM_BUS_LOCK_DETECTION_OFF             (1 << 0)
> >  #define KVM_BUS_LOCK_DETECTION_EXIT            (1 << 1)
> >
> > +#define KVM_STATS_ID_MAXLEN            64
> > +
> > +struct kvm_stats_header {
> > +       char id[KVM_STATS_ID_MAXLEN];
> > +       __u32 name_size;
> > +       __u32 count;
> > +       __u32 desc_offset;
> > +       __u32 data_offset;
> > +};
> > +
> > +#define KVM_STATS_TYPE_SHIFT           0
> > +#define KVM_STATS_TYPE_MASK            (0xF << KVM_STATS_TYPE_SHIFT)
> > +#define KVM_STATS_TYPE_CUMULATIVE      (0x0 << KVM_STATS_TYPE_SHIFT)
> > +#define KVM_STATS_TYPE_INSTANT         (0x1 << KVM_STATS_TYPE_SHIFT)
> > +#define KVM_STATS_TYPE_MAX             KVM_STATS_TYPE_INSTANT
> > +
> > +#define KVM_STATS_UNIT_SHIFT           4
> > +#define KVM_STATS_UNIT_MASK            (0xF << KVM_STATS_UNIT_SHIFT)
> > +#define KVM_STATS_UNIT_NONE            (0x0 << KVM_STATS_UNIT_SHIFT)
> > +#define KVM_STATS_UNIT_BYTES           (0x1 << KVM_STATS_UNIT_SHIFT)
> > +#define KVM_STATS_UNIT_SECONDS         (0x2 << KVM_STATS_UNIT_SHIFT)
> > +#define KVM_STATS_UNIT_CYCLES          (0x3 << KVM_STATS_UNIT_SHIFT)
> > +#define KVM_STATS_UNIT_MAX             KVM_STATS_UNIT_CYCLES
> > +
> > +#define KVM_STATS_BASE_SHIFT           8
> > +#define KVM_STATS_BASE_MASK            (0xF << KVM_STATS_BASE_SHIFT)
> > +#define KVM_STATS_BASE_POW10           (0x0 << KVM_STATS_BASE_SHIFT)
> > +#define KVM_STATS_BASE_POW2            (0x1 << KVM_STATS_BASE_SHIFT)
> > +#define KVM_STATS_BASE_MAX             KVM_STATS_BASE_POW2
> > +
> > +struct kvm_stats_desc {
> > +       __u32 flags;
> > +       __s16 exponent;
> > +       __u16 size;
> > +       __u32 unused1;
> > +       __u32 unused2;
> > +       char name[0];
> > +};
> > +
> > +struct kvm_vm_stats_data {
> > +       unsigned long value[0];
> > +};
> > +
> > +struct kvm_vcpu_stats_data {
> > +       __u64 value[0];
> > +};
> > +
> > +#define KVM_GET_STATS_FD  _IOR(KVMIO,  0xcc, struct kvm_stats_header)
> > +
> >  #endif /* __LINUX_KVM_H */
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index f6ad5b080994..d84bb17bdea8 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -3409,6 +3409,115 @@ static int kvm_vcpu_ioctl_set_sigmask(struct kvm_vcpu *vcpu, sigset_t *sigset)
> >         return 0;
> >  }
> >
> > +static ssize_t kvm_stats_read(char *id, struct _kvm_stats_header *header,
> > +               struct _kvm_stats_desc *desc, void *stats, size_t size_stats,
> > +               char __user *user_buffer, size_t size, loff_t *offset)
> > +{
> > +       ssize_t copylen, len, remain = size;
> > +       size_t size_header, size_desc;
> > +       loff_t pos = *offset;
> > +       char __user *dest = user_buffer;
> > +       void *src;
> > +
> > +       size_header = sizeof(*header);
> > +       size_desc = header->count * sizeof(*desc);
> > +
> > +       len = KVM_STATS_ID_MAXLEN + size_header + size_desc + size_stats - pos;
> > +       len = min(len, remain);
> > +       if (len <= 0)
> > +               return 0;
> > +       remain = len;
> > +
> > +       /* Copy kvm stats header id string */
> > +       copylen = KVM_STATS_ID_MAXLEN - pos;
> > +       copylen = min(copylen, remain);
> > +       if (copylen > 0) {
> > +               src = id + pos;
> > +               if (copy_to_user(dest, src, copylen))
> > +                       return -EFAULT;
> > +               remain -= copylen;
> > +               pos += copylen;
> > +               dest += copylen;
> > +       }
> > +       /* Copy kvm stats header */
> > +       copylen = KVM_STATS_ID_MAXLEN + size_header - pos;
> > +       copylen = min(copylen, remain);
> > +       if (copylen > 0) {
> > +               src = header + pos - KVM_STATS_ID_MAXLEN;
> > +               if (copy_to_user(dest, src, copylen))
> > +                       return -EFAULT;
> > +               remain -= copylen;
> > +               pos += copylen;
> > +               dest += copylen;
> > +       }
> > +       /* Copy kvm stats descriptors */
> > +       copylen = header->desc_offset + size_desc - pos;
> > +       copylen = min(copylen, remain);
> > +       if (copylen > 0) {
> > +               src = desc + pos - header->desc_offset;
> > +               if (copy_to_user(dest, src, copylen))
> > +                       return -EFAULT;
> > +               remain -= copylen;
> > +               pos += copylen;
> > +               dest += copylen;
> > +       }
> > +       /* Copy kvm stats values */
> > +       copylen = header->data_offset + size_stats - pos;
> > +       copylen = min(copylen, remain);
> > +       if (copylen > 0) {
> > +               src = stats + pos - header->data_offset;
> > +               if (copy_to_user(dest, src, copylen))
> > +                       return -EFAULT;
> > +               remain -= copylen;
> > +               pos += copylen;
> > +               dest += copylen;
> > +       }
> > +
> > +       *offset = pos;
> > +       return len;
> > +}
> > +
> > +static ssize_t kvm_vcpu_stats_read(struct file *file, char __user *user_buffer,
> > +                             size_t size, loff_t *offset)
> > +{
> > +       char id[KVM_STATS_ID_MAXLEN];
> > +       struct kvm_vcpu *vcpu = file->private_data;
> > +
> > +       snprintf(id, sizeof(id), "kvm-%d/vcpu-%d",
> > +                       task_pid_nr(current), vcpu->vcpu_id);
> > +       return kvm_stats_read(id, &kvm_vcpu_stats_header,
> > +                       &kvm_vcpu_stats_desc[0], &vcpu->stat,
> > +                       sizeof(vcpu->stat), user_buffer, size, offset);
> > +}
> > +
> > +static const struct file_operations kvm_vcpu_stats_fops = {
> > +       .read = kvm_vcpu_stats_read,
> > +       .llseek = noop_llseek,
> > +};
> > +
> > +static int kvm_vcpu_ioctl_get_stats_fd(struct kvm_vcpu *vcpu)
> > +{
> > +       int fd;
> > +       struct file *file;
> > +       char name[15 + ITOA_MAX_LEN + 1];
> > +
> > +       snprintf(name, sizeof(name), "kvm-vcpu-stats:%d", vcpu->vcpu_id);
> > +
> > +       fd = get_unused_fd_flags(O_CLOEXEC);
> > +       if (fd < 0)
> > +               return fd;
> > +
> > +       file = anon_inode_getfile(name, &kvm_vcpu_stats_fops, vcpu, O_RDONLY);
> > +       if (IS_ERR(file)) {
> > +               put_unused_fd(fd);
> > +               return PTR_ERR(file);
> > +       }
> > +       file->f_mode |= FMODE_PREAD;
> > +       fd_install(fd, file);
> > +
> > +       return fd;
> > +}
> > +
> >  static long kvm_vcpu_ioctl(struct file *filp,
> >                            unsigned int ioctl, unsigned long arg)
> >  {
> > @@ -3606,6 +3715,10 @@ static long kvm_vcpu_ioctl(struct file *filp,
> >                 r = kvm_arch_vcpu_ioctl_set_fpu(vcpu, fpu);
> >                 break;
> >         }
> > +       case KVM_GET_STATS_FD: {
> > +               r = kvm_vcpu_ioctl_get_stats_fd(vcpu);
> > +               break;
> > +       }
> >         default:
> >                 r = kvm_arch_vcpu_ioctl(filp, ioctl, arg);
> >         }
> > @@ -3864,6 +3977,8 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
> >  #else
> >                 return 0;
> >  #endif
> > +       case KVM_CAP_STATS_BINARY_FD:
> > +               return 1;
> >         default:
> >                 break;
> >         }
> > @@ -3967,6 +4082,43 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
> >         }
> >  }
> >
> > +static ssize_t kvm_vm_stats_read(struct file *file, char __user *user_buffer,
> > +                             size_t size, loff_t *offset)
> > +{
> > +       char id[KVM_STATS_ID_MAXLEN];
> > +       struct kvm *kvm = file->private_data;
> > +
> > +       snprintf(id, sizeof(id), "kvm-%d", task_pid_nr(current));
> > +       return kvm_stats_read(id, &kvm_vm_stats_header, &kvm_vm_stats_desc[0],
> > +               &kvm->stat, sizeof(kvm->stat), user_buffer, size, offset);
> > +}
> > +
> > +static const struct file_operations kvm_vm_stats_fops = {
> > +       .read = kvm_vm_stats_read,
> > +       .llseek = noop_llseek,
> > +};
> > +
> > +static int kvm_vm_ioctl_get_stats_fd(struct kvm *kvm)
> > +{
> > +       int fd;
> > +       struct file *file;
> > +
> > +       fd = get_unused_fd_flags(O_CLOEXEC);
> > +       if (fd < 0)
> > +               return fd;
> > +
> > +       file = anon_inode_getfile("kvm-vm-stats",
> > +                       &kvm_vm_stats_fops, kvm, O_RDONLY);
> > +       if (IS_ERR(file)) {
> > +               put_unused_fd(fd);
> > +               return PTR_ERR(file);
> > +       }
> > +       file->f_mode |= FMODE_PREAD;
> > +       fd_install(fd, file);
> > +
> > +       return fd;
> > +}
> > +
> >  static long kvm_vm_ioctl(struct file *filp,
> >                            unsigned int ioctl, unsigned long arg)
> >  {
> > @@ -4149,6 +4301,9 @@ static long kvm_vm_ioctl(struct file *filp,
> >         case KVM_RESET_DIRTY_RINGS:
> >                 r = kvm_vm_ioctl_reset_dirty_pages(kvm);
> >                 break;
> > +       case KVM_GET_STATS_FD:
> > +               r = kvm_vm_ioctl_get_stats_fd(kvm);
> > +               break;
> >         default:
> >                 r = kvm_arch_vm_ioctl(filp, ioctl, arg);
> >         }
> > --
> > 2.32.0.rc1.229.g3e70b5a671-goog
> >
> > _______________________________________________
> > kvmarm mailing list
> > kvmarm@lists.cs.columbia.edu
> > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

Thanks,
Jing

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 3/4] KVM: stats: Add documentation for statistics data binary interface
  2021-06-03 21:14 ` [PATCH v7 3/4] KVM: stats: Add documentation for statistics data binary interface Jing Zhang
  2021-06-07 19:22   ` Krish Sadhukhan
@ 2021-06-14  7:56   ` Fuad Tabba
  2021-06-14 13:19     ` Jing Zhang
  1 sibling, 1 reply; 32+ messages in thread
From: Fuad Tabba @ 2021-06-14  7:56 UTC (permalink / raw)
  To: Jing Zhang
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Claudio Imbrenda, Sean Christopherson, Vitaly Kuznetsov,
	Jim Mattson, Peter Shier, Oliver Upton, David Rientjes,
	Emanuele Giuseppe Esposito, David Matlack, Ricardo Koller,
	Krish Sadhukhan

Hi Jing,


On Thu, Jun 3, 2021 at 10:14 PM Jing Zhang <jingzhangos@google.com> wrote:
>
> Update KVM API documentation for binary statistics.
>
> Reviewed-by: David Matlack <dmatlack@google.com>
> Reviewed-by: Ricardo Koller <ricarkol@google.com>
> Signed-off-by: Jing Zhang <jingzhangos@google.com>
> ---
>  Documentation/virt/kvm/api.rst | 180 +++++++++++++++++++++++++++++++++
>  1 file changed, 180 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 7fcb2fd38f42..550bfbdf611b 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -5034,6 +5034,178 @@ see KVM_XEN_VCPU_SET_ATTR above.
>  The KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST type may not be used
>  with the KVM_XEN_VCPU_GET_ATTR ioctl.
>
> +4.130 KVM_GET_STATS_FD
> +---------------------

nit: missing one - (to match the subtitle length)

> +
> +:Capability: KVM_CAP_STATS_BINARY_FD
> +:Architectures: all
> +:Type: vm ioctl, vcpu ioctl
> +:Parameters: none
> +:Returns: statistics file descriptor on success, < 0 on error
> +
> +Errors:
> +
> +  ======     ======================================================
> +  ENOMEM     if the fd could not be created due to lack of memory
> +  EMFILE     if the number of opened files exceeds the limit
> +  ======     ======================================================
> +
> +The file descriptor can be used to read VM/vCPU statistics data in binary
> +format. The file data is organized into three blocks as below:
> ++-------------+
> +|   Header    |
> ++-------------+
> +| Descriptors |
> ++-------------+
> +| Stats Data  |
> ++-------------+
> +
> +The Header block is always at the start of the file. It is only needed to be
> +read one time for the lifetime of the file descriptor.
> +It is in the form of ``struct kvm_stats_header`` as below::
> +
> +       #define KVM_STATS_ID_MAXLEN             64
> +
> +       struct kvm_stats_header {
> +               char id[KVM_STATS_ID_MAXLEN];
> +               __u32 name_size;
> +               __u32 count;
> +               __u32 desc_offset;
> +               __u32 data_offset;
> +       };
> +
> +The ``id`` field is identification for the corresponding KVM statistics. For
> +VM statistics, it is in the form of "kvm-{kvm pid}", like "kvm-12345". For
> +VCPU statistics, it is in the form of "kvm-{kvm pid}/vcpu-{vcpu id}", like
> +"kvm-12345/vcpu-12".
> +
> +The ``name_size`` field is the size (byte) of the statistics name string
> +(including trailing '\0') appended to the end of every statistics descriptor.
> +
> +The ``count`` field is the number of statistics.
> +
> +The ``desc_offset`` field is the offset of the Descriptors block from the start
> +of the file indicated by the file descriptor.
> +
> +The ``data_offset`` field is the offset of the Stats Data block from the start
> +of the file indicated by the file descriptor.
> +
> +The Descriptors block is only needed to be read once for the lifetime of the
> +file descriptor. It is an array of ``struct kvm_stats_desc`` as shown in
> +below code block::
> +
> +       #define KVM_STATS_TYPE_SHIFT            0
> +       #define KVM_STATS_TYPE_MASK             (0xF << KVM_STATS_TYPE_SHIFT)
> +       #define KVM_STATS_TYPE_CUMULATIVE       (0x0 << KVM_STATS_TYPE_SHIFT)
> +       #define KVM_STATS_TYPE_INSTANT          (0x1 << KVM_STATS_TYPE_SHIFT)
> +       #define KVM_STATS_TYPE_MAX              KVM_STATS_TYPE_INSTANT
> +
> +       #define KVM_STATS_UNIT_SHIFT            4
> +       #define KVM_STATS_UNIT_MASK             (0xF << KVM_STATS_UNIT_SHIFT)
> +       #define KVM_STATS_UNIT_NONE             (0x0 << KVM_STATS_UNIT_SHIFT)
> +       #define KVM_STATS_UNIT_BYTES            (0x1 << KVM_STATS_UNIT_SHIFT)
> +       #define KVM_STATS_UNIT_SECONDS          (0x2 << KVM_STATS_UNIT_SHIFT)
> +       #define KVM_STATS_UNIT_CYCLES           (0x3 << KVM_STATS_UNIT_SHIFT)
> +       #define KVM_STATS_UNIT_MAX              KVM_STATS_UNIT_CYCLES
> +
> +       #define KVM_STATS_BASE_SHIFT            8
> +       #define KVM_STATS_BASE_MASK             (0xF << KVM_STATS_BASE_SHIFT)
> +       #define KVM_STATS_BASE_POW10            (0x0 << KVM_STATS_BASE_SHIFT)
> +       #define KVM_STATS_BASE_POW2             (0x1 << KVM_STATS_BASE_SHIFT)
> +       #define KVM_STATS_BASE_MAX              KVM_STATS_BASE_POW2
> +
> +       struct kvm_stats_desc {
> +               __u32 flags;
> +               __s16 exponent;
> +               __u16 size;
> +               __u32 unused1;
> +               __u32 unused2;
> +               char name[0];
> +       };
> +
> +The ``flags`` field contains the type and unit of the statistics data described
> +by this descriptor. The following flags are supported:
> +
> +Bits 0-3 of ``flags`` encode the type:
> +  * ``KVM_STATS_TYPE_CUMULATIVE``
> +    The statistics data is cumulative. The value of data can only be increased.
> +    Most of the counters used in KVM are of this type.
> +    The corresponding ``count`` filed for this type is always 1.

filed -> field

> +  * ``KVM_STATS_TYPE_INSTANT``
> +    The statistics data is instantaneous. Its value can be increased or
> +    decreased. This type is usually used as a measurement of some resources,
> +    like the number of dirty pages, the number of large pages, etc.
> +    The corresponding ``count`` field for this type is always 1.
> +
> +Bits 4-7 of ``flags`` encode the unit:
> +  * ``KVM_STATS_UNIT_NONE``
> +    There is no unit for the value of statistics data. This usually means that
> +    the value is a simple counter of an event.
> +  * ``KVM_STATS_UNIT_BYTES``
> +    It indicates that the statistics data is used to measure memory size, in the
> +    unit of Byte, KiByte, MiByte, GiByte, etc. The unit of the data is
> +    determined by the ``exponent`` field in the descriptor. The
> +    ``KVM_STATS_BASE_POW2`` flag is valid in this case. The unit of the data is
> +    determined by ``pow(2, exponent)``. For example, if value is 10,
> +    ``exponent`` is 20, which means the unit of statistics data is MiByte, we
> +    can get the statistics data in the unit of Byte by
> +    ``value * pow(2, exponent) = 10 * pow(2, 20) = 10 MiByte`` which is
> +    10 * 1024 * 1024 Bytes.
> +  * ``KVM_STATS_UNIT_SECONDS``
> +    It indicates that the statistics data is used to measure time/latency, in
> +    the unit of nanosecond, microsecond, millisecond and second. The unit of the
> +    data is determined by the ``exponent`` field in the descriptor. The
> +    ``KVM_STATS_BASE_POW10`` flag is valid in this case. The unit of the data
> +    is determined by ``pow(10, exponent)``. For example, if value is 2000000,
> +    ``exponent`` is -6, which means the unit of statistics data is microsecond,
> +    we can get the statistics data in the unit of second by
> +    ``value * pow(10, exponent) = 2000000 * pow(10, -6) = 2 seconds``.
> +  * ``KVM_STATS_UNIT_CYCLES``
> +    It indicates that the statistics data is used to measure CPU clock cycles.
> +    The ``KVM_STATS_BASE_POW10`` flag is valid in this case. For example, if
> +    value is 200, ``exponent`` is 4, we can get the number of CPU clock cycles
> +    by ``value * pow(10, exponent) = 200 * pow(10, 4) = 2000000``.
> +
> +Bits 7-11 of ``flags`` encode the base:

Bits 8-11

> +  * ``KVM_STATS_BASE_POW10``
> +    The scale is based on power of 10. It is used for measurement of time and
> +    CPU clock cycles.
> +  * ``KVM_STATS_BASE_POW2``
> +    The scale is based on power of 2. It is used for measurement of memory size.
> +
> +The ``exponent`` field is the scale of corresponding statistics data. For
> +example, if the unit is ``KVM_STATS_UNIT_BYTES``, the base is
> +``KVM_STATS_BASE_POW2``, the ``exponent`` is 10, then we know that the real
> +unit of the statistics data is KBytes a.k.a pow(2, 10) = 1024 bytes.
> +
> +The ``size`` field is the number of values of this statistics data. It is in the
> +unit of ``unsigned long`` for VM or ``__u64`` for VCPU.
> +
> +The ``unused1`` and ``unused2`` fields are reserved for future
> +support for other types of statistics data, like log/linear histogram.
> +
> +The ``name`` field points to the name string of the statistics data. The name
> +string starts at the end of ``struct kvm_stats_desc``.
> +The maximum length (including trailing '\0') is indicated by ``name_size``
> +in ``struct kvm_stats_header``.
> +
> +The Stats Data block contains an array of data values of type ``struct
> +kvm_vm_stats_data`` or ``struct kvm_vcpu_stats_data``. It would be read by
> +user space periodically to pull statistics data.
> +The order of data value in Stats Data block is the same as the order of
> +descriptors in Descriptors block.
> +  * Statistics data for VM::
> +
> +       struct kvm_vm_stats_data {
> +               unsigned long value[0];
> +       };
> +
> +  * Statistics data for VCPU::
> +
> +       struct kvm_vcpu_stats_data {
> +               __u64 value[0];
> +       };
> +
>  5. The kvm_run structure
>  ========================
>
> @@ -6891,3 +7063,11 @@ This capability is always enabled.
>  This capability indicates that the KVM virtual PTP service is
>  supported in the host. A VMM can check whether the service is
>  available to the guest on migration.
> +
> +8.33 KVM_CAP_STATS_BINARY_FD
> +----------------------------
> +
> +:Architectures: all
> +
> +This capability indicates the feature that user space can create get a file
> +descriptor for every VM and VCPU to read statistics data in binary format.

nit: user space -> userspace (it's spelled that way throughout this document)

Cheers,
/fuad

> --
> 2.32.0.rc1.229.g3e70b5a671-goog
>
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 3/4] KVM: stats: Add documentation for statistics data binary interface
  2021-06-14  7:56   ` Fuad Tabba
@ 2021-06-14 13:19     ` Jing Zhang
  0 siblings, 0 replies; 32+ messages in thread
From: Jing Zhang @ 2021-06-14 13:19 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Claudio Imbrenda, Sean Christopherson, Vitaly Kuznetsov,
	Jim Mattson, Peter Shier, Oliver Upton, David Rientjes,
	Emanuele Giuseppe Esposito, David Matlack, Ricardo Koller,
	Krish Sadhukhan

Hi Fuad,

On Mon, Jun 14, 2021 at 2:57 AM Fuad Tabba <tabba@google.com> wrote:
>
> Hi Jing,
>
>
> On Thu, Jun 3, 2021 at 10:14 PM Jing Zhang <jingzhangos@google.com> wrote:
> >
> > Update KVM API documentation for binary statistics.
> >
> > Reviewed-by: David Matlack <dmatlack@google.com>
> > Reviewed-by: Ricardo Koller <ricarkol@google.com>
> > Signed-off-by: Jing Zhang <jingzhangos@google.com>
> > ---
> >  Documentation/virt/kvm/api.rst | 180 +++++++++++++++++++++++++++++++++
> >  1 file changed, 180 insertions(+)
> >
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index 7fcb2fd38f42..550bfbdf611b 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -5034,6 +5034,178 @@ see KVM_XEN_VCPU_SET_ATTR above.
> >  The KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST type may not be used
> >  with the KVM_XEN_VCPU_GET_ATTR ioctl.
> >
> > +4.130 KVM_GET_STATS_FD
> > +---------------------
>
> nit: missing one - (to match the subtitle length)
>
> > +
> > +:Capability: KVM_CAP_STATS_BINARY_FD
> > +:Architectures: all
> > +:Type: vm ioctl, vcpu ioctl
> > +:Parameters: none
> > +:Returns: statistics file descriptor on success, < 0 on error
> > +
> > +Errors:
> > +
> > +  ======     ======================================================
> > +  ENOMEM     if the fd could not be created due to lack of memory
> > +  EMFILE     if the number of opened files exceeds the limit
> > +  ======     ======================================================
> > +
> > +The file descriptor can be used to read VM/vCPU statistics data in binary
> > +format. The file data is organized into three blocks as below:
> > ++-------------+
> > +|   Header    |
> > ++-------------+
> > +| Descriptors |
> > ++-------------+
> > +| Stats Data  |
> > ++-------------+
> > +
> > +The Header block is always at the start of the file. It is only needed to be
> > +read one time for the lifetime of the file descriptor.
> > +It is in the form of ``struct kvm_stats_header`` as below::
> > +
> > +       #define KVM_STATS_ID_MAXLEN             64
> > +
> > +       struct kvm_stats_header {
> > +               char id[KVM_STATS_ID_MAXLEN];
> > +               __u32 name_size;
> > +               __u32 count;
> > +               __u32 desc_offset;
> > +               __u32 data_offset;
> > +       };
> > +
> > +The ``id`` field is identification for the corresponding KVM statistics. For
> > +VM statistics, it is in the form of "kvm-{kvm pid}", like "kvm-12345". For
> > +VCPU statistics, it is in the form of "kvm-{kvm pid}/vcpu-{vcpu id}", like
> > +"kvm-12345/vcpu-12".
> > +
> > +The ``name_size`` field is the size (byte) of the statistics name string
> > +(including trailing '\0') appended to the end of every statistics descriptor.
> > +
> > +The ``count`` field is the number of statistics.
> > +
> > +The ``desc_offset`` field is the offset of the Descriptors block from the start
> > +of the file indicated by the file descriptor.
> > +
> > +The ``data_offset`` field is the offset of the Stats Data block from the start
> > +of the file indicated by the file descriptor.
> > +
> > +The Descriptors block is only needed to be read once for the lifetime of the
> > +file descriptor. It is an array of ``struct kvm_stats_desc`` as shown in
> > +below code block::
> > +
> > +       #define KVM_STATS_TYPE_SHIFT            0
> > +       #define KVM_STATS_TYPE_MASK             (0xF << KVM_STATS_TYPE_SHIFT)
> > +       #define KVM_STATS_TYPE_CUMULATIVE       (0x0 << KVM_STATS_TYPE_SHIFT)
> > +       #define KVM_STATS_TYPE_INSTANT          (0x1 << KVM_STATS_TYPE_SHIFT)
> > +       #define KVM_STATS_TYPE_MAX              KVM_STATS_TYPE_INSTANT
> > +
> > +       #define KVM_STATS_UNIT_SHIFT            4
> > +       #define KVM_STATS_UNIT_MASK             (0xF << KVM_STATS_UNIT_SHIFT)
> > +       #define KVM_STATS_UNIT_NONE             (0x0 << KVM_STATS_UNIT_SHIFT)
> > +       #define KVM_STATS_UNIT_BYTES            (0x1 << KVM_STATS_UNIT_SHIFT)
> > +       #define KVM_STATS_UNIT_SECONDS          (0x2 << KVM_STATS_UNIT_SHIFT)
> > +       #define KVM_STATS_UNIT_CYCLES           (0x3 << KVM_STATS_UNIT_SHIFT)
> > +       #define KVM_STATS_UNIT_MAX              KVM_STATS_UNIT_CYCLES
> > +
> > +       #define KVM_STATS_BASE_SHIFT            8
> > +       #define KVM_STATS_BASE_MASK             (0xF << KVM_STATS_BASE_SHIFT)
> > +       #define KVM_STATS_BASE_POW10            (0x0 << KVM_STATS_BASE_SHIFT)
> > +       #define KVM_STATS_BASE_POW2             (0x1 << KVM_STATS_BASE_SHIFT)
> > +       #define KVM_STATS_BASE_MAX              KVM_STATS_BASE_POW2
> > +
> > +       struct kvm_stats_desc {
> > +               __u32 flags;
> > +               __s16 exponent;
> > +               __u16 size;
> > +               __u32 unused1;
> > +               __u32 unused2;
> > +               char name[0];
> > +       };
> > +
> > +The ``flags`` field contains the type and unit of the statistics data described
> > +by this descriptor. The following flags are supported:
> > +
> > +Bits 0-3 of ``flags`` encode the type:
> > +  * ``KVM_STATS_TYPE_CUMULATIVE``
> > +    The statistics data is cumulative. The value of data can only be increased.
> > +    Most of the counters used in KVM are of this type.
> > +    The corresponding ``count`` filed for this type is always 1.
>
> filed -> field
>
> > +  * ``KVM_STATS_TYPE_INSTANT``
> > +    The statistics data is instantaneous. Its value can be increased or
> > +    decreased. This type is usually used as a measurement of some resources,
> > +    like the number of dirty pages, the number of large pages, etc.
> > +    The corresponding ``count`` field for this type is always 1.
> > +
> > +Bits 4-7 of ``flags`` encode the unit:
> > +  * ``KVM_STATS_UNIT_NONE``
> > +    There is no unit for the value of statistics data. This usually means that
> > +    the value is a simple counter of an event.
> > +  * ``KVM_STATS_UNIT_BYTES``
> > +    It indicates that the statistics data is used to measure memory size, in the
> > +    unit of Byte, KiByte, MiByte, GiByte, etc. The unit of the data is
> > +    determined by the ``exponent`` field in the descriptor. The
> > +    ``KVM_STATS_BASE_POW2`` flag is valid in this case. The unit of the data is
> > +    determined by ``pow(2, exponent)``. For example, if value is 10,
> > +    ``exponent`` is 20, which means the unit of statistics data is MiByte, we
> > +    can get the statistics data in the unit of Byte by
> > +    ``value * pow(2, exponent) = 10 * pow(2, 20) = 10 MiByte`` which is
> > +    10 * 1024 * 1024 Bytes.
> > +  * ``KVM_STATS_UNIT_SECONDS``
> > +    It indicates that the statistics data is used to measure time/latency, in
> > +    the unit of nanosecond, microsecond, millisecond and second. The unit of the
> > +    data is determined by the ``exponent`` field in the descriptor. The
> > +    ``KVM_STATS_BASE_POW10`` flag is valid in this case. The unit of the data
> > +    is determined by ``pow(10, exponent)``. For example, if value is 2000000,
> > +    ``exponent`` is -6, which means the unit of statistics data is microsecond,
> > +    we can get the statistics data in the unit of second by
> > +    ``value * pow(10, exponent) = 2000000 * pow(10, -6) = 2 seconds``.
> > +  * ``KVM_STATS_UNIT_CYCLES``
> > +    It indicates that the statistics data is used to measure CPU clock cycles.
> > +    The ``KVM_STATS_BASE_POW10`` flag is valid in this case. For example, if
> > +    value is 200, ``exponent`` is 4, we can get the number of CPU clock cycles
> > +    by ``value * pow(10, exponent) = 200 * pow(10, 4) = 2000000``.
> > +
> > +Bits 7-11 of ``flags`` encode the base:
>
> Bits 8-11
>
> > +  * ``KVM_STATS_BASE_POW10``
> > +    The scale is based on power of 10. It is used for measurement of time and
> > +    CPU clock cycles.
> > +  * ``KVM_STATS_BASE_POW2``
> > +    The scale is based on power of 2. It is used for measurement of memory size.
> > +
> > +The ``exponent`` field is the scale of corresponding statistics data. For
> > +example, if the unit is ``KVM_STATS_UNIT_BYTES``, the base is
> > +``KVM_STATS_BASE_POW2``, the ``exponent`` is 10, then we know that the real
> > +unit of the statistics data is KBytes a.k.a pow(2, 10) = 1024 bytes.
> > +
> > +The ``size`` field is the number of values of this statistics data. It is in the
> > +unit of ``unsigned long`` for VM or ``__u64`` for VCPU.
> > +
> > +The ``unused1`` and ``unused2`` fields are reserved for future
> > +support for other types of statistics data, like log/linear histogram.
> > +
> > +The ``name`` field points to the name string of the statistics data. The name
> > +string starts at the end of ``struct kvm_stats_desc``.
> > +The maximum length (including trailing '\0') is indicated by ``name_size``
> > +in ``struct kvm_stats_header``.
> > +
> > +The Stats Data block contains an array of data values of type ``struct
> > +kvm_vm_stats_data`` or ``struct kvm_vcpu_stats_data``. It would be read by
> > +user space periodically to pull statistics data.
> > +The order of data value in Stats Data block is the same as the order of
> > +descriptors in Descriptors block.
> > +  * Statistics data for VM::
> > +
> > +       struct kvm_vm_stats_data {
> > +               unsigned long value[0];
> > +       };
> > +
> > +  * Statistics data for VCPU::
> > +
> > +       struct kvm_vcpu_stats_data {
> > +               __u64 value[0];
> > +       };
> > +
> >  5. The kvm_run structure
> >  ========================
> >
> > @@ -6891,3 +7063,11 @@ This capability is always enabled.
> >  This capability indicates that the KVM virtual PTP service is
> >  supported in the host. A VMM can check whether the service is
> >  available to the guest on migration.
> > +
> > +8.33 KVM_CAP_STATS_BINARY_FD
> > +----------------------------
> > +
> > +:Architectures: all
> > +
> > +This capability indicates the feature that user space can create get a file
> > +descriptor for every VM and VCPU to read statistics data in binary format.
>
> nit: user space -> userspace (it's spelled that way throughout this document)
>
> Cheers,
> /fuad
>
> > --
> > 2.32.0.rc1.229.g3e70b5a671-goog
> >
> > _______________________________________________
> > kvmarm mailing list
> > kvmarm@lists.cs.columbia.edu
> > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Thanks for the review!
Jing

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2021-06-14 13:20 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-03 21:14 [PATCH v7 0/4] KVM statistics data fd-based binary interface Jing Zhang
2021-06-03 21:14 ` [PATCH v7 1/4] KVM: stats: Separate generic stats from architecture specific ones Jing Zhang
2021-06-07 19:22   ` Krish Sadhukhan
2021-06-11  6:57   ` Christian Borntraeger
2021-06-11 10:51     ` Paolo Bonzini
2021-06-11 12:00       ` Christian Borntraeger
2021-06-11 12:03         ` Paolo Bonzini
2021-06-11 12:08           ` Christian Borntraeger
2021-06-11 12:14             ` Paolo Bonzini
2021-06-03 21:14 ` [PATCH v7 2/4] KVM: stats: Add fd-based API to read binary stats data Jing Zhang
2021-06-07 19:22   ` Krish Sadhukhan
2021-06-07 22:10     ` Jing Zhang
2021-06-10 16:23   ` Paolo Bonzini
2021-06-10 17:26     ` Jing Zhang
2021-06-10 22:47     ` Jing Zhang
2021-06-11 10:32       ` Paolo Bonzini
2021-06-10 16:42   ` Paolo Bonzini
2021-06-10 17:27     ` Jing Zhang
2021-06-13 15:19   ` Fuad Tabba
2021-06-13 16:42     ` Jing Zhang
2021-06-03 21:14 ` [PATCH v7 3/4] KVM: stats: Add documentation for statistics data binary interface Jing Zhang
2021-06-07 19:22   ` Krish Sadhukhan
2021-06-07 22:09     ` Jing Zhang
2021-06-14  7:56   ` Fuad Tabba
2021-06-14 13:19     ` Jing Zhang
2021-06-03 21:14 ` [PATCH v7 4/4] KVM: selftests: Add selftest for KVM " Jing Zhang
2021-06-07 19:23   ` Krish Sadhukhan
2021-06-07 22:12     ` Jing Zhang
2021-06-10 16:46 ` [PATCH v7 0/4] KVM statistics data fd-based " Paolo Bonzini
2021-06-10 17:30   ` Jing Zhang
2021-06-11  6:57 ` Christian Borntraeger
2021-06-11 11:02   ` Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).