linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v7 0/3] KVM: perf: kvm events analysis tool
@ 2012-08-27  9:51 Dong Hao
  2012-08-27  9:51 ` [PATCH v7 1/3] KVM: x86: export svm/vmx exit code and vector code to userspace Dong Hao
                   ` (3 more replies)
  0 siblings, 4 replies; 26+ messages in thread
From: Dong Hao @ 2012-08-27  9:51 UTC (permalink / raw)
  To: avi, acme, mtosatti, mingo; +Cc: xiaoguangrong, linux-kernel, kvm

From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>

Changelog:
- rebased it on Arnaldo's newest git tree perf/core branch

the change from Arnaldo's comments:
- directly get event from evsel->tp_format
- remove die() and return the proper error code
- rename thread->private to thread->priv

the change from David's comments:
- use is_valid_tracepoint instead of kvm_events_exist 

This patchset introduces a perf-based tool (perf kvm stat record/report)
which can analyze kvm events more smartly. Below is the presentation slice
on 2012 Japan LinuxCon:
http://events.linuxfoundation.org/images/stories/pdf/lcjp2012_guangrong.pdf
You can get more details from it. If any questions/comments, please feel free
to let us know.

This patchset is based on Arnaldo's git tree perf/core branch, and patch 2
is just doing the "improvement" work, which can be picked up independently.


Usage:
- kvm stat
  run a command and gather performance counter statistics, it is the alias of
  perf stat

- trace kvm events:
  perf kvm stat record, or, if other tracepoints are interesting as well, we
  can append the events like this:
  perf kvm stat record -e kvm:*
  If many guests are running, we can track the specified guest by using -p or
  --pid

- show the result:
  perf kvm stat report

The output example is following:
# pgrep qemu-kvm
26071
32253
32564

total 3 guests are running on the host

Then, track the guest whose pid is 26071:
# ./perf kvm stat record -p 26071
^C[ perf record: Woken up 9 times to write data ]
[ perf record: Captured and wrote 24.903 MB perf.data.guest (~1088034 samples) ]

See the vmexit events:
# ./perf kvm stat report --event=vmexit

Analyze events for all VCPUs:

             VM-EXIT    Samples  Samples%     Time%         Avg time

         APIC_ACCESS      65381    66.58%     5.95%     37.72us ( +-   6.54% )
  EXTERNAL_INTERRUPT      16031    16.32%     3.06%     79.11us ( +-   7.34% )
               CPUID       5360     5.46%     0.06%      4.50us ( +-  35.07% )
                 HLT       4496     4.58%    90.75%   8360.34us ( +-   5.22% )
       EPT_VIOLATION       2667     2.72%     0.04%      5.49us ( +-   5.05% )
   PENDING_INTERRUPT       2242     2.28%     0.03%      5.25us ( +-   2.96% )
       EXCEPTION_NMI       1332     1.36%     0.02%      6.53us ( +-   6.51% )
      IO_INSTRUCTION        383     0.39%     0.09%     93.39us ( +-  40.92% )
           CR_ACCESS        310     0.32%     0.00%      6.10us ( +-   3.95% )

Total Samples:98202, Total events handled time:41419293.63us.

See the mmio events:
# ./perf kvm stat report --event=mmio

Analyze events for all VCPUs:

         MMIO Access    Samples  Samples%     Time%         Avg time

        0xfee00380:W      58686    90.21%    15.67%      4.95us ( +-   2.96% )
        0xfee00300:R       2124     3.26%     1.48%     12.93us ( +-  14.75% )
        0xfee00310:W       2124     3.26%     0.34%      3.00us ( +-   1.33% )
        0xfee00300:W       2123     3.26%    82.50%    720.68us ( +-  10.24% )

Total Samples:65057, Total events handled time:1854470.45us.

See the ioport event:
# ./perf kvm stat report --event=ioport

Analyze events for all VCPUs:

      IO Port Access    Samples  Samples%     Time%         Avg time

         0xc090:POUT        383   100.00%   100.00%     89.00us ( +-  42.94% )

Total Samples:383, Total events handled time:34085.56us.

And, --vcpu is used to track the specified vcpu and --key is used to sort the
result:
# ./perf kvm stat report --event=vmexit --vcpu=0 --key=time

Analyze events for VCPU 0:

             VM-EXIT    Samples  Samples%     Time%         Avg time

                 HLT        551     5.05%    94.81%   9501.72us ( +-  12.52% )
  EXTERNAL_INTERRUPT       1390    12.74%     2.39%     94.80us ( +-  20.92% )
         APIC_ACCESS       6186    56.68%     2.62%     23.41us ( +-  23.62% )
      IO_INSTRUCTION         17     0.16%     0.01%     20.39us ( +-  22.33% )
       EXCEPTION_NMI         94     0.86%     0.01%      6.07us ( +-   7.13% )
   PENDING_INTERRUPT        199     1.82%     0.02%      5.48us ( +-   4.36% )
           CR_ACCESS         52     0.48%     0.00%      4.89us ( +-   4.09% )
       EPT_VIOLATION       2057    18.85%     0.12%      3.15us ( +-   1.33% )
               CPUID        368     3.37%     0.02%      2.82us ( +-   2.79% )

Total Samples:10914, Total events handled time:5521782.02us.


Dong Hao (3):
  KVM: x86: export svm/vmx exit code and vector code to userspace
  KVM: x86: trace mmio begin and complete
  KVM: perf: kvm events analysis tool

 arch/x86/include/asm/kvm_host.h       |   36 +-
 arch/x86/include/asm/svm.h            |  205 +++++---
 arch/x86/include/asm/vmx.h            |  126 +++--
 arch/x86/kvm/trace.h                  |   89 ----
 arch/x86/kvm/x86.c                    |   32 +-
 include/trace/events/kvm.h            |   37 ++
 tools/perf/Documentation/perf-kvm.txt |   30 +-
 tools/perf/MANIFEST                   |    3 +
 tools/perf/builtin-kvm.c              |  889 ++++++++++++++++++++++++++++++++-
 tools/perf/util/header.c              |   54 ++-
 tools/perf/util/header.h              |    1 +
 tools/perf/util/thread.h              |    2 +
 12 files changed, 1264 insertions(+), 240 deletions(-)

-- 
1.7.2.5


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v7 1/3] KVM: x86: export svm/vmx exit code and vector code to userspace
  2012-08-27  9:51 [PATCH v7 0/3] KVM: perf: kvm events analysis tool Dong Hao
@ 2012-08-27  9:51 ` Dong Hao
  2012-09-03 11:13   ` Avi Kivity
  2012-08-27  9:51 ` [PATCH v7 2/3] KVM: x86: trace mmio begin and complete Dong Hao
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 26+ messages in thread
From: Dong Hao @ 2012-08-27  9:51 UTC (permalink / raw)
  To: avi, acme, mtosatti, mingo; +Cc: xiaoguangrong, linux-kernel, kvm

From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>

Exporting KVM exit information to userspace to be consumed by perf.

[ Dong Hao <haodong@linux.vnet.ibm.com>: rebase it on acme's git tree ]
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Dong Hao <haodong@linux.vnet.ibm.com>
---
 arch/x86/include/asm/kvm_host.h |   36 ++++---
 arch/x86/include/asm/svm.h      |  205 +++++++++++++++++++++++++--------------
 arch/x86/include/asm/vmx.h      |  126 ++++++++++++++++--------
 arch/x86/kvm/trace.h            |   89 -----------------
 4 files changed, 234 insertions(+), 222 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 09155d6..ad2d229 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -11,6 +11,24 @@
 #ifndef _ASM_X86_KVM_HOST_H
 #define _ASM_X86_KVM_HOST_H
 
+#define DE_VECTOR 0
+#define DB_VECTOR 1
+#define BP_VECTOR 3
+#define OF_VECTOR 4
+#define BR_VECTOR 5
+#define UD_VECTOR 6
+#define NM_VECTOR 7
+#define DF_VECTOR 8
+#define TS_VECTOR 10
+#define NP_VECTOR 11
+#define SS_VECTOR 12
+#define GP_VECTOR 13
+#define PF_VECTOR 14
+#define MF_VECTOR 16
+#define MC_VECTOR 18
+
+#ifdef __KERNEL__
+
 #include <linux/types.h>
 #include <linux/mm.h>
 #include <linux/mmu_notifier.h>
@@ -75,22 +93,6 @@
 #define KVM_HPAGE_MASK(x)	(~(KVM_HPAGE_SIZE(x) - 1))
 #define KVM_PAGES_PER_HPAGE(x)	(KVM_HPAGE_SIZE(x) / PAGE_SIZE)
 
-#define DE_VECTOR 0
-#define DB_VECTOR 1
-#define BP_VECTOR 3
-#define OF_VECTOR 4
-#define BR_VECTOR 5
-#define UD_VECTOR 6
-#define NM_VECTOR 7
-#define DF_VECTOR 8
-#define TS_VECTOR 10
-#define NP_VECTOR 11
-#define SS_VECTOR 12
-#define GP_VECTOR 13
-#define PF_VECTOR 14
-#define MF_VECTOR 16
-#define MC_VECTOR 18
-
 #define SELECTOR_TI_MASK (1 << 2)
 #define SELECTOR_RPL_MASK 0x03
 
@@ -994,4 +996,6 @@ int kvm_pmu_read_pmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
 void kvm_handle_pmu_event(struct kvm_vcpu *vcpu);
 void kvm_deliver_pmi(struct kvm_vcpu *vcpu);
 
+#endif
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index f2b83bc..cdf5674 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -1,6 +1,135 @@
 #ifndef __SVM_H
 #define __SVM_H
 
+#define SVM_EXIT_READ_CR0      0x000
+#define SVM_EXIT_READ_CR3      0x003
+#define SVM_EXIT_READ_CR4      0x004
+#define SVM_EXIT_READ_CR8      0x008
+#define SVM_EXIT_WRITE_CR0     0x010
+#define SVM_EXIT_WRITE_CR3     0x013
+#define SVM_EXIT_WRITE_CR4     0x014
+#define SVM_EXIT_WRITE_CR8     0x018
+#define SVM_EXIT_READ_DR0      0x020
+#define SVM_EXIT_READ_DR1      0x021
+#define SVM_EXIT_READ_DR2      0x022
+#define SVM_EXIT_READ_DR3      0x023
+#define SVM_EXIT_READ_DR4      0x024
+#define SVM_EXIT_READ_DR5      0x025
+#define SVM_EXIT_READ_DR6      0x026
+#define SVM_EXIT_READ_DR7      0x027
+#define SVM_EXIT_WRITE_DR0     0x030
+#define SVM_EXIT_WRITE_DR1     0x031
+#define SVM_EXIT_WRITE_DR2     0x032
+#define SVM_EXIT_WRITE_DR3     0x033
+#define SVM_EXIT_WRITE_DR4     0x034
+#define SVM_EXIT_WRITE_DR5     0x035
+#define SVM_EXIT_WRITE_DR6     0x036
+#define SVM_EXIT_WRITE_DR7     0x037
+#define SVM_EXIT_EXCP_BASE     0x040
+#define SVM_EXIT_INTR          0x060
+#define SVM_EXIT_NMI           0x061
+#define SVM_EXIT_SMI           0x062
+#define SVM_EXIT_INIT          0x063
+#define SVM_EXIT_VINTR         0x064
+#define SVM_EXIT_CR0_SEL_WRITE 0x065
+#define SVM_EXIT_IDTR_READ     0x066
+#define SVM_EXIT_GDTR_READ     0x067
+#define SVM_EXIT_LDTR_READ     0x068
+#define SVM_EXIT_TR_READ       0x069
+#define SVM_EXIT_IDTR_WRITE    0x06a
+#define SVM_EXIT_GDTR_WRITE    0x06b
+#define SVM_EXIT_LDTR_WRITE    0x06c
+#define SVM_EXIT_TR_WRITE      0x06d
+#define SVM_EXIT_RDTSC         0x06e
+#define SVM_EXIT_RDPMC         0x06f
+#define SVM_EXIT_PUSHF         0x070
+#define SVM_EXIT_POPF          0x071
+#define SVM_EXIT_CPUID         0x072
+#define SVM_EXIT_RSM           0x073
+#define SVM_EXIT_IRET          0x074
+#define SVM_EXIT_SWINT         0x075
+#define SVM_EXIT_INVD          0x076
+#define SVM_EXIT_PAUSE         0x077
+#define SVM_EXIT_HLT           0x078
+#define SVM_EXIT_INVLPG        0x079
+#define SVM_EXIT_INVLPGA       0x07a
+#define SVM_EXIT_IOIO          0x07b
+#define SVM_EXIT_MSR           0x07c
+#define SVM_EXIT_TASK_SWITCH   0x07d
+#define SVM_EXIT_FERR_FREEZE   0x07e
+#define SVM_EXIT_SHUTDOWN      0x07f
+#define SVM_EXIT_VMRUN         0x080
+#define SVM_EXIT_VMMCALL       0x081
+#define SVM_EXIT_VMLOAD        0x082
+#define SVM_EXIT_VMSAVE        0x083
+#define SVM_EXIT_STGI          0x084
+#define SVM_EXIT_CLGI          0x085
+#define SVM_EXIT_SKINIT        0x086
+#define SVM_EXIT_RDTSCP        0x087
+#define SVM_EXIT_ICEBP         0x088
+#define SVM_EXIT_WBINVD        0x089
+#define SVM_EXIT_MONITOR       0x08a
+#define SVM_EXIT_MWAIT         0x08b
+#define SVM_EXIT_MWAIT_COND    0x08c
+#define SVM_EXIT_XSETBV        0x08d
+#define SVM_EXIT_NPF           0x400
+
+#define SVM_EXIT_ERR           -1
+
+#define SVM_EXIT_REASONS \
+	{ SVM_EXIT_READ_CR0,    "read_cr0" }, \
+	{ SVM_EXIT_READ_CR3,    "read_cr3" }, \
+	{ SVM_EXIT_READ_CR4,    "read_cr4" }, \
+	{ SVM_EXIT_READ_CR8,    "read_cr8" }, \
+	{ SVM_EXIT_WRITE_CR0,   "write_cr0" }, \
+	{ SVM_EXIT_WRITE_CR3,   "write_cr3" }, \
+	{ SVM_EXIT_WRITE_CR4,   "write_cr4" }, \
+	{ SVM_EXIT_WRITE_CR8,   "write_cr8" }, \
+	{ SVM_EXIT_READ_DR0,    "read_dr0" }, \
+	{ SVM_EXIT_READ_DR1,    "read_dr1" }, \
+	{ SVM_EXIT_READ_DR2,    "read_dr2" }, \
+	{ SVM_EXIT_READ_DR3,    "read_dr3" }, \
+	{ SVM_EXIT_WRITE_DR0,   "write_dr0" }, \
+	{ SVM_EXIT_WRITE_DR1,   "write_dr1" }, \
+	{ SVM_EXIT_WRITE_DR2,   "write_dr2" }, \
+	{ SVM_EXIT_WRITE_DR3,   "write_dr3" }, \
+	{ SVM_EXIT_WRITE_DR5,   "write_dr5" }, \
+	{ SVM_EXIT_WRITE_DR7,   "write_dr7" }, \
+	{ SVM_EXIT_EXCP_BASE + DB_VECTOR,       "DB excp" }, \
+	{ SVM_EXIT_EXCP_BASE + BP_VECTOR,       "BP excp" }, \
+	{ SVM_EXIT_EXCP_BASE + UD_VECTOR,       "UD excp" }, \
+	{ SVM_EXIT_EXCP_BASE + PF_VECTOR,       "PF excp" }, \
+	{ SVM_EXIT_EXCP_BASE + NM_VECTOR,       "NM excp" }, \
+	{ SVM_EXIT_EXCP_BASE + MC_VECTOR,       "MC excp" }, \
+	{ SVM_EXIT_INTR,        "interrupt" }, \
+	{ SVM_EXIT_NMI,         "nmi" }, \
+	{ SVM_EXIT_SMI,         "smi" }, \
+	{ SVM_EXIT_INIT,        "init" }, \
+	{ SVM_EXIT_VINTR,       "vintr" }, \
+	{ SVM_EXIT_CPUID,       "cpuid" }, \
+	{ SVM_EXIT_INVD,        "invd" }, \
+	{ SVM_EXIT_HLT,         "hlt" }, \
+	{ SVM_EXIT_INVLPG,      "invlpg" }, \
+	{ SVM_EXIT_INVLPGA,     "invlpga" }, \
+	{ SVM_EXIT_IOIO,        "io" }, \
+	{ SVM_EXIT_MSR,         "msr" }, \
+	{ SVM_EXIT_TASK_SWITCH, "task_switch" }, \
+	{ SVM_EXIT_SHUTDOWN,    "shutdown" }, \
+	{ SVM_EXIT_VMRUN,       "vmrun" }, \
+	{ SVM_EXIT_VMMCALL,     "hypercall" }, \
+	{ SVM_EXIT_VMLOAD,      "vmload" }, \
+	{ SVM_EXIT_VMSAVE,      "vmsave" }, \
+	{ SVM_EXIT_STGI,        "stgi" }, \
+	{ SVM_EXIT_CLGI,        "clgi" }, \
+	{ SVM_EXIT_SKINIT,      "skinit" }, \
+	{ SVM_EXIT_WBINVD,      "wbinvd" }, \
+	{ SVM_EXIT_MONITOR,     "monitor" }, \
+	{ SVM_EXIT_MWAIT,       "mwait" }, \
+	{ SVM_EXIT_XSETBV,      "xsetbv" }, \
+	{ SVM_EXIT_NPF,         "npf" }
+
+#ifdef __KERNEL__
+
 enum {
 	INTERCEPT_INTR,
 	INTERCEPT_NMI,
@@ -264,81 +393,6 @@ struct __attribute__ ((__packed__)) vmcb {
 
 #define SVM_EXITINFO_REG_MASK 0x0F
 
-#define	SVM_EXIT_READ_CR0 	0x000
-#define	SVM_EXIT_READ_CR3 	0x003
-#define	SVM_EXIT_READ_CR4 	0x004
-#define	SVM_EXIT_READ_CR8 	0x008
-#define	SVM_EXIT_WRITE_CR0 	0x010
-#define	SVM_EXIT_WRITE_CR3 	0x013
-#define	SVM_EXIT_WRITE_CR4 	0x014
-#define	SVM_EXIT_WRITE_CR8 	0x018
-#define	SVM_EXIT_READ_DR0 	0x020
-#define	SVM_EXIT_READ_DR1 	0x021
-#define	SVM_EXIT_READ_DR2 	0x022
-#define	SVM_EXIT_READ_DR3 	0x023
-#define	SVM_EXIT_READ_DR4 	0x024
-#define	SVM_EXIT_READ_DR5 	0x025
-#define	SVM_EXIT_READ_DR6 	0x026
-#define	SVM_EXIT_READ_DR7 	0x027
-#define	SVM_EXIT_WRITE_DR0 	0x030
-#define	SVM_EXIT_WRITE_DR1 	0x031
-#define	SVM_EXIT_WRITE_DR2 	0x032
-#define	SVM_EXIT_WRITE_DR3 	0x033
-#define	SVM_EXIT_WRITE_DR4 	0x034
-#define	SVM_EXIT_WRITE_DR5 	0x035
-#define	SVM_EXIT_WRITE_DR6 	0x036
-#define	SVM_EXIT_WRITE_DR7 	0x037
-#define SVM_EXIT_EXCP_BASE      0x040
-#define SVM_EXIT_INTR		0x060
-#define SVM_EXIT_NMI		0x061
-#define SVM_EXIT_SMI		0x062
-#define SVM_EXIT_INIT		0x063
-#define SVM_EXIT_VINTR		0x064
-#define SVM_EXIT_CR0_SEL_WRITE	0x065
-#define SVM_EXIT_IDTR_READ	0x066
-#define SVM_EXIT_GDTR_READ	0x067
-#define SVM_EXIT_LDTR_READ	0x068
-#define SVM_EXIT_TR_READ	0x069
-#define SVM_EXIT_IDTR_WRITE	0x06a
-#define SVM_EXIT_GDTR_WRITE	0x06b
-#define SVM_EXIT_LDTR_WRITE	0x06c
-#define SVM_EXIT_TR_WRITE	0x06d
-#define SVM_EXIT_RDTSC		0x06e
-#define SVM_EXIT_RDPMC		0x06f
-#define SVM_EXIT_PUSHF		0x070
-#define SVM_EXIT_POPF		0x071
-#define SVM_EXIT_CPUID		0x072
-#define SVM_EXIT_RSM		0x073
-#define SVM_EXIT_IRET		0x074
-#define SVM_EXIT_SWINT		0x075
-#define SVM_EXIT_INVD		0x076
-#define SVM_EXIT_PAUSE		0x077
-#define SVM_EXIT_HLT		0x078
-#define SVM_EXIT_INVLPG		0x079
-#define SVM_EXIT_INVLPGA	0x07a
-#define SVM_EXIT_IOIO		0x07b
-#define SVM_EXIT_MSR		0x07c
-#define SVM_EXIT_TASK_SWITCH	0x07d
-#define SVM_EXIT_FERR_FREEZE	0x07e
-#define SVM_EXIT_SHUTDOWN	0x07f
-#define SVM_EXIT_VMRUN		0x080
-#define SVM_EXIT_VMMCALL	0x081
-#define SVM_EXIT_VMLOAD		0x082
-#define SVM_EXIT_VMSAVE		0x083
-#define SVM_EXIT_STGI		0x084
-#define SVM_EXIT_CLGI		0x085
-#define SVM_EXIT_SKINIT		0x086
-#define SVM_EXIT_RDTSCP		0x087
-#define SVM_EXIT_ICEBP		0x088
-#define SVM_EXIT_WBINVD		0x089
-#define SVM_EXIT_MONITOR	0x08a
-#define SVM_EXIT_MWAIT		0x08b
-#define SVM_EXIT_MWAIT_COND	0x08c
-#define SVM_EXIT_XSETBV		0x08d
-#define SVM_EXIT_NPF  		0x400
-
-#define SVM_EXIT_ERR		-1
-
 #define SVM_CR0_SELECTIVE_MASK (X86_CR0_TS | X86_CR0_MP)
 
 #define SVM_VMLOAD ".byte 0x0f, 0x01, 0xda"
@@ -350,3 +404,4 @@ struct __attribute__ ((__packed__)) vmcb {
 
 #endif
 
+#endif
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 74fcb96..61e04e9 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -25,6 +25,88 @@
  *
  */
 
+#define VMX_EXIT_REASONS_FAILED_VMENTRY         0x80000000
+
+#define EXIT_REASON_EXCEPTION_NMI       0
+#define EXIT_REASON_EXTERNAL_INTERRUPT  1
+#define EXIT_REASON_TRIPLE_FAULT        2
+
+#define EXIT_REASON_PENDING_INTERRUPT   7
+#define EXIT_REASON_NMI_WINDOW          8
+#define EXIT_REASON_TASK_SWITCH         9
+#define EXIT_REASON_CPUID               10
+#define EXIT_REASON_HLT                 12
+#define EXIT_REASON_INVD                13
+#define EXIT_REASON_INVLPG              14
+#define EXIT_REASON_RDPMC               15
+#define EXIT_REASON_RDTSC               16
+#define EXIT_REASON_VMCALL              18
+#define EXIT_REASON_VMCLEAR             19
+#define EXIT_REASON_VMLAUNCH            20
+#define EXIT_REASON_VMPTRLD             21
+#define EXIT_REASON_VMPTRST             22
+#define EXIT_REASON_VMREAD              23
+#define EXIT_REASON_VMRESUME            24
+#define EXIT_REASON_VMWRITE             25
+#define EXIT_REASON_VMOFF               26
+#define EXIT_REASON_VMON                27
+#define EXIT_REASON_CR_ACCESS           28
+#define EXIT_REASON_DR_ACCESS           29
+#define EXIT_REASON_IO_INSTRUCTION      30
+#define EXIT_REASON_MSR_READ            31
+#define EXIT_REASON_MSR_WRITE           32
+#define EXIT_REASON_INVALID_STATE       33
+#define EXIT_REASON_MWAIT_INSTRUCTION   36
+#define EXIT_REASON_MONITOR_INSTRUCTION 39
+#define EXIT_REASON_PAUSE_INSTRUCTION   40
+#define EXIT_REASON_MCE_DURING_VMENTRY  41
+#define EXIT_REASON_TPR_BELOW_THRESHOLD 43
+#define EXIT_REASON_APIC_ACCESS         44
+#define EXIT_REASON_EPT_VIOLATION       48
+#define EXIT_REASON_EPT_MISCONFIG       49
+#define EXIT_REASON_WBINVD              54
+#define EXIT_REASON_XSETBV              55
+#define EXIT_REASON_INVPCID             58
+
+#define VMX_EXIT_REASONS \
+   { EXIT_REASON_EXCEPTION_NMI,         "EXCEPTION_NMI" }, \
+   { EXIT_REASON_EXTERNAL_INTERRUPT,    "EXTERNAL_INTERRUPT" }, \
+   { EXIT_REASON_TRIPLE_FAULT,          "TRIPLE_FAULT" }, \
+   { EXIT_REASON_PENDING_INTERRUPT,     "PENDING_INTERRUPT" }, \
+   { EXIT_REASON_NMI_WINDOW,            "NMI_WINDOW" }, \
+   { EXIT_REASON_TASK_SWITCH,           "TASK_SWITCH" }, \
+   { EXIT_REASON_CPUID,                 "CPUID" }, \
+   { EXIT_REASON_HLT,                   "HLT" }, \
+   { EXIT_REASON_INVLPG,                "INVLPG" }, \
+   { EXIT_REASON_RDPMC,                 "RDPMC" }, \
+   { EXIT_REASON_RDTSC,                 "RDTSC" }, \
+   { EXIT_REASON_VMCALL,                "VMCALL" }, \
+   { EXIT_REASON_VMCLEAR,               "VMCLEAR" }, \
+   { EXIT_REASON_VMLAUNCH,              "VMLAUNCH" }, \
+   { EXIT_REASON_VMPTRLD,               "VMPTRLD" }, \
+   { EXIT_REASON_VMPTRST,               "VMPTRST" }, \
+   { EXIT_REASON_VMREAD,                "VMREAD" }, \
+   { EXIT_REASON_VMRESUME,              "VMRESUME" }, \
+   { EXIT_REASON_VMWRITE,               "VMWRITE" }, \
+   { EXIT_REASON_VMOFF,                 "VMOFF" }, \
+   { EXIT_REASON_VMON,                  "VMON" }, \
+   { EXIT_REASON_CR_ACCESS,             "CR_ACCESS" }, \
+   { EXIT_REASON_DR_ACCESS,             "DR_ACCESS" }, \
+   { EXIT_REASON_IO_INSTRUCTION,        "IO_INSTRUCTION" }, \
+   { EXIT_REASON_MSR_READ,              "MSR_READ" }, \
+   { EXIT_REASON_MSR_WRITE,             "MSR_WRITE" }, \
+   { EXIT_REASON_MWAIT_INSTRUCTION,     "MWAIT_INSTRUCTION" }, \
+   { EXIT_REASON_MONITOR_INSTRUCTION,   "MONITOR_INSTRUCTION" }, \
+   { EXIT_REASON_PAUSE_INSTRUCTION,     "PAUSE_INSTRUCTION" }, \
+   { EXIT_REASON_MCE_DURING_VMENTRY,    "MCE_DURING_VMENTRY" }, \
+   { EXIT_REASON_TPR_BELOW_THRESHOLD,   "TPR_BELOW_THRESHOLD" }, \
+   { EXIT_REASON_APIC_ACCESS,           "APIC_ACCESS" }, \
+   { EXIT_REASON_EPT_VIOLATION,         "EPT_VIOLATION" }, \
+   { EXIT_REASON_EPT_MISCONFIG,         "EPT_MISCONFIG" }, \
+   { EXIT_REASON_WBINVD,                "WBINVD" }
+
+#ifdef __KERNEL__
+
 #include <linux/types.h>
 
 /*
@@ -241,48 +323,6 @@ enum vmcs_field {
 	HOST_RIP                        = 0x00006c16,
 };
 
-#define VMX_EXIT_REASONS_FAILED_VMENTRY         0x80000000
-
-#define EXIT_REASON_EXCEPTION_NMI       0
-#define EXIT_REASON_EXTERNAL_INTERRUPT  1
-#define EXIT_REASON_TRIPLE_FAULT        2
-
-#define EXIT_REASON_PENDING_INTERRUPT   7
-#define EXIT_REASON_NMI_WINDOW		8
-#define EXIT_REASON_TASK_SWITCH         9
-#define EXIT_REASON_CPUID               10
-#define EXIT_REASON_HLT                 12
-#define EXIT_REASON_INVD                13
-#define EXIT_REASON_INVLPG              14
-#define EXIT_REASON_RDPMC               15
-#define EXIT_REASON_RDTSC               16
-#define EXIT_REASON_VMCALL              18
-#define EXIT_REASON_VMCLEAR             19
-#define EXIT_REASON_VMLAUNCH            20
-#define EXIT_REASON_VMPTRLD             21
-#define EXIT_REASON_VMPTRST             22
-#define EXIT_REASON_VMREAD              23
-#define EXIT_REASON_VMRESUME            24
-#define EXIT_REASON_VMWRITE             25
-#define EXIT_REASON_VMOFF               26
-#define EXIT_REASON_VMON                27
-#define EXIT_REASON_CR_ACCESS           28
-#define EXIT_REASON_DR_ACCESS           29
-#define EXIT_REASON_IO_INSTRUCTION      30
-#define EXIT_REASON_MSR_READ            31
-#define EXIT_REASON_MSR_WRITE           32
-#define EXIT_REASON_INVALID_STATE	33
-#define EXIT_REASON_MWAIT_INSTRUCTION   36
-#define EXIT_REASON_MONITOR_INSTRUCTION 39
-#define EXIT_REASON_PAUSE_INSTRUCTION   40
-#define EXIT_REASON_MCE_DURING_VMENTRY	 41
-#define EXIT_REASON_TPR_BELOW_THRESHOLD 43
-#define EXIT_REASON_APIC_ACCESS         44
-#define EXIT_REASON_EPT_VIOLATION       48
-#define EXIT_REASON_EPT_MISCONFIG       49
-#define EXIT_REASON_WBINVD		54
-#define EXIT_REASON_XSETBV		55
-#define EXIT_REASON_INVPCID		58
 
 /*
  * Interruption-information format
@@ -488,3 +528,5 @@ enum vm_instruction_error_number {
 };
 
 #endif
+
+#endif
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index a71faf7..bca63f0 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -183,95 +183,6 @@ TRACE_EVENT(kvm_apic,
 #define KVM_ISA_VMX   1
 #define KVM_ISA_SVM   2
 
-#define VMX_EXIT_REASONS \
-	{ EXIT_REASON_EXCEPTION_NMI,		"EXCEPTION_NMI" }, \
-	{ EXIT_REASON_EXTERNAL_INTERRUPT,	"EXTERNAL_INTERRUPT" }, \
-	{ EXIT_REASON_TRIPLE_FAULT,		"TRIPLE_FAULT" }, \
-	{ EXIT_REASON_PENDING_INTERRUPT,	"PENDING_INTERRUPT" }, \
-	{ EXIT_REASON_NMI_WINDOW,		"NMI_WINDOW" }, \
-	{ EXIT_REASON_TASK_SWITCH,		"TASK_SWITCH" }, \
-	{ EXIT_REASON_CPUID,			"CPUID" }, \
-	{ EXIT_REASON_HLT,			"HLT" }, \
-	{ EXIT_REASON_INVLPG,			"INVLPG" }, \
-	{ EXIT_REASON_RDPMC,			"RDPMC" }, \
-	{ EXIT_REASON_RDTSC,			"RDTSC" }, \
-	{ EXIT_REASON_VMCALL,			"VMCALL" }, \
-	{ EXIT_REASON_VMCLEAR,			"VMCLEAR" }, \
-	{ EXIT_REASON_VMLAUNCH,			"VMLAUNCH" }, \
-	{ EXIT_REASON_VMPTRLD,			"VMPTRLD" }, \
-	{ EXIT_REASON_VMPTRST,			"VMPTRST" }, \
-	{ EXIT_REASON_VMREAD,			"VMREAD" }, \
-	{ EXIT_REASON_VMRESUME,			"VMRESUME" }, \
-	{ EXIT_REASON_VMWRITE,			"VMWRITE" }, \
-	{ EXIT_REASON_VMOFF,			"VMOFF" }, \
-	{ EXIT_REASON_VMON,			"VMON" }, \
-	{ EXIT_REASON_CR_ACCESS,		"CR_ACCESS" }, \
-	{ EXIT_REASON_DR_ACCESS,		"DR_ACCESS" }, \
-	{ EXIT_REASON_IO_INSTRUCTION,		"IO_INSTRUCTION" }, \
-	{ EXIT_REASON_MSR_READ,			"MSR_READ" }, \
-	{ EXIT_REASON_MSR_WRITE,		"MSR_WRITE" }, \
-	{ EXIT_REASON_MWAIT_INSTRUCTION,	"MWAIT_INSTRUCTION" }, \
-	{ EXIT_REASON_MONITOR_INSTRUCTION,	"MONITOR_INSTRUCTION" }, \
-	{ EXIT_REASON_PAUSE_INSTRUCTION,	"PAUSE_INSTRUCTION" }, \
-	{ EXIT_REASON_MCE_DURING_VMENTRY,	"MCE_DURING_VMENTRY" }, \
-	{ EXIT_REASON_TPR_BELOW_THRESHOLD,	"TPR_BELOW_THRESHOLD" },	\
-	{ EXIT_REASON_APIC_ACCESS,		"APIC_ACCESS" }, \
-	{ EXIT_REASON_EPT_VIOLATION,		"EPT_VIOLATION" }, \
-	{ EXIT_REASON_EPT_MISCONFIG,		"EPT_MISCONFIG" }, \
-	{ EXIT_REASON_WBINVD,			"WBINVD" }
-
-#define SVM_EXIT_REASONS \
-	{ SVM_EXIT_READ_CR0,			"read_cr0" }, \
-	{ SVM_EXIT_READ_CR3,			"read_cr3" }, \
-	{ SVM_EXIT_READ_CR4,			"read_cr4" }, \
-	{ SVM_EXIT_READ_CR8,			"read_cr8" }, \
-	{ SVM_EXIT_WRITE_CR0,			"write_cr0" }, \
-	{ SVM_EXIT_WRITE_CR3,			"write_cr3" }, \
-	{ SVM_EXIT_WRITE_CR4,			"write_cr4" }, \
-	{ SVM_EXIT_WRITE_CR8,			"write_cr8" }, \
-	{ SVM_EXIT_READ_DR0,			"read_dr0" }, \
-	{ SVM_EXIT_READ_DR1,			"read_dr1" }, \
-	{ SVM_EXIT_READ_DR2,			"read_dr2" }, \
-	{ SVM_EXIT_READ_DR3,			"read_dr3" }, \
-	{ SVM_EXIT_WRITE_DR0,			"write_dr0" }, \
-	{ SVM_EXIT_WRITE_DR1,			"write_dr1" }, \
-	{ SVM_EXIT_WRITE_DR2,			"write_dr2" }, \
-	{ SVM_EXIT_WRITE_DR3,			"write_dr3" }, \
-	{ SVM_EXIT_WRITE_DR5,			"write_dr5" }, \
-	{ SVM_EXIT_WRITE_DR7,			"write_dr7" }, \
-	{ SVM_EXIT_EXCP_BASE + DB_VECTOR,	"DB excp" }, \
-	{ SVM_EXIT_EXCP_BASE + BP_VECTOR,	"BP excp" }, \
-	{ SVM_EXIT_EXCP_BASE + UD_VECTOR,	"UD excp" }, \
-	{ SVM_EXIT_EXCP_BASE + PF_VECTOR,	"PF excp" }, \
-	{ SVM_EXIT_EXCP_BASE + NM_VECTOR,	"NM excp" }, \
-	{ SVM_EXIT_EXCP_BASE + MC_VECTOR,	"MC excp" }, \
-	{ SVM_EXIT_INTR,			"interrupt" }, \
-	{ SVM_EXIT_NMI,				"nmi" }, \
-	{ SVM_EXIT_SMI,				"smi" }, \
-	{ SVM_EXIT_INIT,			"init" }, \
-	{ SVM_EXIT_VINTR,			"vintr" }, \
-	{ SVM_EXIT_CPUID,			"cpuid" }, \
-	{ SVM_EXIT_INVD,			"invd" }, \
-	{ SVM_EXIT_HLT,				"hlt" }, \
-	{ SVM_EXIT_INVLPG,			"invlpg" }, \
-	{ SVM_EXIT_INVLPGA,			"invlpga" }, \
-	{ SVM_EXIT_IOIO,			"io" }, \
-	{ SVM_EXIT_MSR,				"msr" }, \
-	{ SVM_EXIT_TASK_SWITCH,			"task_switch" }, \
-	{ SVM_EXIT_SHUTDOWN,			"shutdown" }, \
-	{ SVM_EXIT_VMRUN,			"vmrun" }, \
-	{ SVM_EXIT_VMMCALL,			"hypercall" }, \
-	{ SVM_EXIT_VMLOAD,			"vmload" }, \
-	{ SVM_EXIT_VMSAVE,			"vmsave" }, \
-	{ SVM_EXIT_STGI,			"stgi" }, \
-	{ SVM_EXIT_CLGI,			"clgi" }, \
-	{ SVM_EXIT_SKINIT,			"skinit" }, \
-	{ SVM_EXIT_WBINVD,			"wbinvd" }, \
-	{ SVM_EXIT_MONITOR,			"monitor" }, \
-	{ SVM_EXIT_MWAIT,			"mwait" }, \
-	{ SVM_EXIT_XSETBV,			"xsetbv" }, \
-	{ SVM_EXIT_NPF,				"npf" }
-
 /*
  * Tracepoint for kvm guest exit:
  */
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v7 2/3] KVM: x86: trace mmio begin and complete
  2012-08-27  9:51 [PATCH v7 0/3] KVM: perf: kvm events analysis tool Dong Hao
  2012-08-27  9:51 ` [PATCH v7 1/3] KVM: x86: export svm/vmx exit code and vector code to userspace Dong Hao
@ 2012-08-27  9:51 ` Dong Hao
  2012-09-03 11:07   ` Avi Kivity
  2012-08-27  9:51 ` [PATCH v7 3/3] KVM: perf: kvm events analysis tool Dong Hao
  2012-08-27  9:59 ` [PATCH v7 0/3] " Xiao Guangrong
  3 siblings, 1 reply; 26+ messages in thread
From: Dong Hao @ 2012-08-27  9:51 UTC (permalink / raw)
  To: avi, acme, mtosatti, mingo; +Cc: xiaoguangrong, linux-kernel, kvm

From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>

'perf kvm stat record/report' will use kvm_exit and kvm_mmio(read...) to
calculate mmio read emulated time for the old kernel, in order to trace
mmio read event more exactly, we add kvm_mmio_begin to trace the time when
mmio read begins, also, add kvm_io_done to trace the time when mmio/pio is
completed

[ Dong Hao <haodong@linux.vnet.ibm.com>: rebase it on current kvm tree ]
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Dong Hao <haodong@linux.vnet.ibm.com>
---
 arch/x86/kvm/x86.c         |   32 ++++++++++++++++++++------------
 include/trace/events/kvm.h |   37 +++++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 42bce48..b90394d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3828,9 +3828,12 @@ mmio:
 	/*
 	 * Is this MMIO handled locally?
 	 */
+	trace_kvm_mmio_begin(vcpu->vcpu_id, write, gpa);
 	handled = ops->read_write_mmio(vcpu, gpa, bytes, val);
-	if (handled == bytes)
+	if (handled == bytes) {
+		trace_kvm_io_done(vcpu->vcpu_id);
 		return X86EMUL_CONTINUE;
+	}
 
 	gpa += handled;
 	bytes -= handled;
@@ -4025,6 +4028,7 @@ static int emulator_pio_in_out(struct kvm_vcpu *vcpu, int size,
 	vcpu->arch.pio.size = size;
 
 	if (!kernel_pio(vcpu, vcpu->arch.pio_data)) {
+		trace_kvm_io_done(vcpu->vcpu_id);
 		vcpu->arch.pio.count = 0;
 		return 1;
 	}
@@ -4625,9 +4629,7 @@ restart:
 		inject_emulated_exception(vcpu);
 		r = EMULATE_DONE;
 	} else if (vcpu->arch.pio.count) {
-		if (!vcpu->arch.pio.in)
-			vcpu->arch.pio.count = 0;
-		else
+		if (vcpu->arch.pio.in)
 			writeback = false;
 		r = EMULATE_DO_MMIO;
 	} else if (vcpu->mmio_needed) {
@@ -4658,8 +4660,6 @@ int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port)
 	unsigned long val = kvm_register_read(vcpu, VCPU_REGS_RAX);
 	int ret = emulator_pio_out_emulated(&vcpu->arch.emulate_ctxt,
 					    size, port, &val, 1);
-	/* do not return to emulator after return from userspace */
-	vcpu->arch.pio.count = 0;
 	return ret;
 }
 EXPORT_SYMBOL_GPL(kvm_fast_pio_out);
@@ -5509,11 +5509,16 @@ static int complete_mmio(struct kvm_vcpu *vcpu)
 {
 	struct kvm_run *run = vcpu->run;
 	struct kvm_mmio_fragment *frag;
-	int r;
+	int r = 1;
 
 	if (!(vcpu->arch.pio.count || vcpu->mmio_needed))
 		return 1;
 
+	if (vcpu->arch.pio.count && !vcpu->arch.pio.in) {
+		vcpu->arch.pio.count = 0;
+		goto exit;
+	}
+
 	if (vcpu->mmio_needed) {
 		/* Complete previous fragment */
 		frag = &vcpu->mmio_fragments[vcpu->mmio_cur_fragment++];
@@ -5521,8 +5526,10 @@ static int complete_mmio(struct kvm_vcpu *vcpu)
 			memcpy(frag->data, run->mmio.data, frag->len);
 		if (vcpu->mmio_cur_fragment == vcpu->mmio_nr_fragments) {
 			vcpu->mmio_needed = 0;
+
 			if (vcpu->mmio_is_write)
-				return 1;
+				goto exit;
+
 			vcpu->mmio_read_completed = 1;
 			goto done;
 		}
@@ -5539,11 +5546,12 @@ static int complete_mmio(struct kvm_vcpu *vcpu)
 	}
 done:
 	vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
-	r = emulate_instruction(vcpu, EMULTYPE_NO_DECODE);
+	r = emulate_instruction(vcpu, EMULTYPE_NO_DECODE) == EMULATE_DONE;
 	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
-	if (r != EMULATE_DONE)
-		return 0;
-	return 1;
+
+exit:
+	trace_kvm_io_done(vcpu->vcpu_id);
+	return r;
 }
 
 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index 7ef9e75..d4182fa 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -177,6 +177,43 @@ TRACE_EVENT(kvm_mmio,
 		  __entry->len, __entry->gpa, __entry->val)
 );
 
+TRACE_EVENT(kvm_mmio_begin,
+	TP_PROTO(unsigned int vcpu_id, bool rw, u64 gpa),
+	TP_ARGS(vcpu_id, rw, gpa),
+
+	TP_STRUCT__entry(
+		__field(unsigned int, vcpu_id)
+		__field(int, type)
+		__field(u64, gpa)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_id = vcpu_id;
+		__entry->type = rw ? KVM_TRACE_MMIO_WRITE :
+				      KVM_TRACE_MMIO_READ;
+		__entry->gpa = gpa;
+	),
+
+	TP_printk("vcpu %u mmio %s gpa 0x%llx", __entry->vcpu_id,
+		  __print_symbolic(__entry->type, kvm_trace_symbol_mmio),
+		  __entry->gpa)
+);
+
+TRACE_EVENT(kvm_io_done,
+	TP_PROTO(unsigned int vcpu_id),
+	TP_ARGS(vcpu_id),
+
+	TP_STRUCT__entry(
+		__field(unsigned int, vcpu_id)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_id = vcpu_id;
+	),
+
+	TP_printk("vcpu %u", __entry->vcpu_id)
+);
+
 #define kvm_fpu_load_symbol	\
 	{0, "unload"},		\
 	{1, "load"}
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v7 3/3] KVM: perf: kvm events analysis tool
  2012-08-27  9:51 [PATCH v7 0/3] KVM: perf: kvm events analysis tool Dong Hao
  2012-08-27  9:51 ` [PATCH v7 1/3] KVM: x86: export svm/vmx exit code and vector code to userspace Dong Hao
  2012-08-27  9:51 ` [PATCH v7 2/3] KVM: x86: trace mmio begin and complete Dong Hao
@ 2012-08-27  9:51 ` Dong Hao
  2012-08-27 15:53   ` Andrew Jones
  2012-08-30 18:29   ` David Ahern
  2012-08-27  9:59 ` [PATCH v7 0/3] " Xiao Guangrong
  3 siblings, 2 replies; 26+ messages in thread
From: Dong Hao @ 2012-08-27  9:51 UTC (permalink / raw)
  To: avi, acme, mtosatti, mingo; +Cc: xiaoguangrong, linux-kernel, kvm

From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>

Add 'perf kvm stat' support to analyze kvm vmexit/mmio/ioport smartly

Usage:
- kvm stat
  run a command and gather performance counter statistics, it is the alias of
  perf stat

- trace kvm events:
  perf kvm stat record, or, if other tracepoints are interesting as well, we
  can append the events like this:
  perf kvm stat record -e timer:*
  If many guests are running, we can track the specified guest by using -p or
  --pid

- show the result:
  perf kvm stat report

The output example is following:
# pgrep qemu-kvm
26071
32253
32564

total 3 guests are running on the host

Then, track the guest whose pid is 26071:
# ./perf kvm stat record -p 26071
^C[ perf record: Woken up 9 times to write data ]
[ perf record: Captured and wrote 24.903 MB perf.data.guest (~1088034 samples) ]

See the vmexit events:
# ./perf kvm stat report --event=vmexit

Analyze events for all VCPUs:

             VM-EXIT    Samples  Samples%     Time%         Avg time

         APIC_ACCESS      65381    66.58%     5.95%     37.72us ( +-   6.54% )
  EXTERNAL_INTERRUPT      16031    16.32%     3.06%     79.11us ( +-   7.34% )
               CPUID       5360     5.46%     0.06%      4.50us ( +-  35.07% )
                 HLT       4496     4.58%    90.75%   8360.34us ( +-   5.22% )
       EPT_VIOLATION       2667     2.72%     0.04%      5.49us ( +-   5.05% )
   PENDING_INTERRUPT       2242     2.28%     0.03%      5.25us ( +-   2.96% )
       EXCEPTION_NMI       1332     1.36%     0.02%      6.53us ( +-   6.51% )
      IO_INSTRUCTION        383     0.39%     0.09%     93.39us ( +-  40.92% )
           CR_ACCESS        310     0.32%     0.00%      6.10us ( +-   3.95% )

Total Samples:98202, Total events handled time:41419293.63us.

See the mmio events:
# ./perf kvm stat report --event=mmio

Analyze events for all VCPUs:

         MMIO Access    Samples  Samples%     Time%         Avg time

        0xfee00380:W      58686    90.21%    15.67%      4.95us ( +-   2.96% )
        0xfee00300:R       2124     3.26%     1.48%     12.93us ( +-  14.75% )
        0xfee00310:W       2124     3.26%     0.34%      3.00us ( +-   1.33% )
        0xfee00300:W       2123     3.26%    82.50%    720.68us ( +-  10.24% )

Total Samples:65057, Total events handled time:1854470.45us.

See the ioport event:
# ./perf kvm stat report --event=ioport

Analyze events for all VCPUs:

      IO Port Access    Samples  Samples%     Time%         Avg time

         0xc090:POUT        383   100.00%   100.00%     89.00us ( +-  42.94% )

Total Samples:383, Total events handled time:34085.56us.

And, --vcpu is used to track the specified vcpu and --key is used to sort the
result:
# ./perf kvm stat report --event=vmexit --vcpu=0 --key=time

Analyze events for VCPU 0:

             VM-EXIT    Samples  Samples%     Time%         Avg time

                 HLT        551     5.05%    94.81%   9501.72us ( +-  12.52% )
  EXTERNAL_INTERRUPT       1390    12.74%     2.39%     94.80us ( +-  20.92% )
         APIC_ACCESS       6186    56.68%     2.62%     23.41us ( +-  23.62% )
      IO_INSTRUCTION         17     0.16%     0.01%     20.39us ( +-  22.33% )
       EXCEPTION_NMI         94     0.86%     0.01%      6.07us ( +-   7.13% )
   PENDING_INTERRUPT        199     1.82%     0.02%      5.48us ( +-   4.36% )
           CR_ACCESS         52     0.48%     0.00%      4.89us ( +-   4.09% )
       EPT_VIOLATION       2057    18.85%     0.12%      3.15us ( +-   1.33% )
               CPUID        368     3.37%     0.02%      2.82us ( +-   2.79% )

Total Samples:10914, Total events handled time:5521782.02us.

[ Dong Hao <haodong@linux.vnet.ibm.com>:
     - rebase it on current acme's tree
     - fix the compiling-error on i386
]

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Dong Hao <haodong@linux.vnet.ibm.com>
---
 tools/perf/Documentation/perf-kvm.txt |   30 +-
 tools/perf/MANIFEST                   |    3 +
 tools/perf/builtin-kvm.c              |  889 ++++++++++++++++++++++++++++++++-
 tools/perf/util/header.c              |   54 ++-
 tools/perf/util/header.h              |    1 +
 tools/perf/util/thread.h              |    2 +
 6 files changed, 973 insertions(+), 6 deletions(-)

diff --git a/tools/perf/Documentation/perf-kvm.txt b/tools/perf/Documentation/perf-kvm.txt
index dd84cb2..326f2cb 100644
--- a/tools/perf/Documentation/perf-kvm.txt
+++ b/tools/perf/Documentation/perf-kvm.txt
@@ -12,7 +12,7 @@ SYNOPSIS
 	[--guestkallsyms=<path> --guestmodules=<path> | --guestvmlinux=<path>]]
 	{top|record|report|diff|buildid-list}
 'perf kvm' [--host] [--guest] [--guestkallsyms=<path> --guestmodules=<path>
-	| --guestvmlinux=<path>] {top|record|report|diff|buildid-list}
+	| --guestvmlinux=<path>] {top|record|report|diff|buildid-list|stat}
 
 DESCRIPTION
 -----------
@@ -38,6 +38,18 @@ There are a couple of variants of perf kvm:
   so that other tools can be used to fetch packages with matching symbol tables
   for use by perf report.
 
+  'perf kvm stat <command>' to run a command and gather performance counter
+  statistics.
+  Especially, perf 'kvm stat record/report' generates a statistical analysis
+  of KVM events. Currently, vmexit, mmio and ioport events are supported.
+  'perf kvm stat record <command>' records kvm events and the events between
+  start and end <command>.
+  And this command produces a file which contains tracing results of kvm
+  events.
+
+  'perf kvm stat report' reports statistical data which includes events
+  handled time, samples, and so on.
+
 OPTIONS
 -------
 -i::
@@ -68,7 +80,21 @@ OPTIONS
 --guestvmlinux=<path>::
 	Guest os kernel vmlinux.
 
+STAT REPORT OPTIONS
+-------------------
+--vcpu=<value>::
+       analyze events which occures on this vcpu. (default: all vcpus)
+
+--events=<value>::
+       events to be analyzed. Possible values: vmexit, mmio, ioport.
+       (default: vmexit)
+-k::
+--key=<value>::
+       Sorting key. Possible values: sample (default, sort by samples
+       number), time (sort by average time).
+
 SEE ALSO
 --------
 linkperf:perf-top[1], linkperf:perf-record[1], linkperf:perf-report[1],
-linkperf:perf-diff[1], linkperf:perf-buildid-list[1]
+linkperf:perf-diff[1], linkperf:perf-buildid-list[1],
+linkperf:perf-stat[1]
diff --git a/tools/perf/MANIFEST b/tools/perf/MANIFEST
index b4b572e..6896648 100644
--- a/tools/perf/MANIFEST
+++ b/tools/perf/MANIFEST
@@ -15,3 +15,6 @@ arch/*/lib/memset*.S
 include/linux/poison.h
 include/linux/magic.h
 include/linux/hw_breakpoint.h
+arch/x86/include/asm/svm.h
+arch/x86/include/asm/vmx.h
+arch/x86/include/asm/kvm_host.h
diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index 9fc6e0f..c852ade 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -1,6 +1,7 @@
 #include "builtin.h"
 #include "perf.h"
 
+#include "util/evsel.h"
 #include "util/util.h"
 #include "util/cache.h"
 #include "util/symbol.h"
@@ -10,8 +11,9 @@
 
 #include "util/parse-options.h"
 #include "util/trace-event.h"
-
 #include "util/debug.h"
+#include "util/debugfs.h"
+#include "util/tool.h"
 
 #include <sys/prctl.h>
 
@@ -19,11 +21,890 @@
 #include <pthread.h>
 #include <math.h>
 
-static const char		*file_name;
+#include "../../arch/x86/include/asm/svm.h"
+#include "../../arch/x86/include/asm/vmx.h"
+#include "../../arch/x86/include/asm/kvm_host.h"
+
+struct event_key {
+	#define INVALID_KEY     (~0ULL)
+	u64 key;
+	int info;
+};
+
+struct kvm_events_ops {
+	bool (*is_begin_event)(struct event_format *event, void *data,
+				struct event_key *key);
+	bool (*is_end_event)(struct event_format *event, void *data,
+				struct event_key *key);
+	void (*decode_key)(struct event_key *key, char decode[20]);
+	const char *name;
+};
+
+static void exit_event_get_key(struct event_format *event, void *data,
+				struct event_key *key)
+{
+	key->info = 0;
+	key->key = raw_field_value(event, "exit_reason", data);
+}
+
+static bool kvm_exit_event(struct event_format *event)
+{
+	return !strcmp(event->name, "kvm_exit");
+}
+
+static bool exit_event_begin(struct event_format *event, void *data,
+                            struct event_key *key)
+{
+	if (kvm_exit_event(event)) {
+		exit_event_get_key(event, data, key);
+		return true;
+	}
+
+	return false;
+}
+
+static bool kvm_entry_event(struct event_format *event)
+{
+	return !strcmp(event->name, "kvm_entry");
+}
+
+static bool exit_event_end(struct event_format *event, void *data __unused,
+				struct event_key *key __unused)
+{
+	return kvm_entry_event(event);
+}
+
+struct exit_reasons_table {
+	unsigned long exit_code;
+	const char *reason;
+};
+
+struct exit_reasons_table vmx_exit_reasons[] = {
+	VMX_EXIT_REASONS
+};
+
+struct exit_reasons_table svm_exit_reasons[] = {
+	SVM_EXIT_REASONS
+};
+
+static int cpu_isa;
+
+static const char *get_exit_reason(u64 exit_code)
+{
+	int table_size = ARRAY_SIZE(svm_exit_reasons);
+	struct exit_reasons_table *table = svm_exit_reasons;
+
+	if (cpu_isa == 1) {
+		table = vmx_exit_reasons;
+		table_size = ARRAY_SIZE(vmx_exit_reasons);
+	}
+
+	while (table_size--) {
+		if (table->exit_code == exit_code)
+			return table->reason;
+		table++;
+	}
+
+	pr_err("unknown kvm exit code:%lld on %s\n",
+		(unsigned long long)exit_code, cpu_isa ? "VMX" : "SVM");
+	return "UNKNOWN";
+}
+
+static void exit_event_decode_key(struct event_key *key, char decode[20])
+{
+       const char *exit_reason = get_exit_reason(key->key);
+
+       snprintf(decode, 20, "%s", exit_reason);
+}
+
+static struct kvm_events_ops exit_events = {
+       .is_begin_event = exit_event_begin,
+       .is_end_event = exit_event_end,
+       .decode_key = exit_event_decode_key,
+       .name = "VM-EXIT"
+};
+
+    /*
+     * For the old kernel, we treat:
+     * the time of MMIO write: kvm_mmio(KVM_TRACE_MMIO_WRITE...) -> kvm_entry
+     * the time of MMIO read: kvm_exit -> kvm_mmio(KVM_TRACE_MMIO_READ...).
+     *
+     * For the new kernel, we use kvm_mmio_begin and kvm_io_done to make
+     * things better.
+     */
+static void mmio_event_get_key(struct event_format *event, void *data,
+                              struct event_key *key)
+{
+       key->key = raw_field_value(event, "gpa", data);
+       key->info = raw_field_value(event, "type", data);
+}
+
+#define KVM_TRACE_MMIO_READ_UNSATISFIED 0
+#define KVM_TRACE_MMIO_READ 1
+#define KVM_TRACE_MMIO_WRITE 2
+
+static bool kvm_io_done_event(struct event_format *event)
+{
+       return !strcmp(event->name, "kvm_io_done");
+}
+
+static bool mmio_event_begin(struct event_format *event, void *data,
+                            struct event_key *key)
+{
+	/* MMIO read begin in old kernel. */
+	if (kvm_exit_event(event))
+		return true;
+
+	/* MMIO write begin in old kernel. */
+	if (!strcmp(event->name, "kvm_mmio") &&
+		raw_field_value(event, "type", data) == KVM_TRACE_MMIO_WRITE) {
+		mmio_event_get_key(event, data, key);
+		return true;
+	}
+
+	/* MMIO read/write begin in new kernel. */
+	if (!strcmp(event->name, "kvm_mmio_begin")) {
+		mmio_event_get_key(event, data, key);
+		return true;
+	}
+
+	return false;
+}
+
+static bool mmio_event_end(struct event_format *event,  void *data,
+			struct event_key *key)
+{
+	/* MMIO write end in old kernel. */
+	if (kvm_entry_event(event))
+		return true;
+
+	/* MMIO read end in the old kernel.*/
+	if (!strcmp(event->name, "kvm_mmio") &&
+		raw_field_value(event, "type", data) == KVM_TRACE_MMIO_READ) {
+		mmio_event_get_key(event, data, key);
+		return true;
+	}
+
+	/* MMIO read/write end event in the new kernel.*/
+	return kvm_io_done_event(event);
+}
+
+static void mmio_event_decode_key(struct event_key *key, char decode[20])
+{
+	snprintf(decode, 20, "%#lx:%s", (unsigned long)key->key,
+				key->info == KVM_TRACE_MMIO_WRITE ? "W" : "R");
+}
+
+static struct kvm_events_ops mmio_events = {
+	.is_begin_event = mmio_event_begin,
+	.is_end_event = mmio_event_end,
+	.decode_key = mmio_event_decode_key,
+	.name = "MMIO Access"
+};
+
+/*
+ * For the old kernel, the time of emulation pio access is from kvm_pio to
+ * kvm_entry. In the new kernel, the end time is indicated by kvm_io_done.
+ */
+static void ioport_event_get_key(struct event_format *event, void *data,
+			struct event_key *key)
+{
+	key->key = raw_field_value(event, "port", data);
+	key->info = raw_field_value(event, "rw", data);
+}
+
+static bool ioport_event_begin(struct event_format *event, void *data,
+			struct event_key *key)
+{
+	if (!strcmp(event->name, "kvm_pio")) {
+		ioport_event_get_key(event, data, key);
+		return true;
+	}
+
+	return false;
+}
+
+static bool ioport_event_end(struct event_format *event, void *data __unused,
+			     struct event_key *key __unused)
+{
+	if (kvm_entry_event(event))
+		return true;
+
+	return kvm_io_done_event(event);
+}
+
+static void ioport_event_decode_key(struct event_key *key, char decode[20])
+{
+	snprintf(decode, 20, "%#llx:%s", (unsigned long long)key->key,
+				key->info ? "POUT" : "PIN");
+}
+
+static struct kvm_events_ops ioport_events = {
+	.is_begin_event = ioport_event_begin,
+	.is_end_event = ioport_event_end,
+	.decode_key = ioport_event_decode_key,
+	.name = "IO Port Access"
+};
+
+static const char *report_event = "vmexit";
+struct kvm_events_ops *events_ops;
+
+static bool register_kvm_events_ops(void)
+{
+	bool ret = true;
+
+	if (!strcmp(report_event, "vmexit"))
+		events_ops = &exit_events;
+	else if (!strcmp(report_event, "mmio"))
+		events_ops = &mmio_events;
+	else if (!strcmp(report_event, "ioport"))
+		events_ops = &ioport_events;
+	else {
+		pr_err("Unknown report event:%s\n", report_event);
+		ret = false;
+	}
+
+	return ret;
+}
+
+struct event_stats {
+	u64 count;
+	u64 time;
+
+	/* used to calculate stddev. */
+	double mean;
+	double M2;
+};
+
+struct kvm_event {
+	struct list_head hash_entry;
+	struct rb_node rb;
+
+	struct event_key key;
+
+	struct event_stats total;
+
+	#define DEFAULT_VCPU_NUM 8
+	int max_vcpu;
+	struct event_stats *vcpu;
+};
+
+struct vcpu_event_record {
+	int vcpu_id;
+	u64 start_time;
+	struct kvm_event *last_event;
+};
+
+#define EVENTS_BITS			12
+#define EVENTS_CACHE_SIZE	(1UL << EVENTS_BITS)
+
+static u64 total_time;
+static u64 total_count;
+static struct list_head kvm_events_cache[EVENTS_CACHE_SIZE];
+
+static void init_kvm_event_record(void)
+{
+	int i;
+
+	for (i = 0; i < (int)EVENTS_CACHE_SIZE; i++)
+		INIT_LIST_HEAD(&kvm_events_cache[i]);
+}
+
+static int kvm_events_hash_fn(u64 key)
+{
+	return key & (EVENTS_CACHE_SIZE - 1);
+}
+
+static bool kvm_event_expand(struct kvm_event *event, int vcpu_id)
+{
+	int old_max_vcpu = event->max_vcpu;
+
+	if (vcpu_id < event->max_vcpu)
+		return true;
+
+	while (event->max_vcpu <= vcpu_id)
+		event->max_vcpu += DEFAULT_VCPU_NUM;
+
+	event->vcpu = realloc(event->vcpu,
+			      event->max_vcpu * sizeof(*event->vcpu));
+	if (!event->vcpu) {
+		pr_err("Not enough memory\n");
+		return false;
+	}
+
+	memset(event->vcpu + old_max_vcpu, 0,
+	       (event->max_vcpu - old_max_vcpu) * sizeof(*event->vcpu));
+	return true;
+}
+
+static struct kvm_event *kvm_alloc_init_event(struct event_key *key)
+{
+	struct kvm_event *event;
+
+	event = zalloc(sizeof(*event));
+	if (!event) {
+		pr_err("Not enough memory\n");
+		return NULL;
+	}
+
+	event->key = *key;
+	return event;
+}
+
+static struct kvm_event *find_create_kvm_event(struct event_key *key)
+{
+	struct kvm_event *event;
+	struct list_head *head;
+
+	BUG_ON(key->key == INVALID_KEY);
+
+	head = &kvm_events_cache[kvm_events_hash_fn(key->key)];
+	list_for_each_entry(event, head, hash_entry)
+		if (event->key.key == key->key && event->key.info == key->info)
+			return event;
+
+	event = kvm_alloc_init_event(key);
+	if (!event)
+		return NULL;
+
+	list_add(&event->hash_entry, head);
+	return event;
+}
+
+static void handle_begin_event(struct vcpu_event_record *vcpu_record,
+			       struct event_key *key, u64 timestamp)
+{
+	struct kvm_event *event = NULL;
+
+	if (key->key != INVALID_KEY)
+		event = find_create_kvm_event(key);
+
+	vcpu_record->last_event = event;
+	vcpu_record->start_time = timestamp;
+}
+
+static void update_event_stats(struct event_stats *stats, u64 time_diff)
+{
+	double delta;
+
+	stats->count++;
+	stats->time += time_diff;
+
+	delta = time_diff - stats->mean;
+	stats->mean += delta / stats->count;
+	stats->M2 += delta*(time_diff - stats->mean);
+}
+
+static double event_stats_stddev(int vcpu_id, struct kvm_event *event)
+{
+	struct event_stats *stats = &event->total;
+	double variance, variance_mean, stddev;
+
+	if (vcpu_id != -1)
+		stats = &event->vcpu[vcpu_id];
+
+	BUG_ON(!stats->count);
+
+	variance = stats->M2 / (stats->count - 1);
+	variance_mean = variance / stats->count;
+	stddev = sqrt(variance_mean);
+
+	return stddev * 100 / stats->mean;
+}
+
+static void update_kvm_event(struct kvm_event *event, int vcpu_id,
+			     u64 time_diff)
+{
+	update_event_stats(&event->total, time_diff);
+
+	if (!kvm_event_expand(event, vcpu_id))
+		return;
+
+	update_event_stats(&event->vcpu[vcpu_id], time_diff);
+}
+
+static void handle_end_event(struct vcpu_event_record *vcpu_record,
+			     struct event_key *key, u64 timestamp)
+{
+	struct kvm_event *event;
+	u64 time_begin, time_diff;
+
+	event = vcpu_record->last_event;
+	time_begin = vcpu_record->start_time;
+
+	/* The begin event is not caught. */
+	if (!time_begin)
+		return;
+
+	/*
+	 * In some case, the 'begin event' only records the start timestamp,
+	 * the actual event is recognized in the 'end event' (e.g. mmio-event
+	 * in the old kernel).
+	 */
+
+	/* Both begin and end events did not get the key. */
+	if (!event && key->key == INVALID_KEY)
+		return;
+
+	if (!event)
+		event = find_create_kvm_event(key);
+
+	if (!event)
+		return;
+
+	vcpu_record->last_event = NULL;
+	vcpu_record->start_time = 0;
+
+	BUG_ON(timestamp < time_begin);
+
+	time_diff = timestamp - time_begin;
+	update_kvm_event(event, vcpu_record->vcpu_id, time_diff);
+}
+
+static struct vcpu_event_record
+*per_vcpu_record(struct thread *thread, struct event_format *event, void *data)
+{
+	/* Only kvm_entry records vcpu id. */
+	if (!thread->priv && kvm_entry_event(event)) {
+		struct vcpu_event_record *vcpu_record;
+
+		vcpu_record = zalloc(sizeof(struct vcpu_event_record));
+		if (!vcpu_record) {
+			pr_err("Not enough memory\n");
+			return NULL;
+		}
+
+		vcpu_record->vcpu_id = raw_field_value(event, "vcpu_id", data);
+		thread->priv = vcpu_record;
+	}
+
+	return (struct vcpu_event_record *)thread->priv;
+}
+
+static void handle_kvm_event(struct thread *thread, struct event_format *event,
+			     void *data, u64 timestamp)
+{
+	struct vcpu_event_record *vcpu_record;
+	struct event_key key = {.key = INVALID_KEY};
+
+	vcpu_record = per_vcpu_record(thread, event, data);
+	if (!vcpu_record)
+		return;
+
+	if (events_ops->is_begin_event(event, data, &key))
+		return handle_begin_event(vcpu_record, &key, timestamp);
+
+	if (events_ops->is_end_event(event, data, &key))
+		return handle_end_event(vcpu_record, &key, timestamp);
+}
+
+typedef int (*key_cmp_fun)(struct kvm_event*, struct kvm_event*, int);
+struct kvm_event_key {
+	const char *name;
+	key_cmp_fun key;
+};
+
+static int trace_vcpu = -1;
+#define GET_EVENT_KEY(member)						\
+static u64 get_event_ ##member(struct kvm_event *event, int vcpu)	\
+{									\
+	if (vcpu == -1)							\
+		return event->total.member;				\
+									\
+	if (vcpu >= event->max_vcpu)					\
+		return 0;						\
+									\
+	return event->vcpu[vcpu].member;				\
+}
+
+#define COMPARE_EVENT_KEY(member)					\
+GET_EVENT_KEY(member)							\
+static int compare_kvm_event_ ## member(struct kvm_event *one,		\
+					struct kvm_event *two, int vcpu)\
+{									\
+	return get_event_ ##member(one, vcpu) >				\
+				get_event_ ##member(two, vcpu);		\
+}
+
+GET_EVENT_KEY(time);
+COMPARE_EVENT_KEY(count);
+COMPARE_EVENT_KEY(mean);
+
+#define DEF_SORT_NAME_KEY(name, compare_key)	\
+	{ #name, compare_kvm_event_ ## compare_key }
+
+static struct kvm_event_key keys[] = {
+	DEF_SORT_NAME_KEY(sample, count),
+	DEF_SORT_NAME_KEY(time, mean),
+	{ NULL, NULL }
+};
+
+static const char *sort_key = "sample";
+static key_cmp_fun compare;
+
+static bool select_key(void)
+{
+	int i;
+
+	for (i = 0; keys[i].name; i++) {
+		if (!strcmp(keys[i].name, sort_key)) {
+			compare = keys[i].key;
+			return true;
+		}
+	}
+
+	pr_err("Unknown compare key:%s\n", sort_key);
+	return false;
+}
+
+static struct rb_root result;
+static void insert_to_result(struct kvm_event *event, key_cmp_fun bigger,
+			     int vcpu)
+{
+	struct rb_node **rb = &result.rb_node;
+	struct rb_node *parent = NULL;
+	struct kvm_event *p;
+
+	while (*rb) {
+		p = container_of(*rb, struct kvm_event, rb);
+		parent = *rb;
+
+		if (bigger(event, p, vcpu))
+			rb = &(*rb)->rb_left;
+		else
+			rb = &(*rb)->rb_right;
+	}
+
+	rb_link_node(&event->rb, parent, rb);
+	rb_insert_color(&event->rb, &result);
+}
+
+static void update_total_count(struct kvm_event *event, int vcpu)
+{
+	total_count += get_event_count(event, vcpu);
+	total_time += get_event_time(event, vcpu);
+}
+
+static bool event_is_valid(struct kvm_event *event, int vcpu)
+{
+	return !!get_event_count(event, vcpu);
+}
+
+static void sort_result(int vcpu)
+{
+	unsigned int i;
+	struct kvm_event *event;
+
+	for (i = 0; i < EVENTS_CACHE_SIZE; i++)
+		list_for_each_entry(event, &kvm_events_cache[i], hash_entry)
+			if (event_is_valid(event, vcpu)) {
+				update_total_count(event, vcpu);
+				insert_to_result(event, compare, vcpu);
+			}
+}
+
+/* returns left most element of result, and erase it */
+static struct kvm_event *pop_from_result(void)
+{
+	struct rb_node *node = result.rb_node;
+
+	if (!node)
+		return NULL;
+
+	while (node->rb_left)
+		node = node->rb_left;
+
+	rb_erase(node, &result);
+	return container_of(node, struct kvm_event, rb);
+}
+
+static void print_vcpu_info(int vcpu)
+{
+	pr_info("Analyze events for ");
+
+	if (vcpu == -1)
+		pr_info("all VCPUs:\n\n");
+	else
+		pr_info("VCPU %d:\n\n", vcpu);
+}
+
+static void print_result(int vcpu)
+{
+	char decode[20];
+	struct kvm_event *event;
+
+	pr_info("\n\n");
+	print_vcpu_info(vcpu);
+	pr_info("%20s ", events_ops->name);
+	pr_info("%10s ", "Samples");
+	pr_info("%9s ", "Samples%");
+
+	pr_info("%9s ", "Time%");
+	pr_info("%16s ", "Avg time");
+	pr_info("\n\n");
+
+	while ((event = pop_from_result())) {
+		u64 ecount, etime;
+
+		ecount = get_event_count(event, vcpu);
+		etime = get_event_time(event, vcpu);
+
+		events_ops->decode_key(&event->key, decode);
+		pr_info("%20s ", decode);
+		pr_info("%10llu ", (unsigned long long)ecount);
+		pr_info("%8.2f%% ", (double)ecount / total_count * 100);
+		pr_info("%8.2f%% ", (double)etime / total_time * 100);
+		pr_info("%9.2fus ( +-%7.2f%% )", (double)etime / ecount/1e3,
+			event_stats_stddev(trace_vcpu, event));
+		pr_info("\n");
+	}
+
+	pr_info("\nTotal Samples:%lld, Total events handled time:%.2fus.\n\n",
+		(unsigned long long)total_count, total_time / 1e3);
+}
+
+static int process_sample_event(struct perf_tool *tool __used,
+				union perf_event *event,
+				struct perf_sample *sample,
+				struct perf_evsel *evsel,
+				struct machine *machine)
+{
+	struct thread *thread = machine__findnew_thread(machine, sample->tid);
+
+	if (thread == NULL) {
+		pr_debug("problem processing %d event, skipping it.\n",
+			event->header.type);
+		return -1;
+	}
+
+	handle_kvm_event(thread, evsel->tp_format, sample->raw_data,
+			 sample->time);
+	return 0;
+}
+
+static struct perf_tool eops = {
+	.sample			= process_sample_event,
+	.comm			= perf_event__process_comm,
+	.ordered_samples	= true,
+};
+
+static int get_cpu_isa(struct perf_session *session)
+{
+	char *cpuid;
+	int isa;
+
+	cpuid = perf_header__read_feature(session, HEADER_CPUID);
+
+	if (!cpuid) {
+		pr_err("read HEADER_CPUID failed.\n");
+		return -ENOTSUP;
+	}
+
+	if (strstr(cpuid, "Intel"))
+		isa = 1;
+	else if (strstr(cpuid, "AMD"))
+		isa = 0;
+	else {
+		pr_err("CPU %s is not supported.\n", cpuid);
+		isa = -ENOTSUP;
+	}
+
+	free(cpuid);
+	return isa;
+}
+
+static const char *file_name;
+
+static int read_events(void)
+{
+	struct perf_session *kvm_session;
+	int ret;
+
+	kvm_session = perf_session__new(file_name, O_RDONLY, 0, false, &eops);
+	if (!kvm_session) {
+		pr_err("Initializing perf session failed\n");
+		return -EINVAL;
+	}
+
+	if (!perf_session__has_traces(kvm_session, "kvm record"))
+		return -EINVAL;
+
+	/*
+	 * Do not use 'isa' recorded in kvm_exit tracepoint since it is not
+	 * traced in the old kernel.
+	 */
+	ret = get_cpu_isa(kvm_session);
+
+	if (ret < 0)
+		return ret;
+
+	cpu_isa = ret;
+
+	return perf_session__process_events(kvm_session, &eops);
+}
+
+static bool verify_vcpu(int vcpu)
+{
+	if (vcpu != -1 && vcpu < 0) {
+		pr_err("Invalid vcpu:%d.\n", vcpu);
+		return false;
+	}
+
+	return true;
+}
+
+static int kvm_events_report_vcpu(int vcpu)
+{
+	int ret = -EINVAL;
+
+	if (!verify_vcpu(vcpu))
+		goto exit;
+
+	if (!select_key())
+		goto exit;
+
+	if (!register_kvm_events_ops())
+		goto exit;
+
+	init_kvm_event_record();
+	setup_pager();
+
+	ret = read_events();
+	if (ret)
+		goto exit;
+
+	sort_result(vcpu);
+	print_result(vcpu);
+exit:
+	return ret;
+}
+
+static const char * const record_args[] = {
+	"record",
+	"-R",
+	"-f",
+	"-m", "1024",
+	"-c", "1",
+	"-e", "kvm:kvm_entry",
+	"-e", "kvm:kvm_exit",
+	"-e", "kvm:kvm_mmio",
+	"-e", "kvm:kvm_pio",
+};
+
+static const char * const new_event[] = {
+	"kvm_mmio_begin",
+	"kvm_io_done"
+};
+
+static bool kvm_record_specified_guest(int argc, const char **argv)
+{
+	int i;
+
+	for (i = 0; i < argc; i++)
+		if (!strcmp(argv[i], "-p") || !strcmp(argv[i], "--pid"))
+			return true;
+
+	return false;
+}
+
+#define STRDUP_FAIL_EXIT(s)		\
+	({	char *_p;		\
+		_p = strdup(s);		\
+		if (!_p)		\
+			return -ENOMEM;	\
+		_p;			\
+	})
+
+static int kvm_events_record(int argc, const char **argv)
+{
+	unsigned int rec_argc, i, j;
+	const char **rec_argv;
+
+	rec_argc = ARRAY_SIZE(record_args) + argc + 2;
+	rec_argc += ARRAY_SIZE(new_event) * 2;
+	rec_argv = calloc(rec_argc + 1, sizeof(char *));
+
+	if (rec_argv == NULL)
+		return -ENOMEM;
+
+	for (i = 0; i < ARRAY_SIZE(record_args); i++)
+		rec_argv[i] = STRDUP_FAIL_EXIT(record_args[i]);
+
+	/*
+	 * Append "-a" only if "-p"/"--pid" is not specified since they
+	 * are mutually exclusive.
+	 */
+	if (!kvm_record_specified_guest(argc, argv))
+		rec_argv[i++] = STRDUP_FAIL_EXIT("-a");
+
+	rec_argv[i++] = STRDUP_FAIL_EXIT("-o");
+	rec_argv[i++] = STRDUP_FAIL_EXIT(file_name);
+
+	for (j = 0; j < ARRAY_SIZE(new_event); j++)
+		if (is_valid_tracepoint(new_event[j])) {
+			char event[256];
+
+			sprintf(event, "kvm:%s", new_event[j]);
+
+			rec_argv[i++] = STRDUP_FAIL_EXIT("-e");
+			rec_argv[i++] = STRDUP_FAIL_EXIT(event);
+		}
+
+	for (j = 1; j < (unsigned int)argc; j++, i++)
+		rec_argv[i] = argv[j];
+
+	return cmd_record(i, rec_argv, NULL);
+}
+
+static const char * const kvm_events_report_usage[] = {
+	"perf kvm stat report [<options>]",
+	NULL
+};
+
+static const struct option kvm_events_report_options[] = {
+	OPT_STRING(0, "event", &report_event, "report event",
+		    "event for reporting: vmexit, mmio, ioport"),
+	OPT_INTEGER(0, "vcpu", &trace_vcpu,
+		    "vcpu id to report"),
+	OPT_STRING('k', "key", &sort_key, "sort-key",
+		    "key for sorting: sample(sort by samples number)"
+		    " time (sort by avg time)"),
+	OPT_END()
+};
+
+static int kvm_events_report(int argc, const char **argv)
+{
+	symbol__init();
+
+	if (argc) {
+		argc = parse_options(argc, argv,
+				     kvm_events_report_options,
+				     kvm_events_report_usage, 0);
+		if (argc)
+			usage_with_options(kvm_events_report_usage,
+					   kvm_events_report_options);
+	}
+
+	return kvm_events_report_vcpu(trace_vcpu);
+}
+
+static int kvm_cmd_stat(int argc, const char **argv)
+{
+	if (argc > 1) {
+		if (!strncmp(argv[1], "rec", 3))
+			return kvm_events_record(argc - 1, argv + 1);
+
+		if (!strncmp(argv[1], "rep", 3))
+			return kvm_events_report(argc - 1 , argv + 1);
+	}
+
+	return cmd_stat(argc, argv, NULL);
+}
+
 static char			name_buffer[256];
 
 static const char * const kvm_usage[] = {
-	"perf kvm [<options>] {top|record|report|diff|buildid-list}",
+	"perf kvm [<options>] {top|record|report|diff|buildid-list|stat}",
 	NULL
 };
 
@@ -135,6 +1016,8 @@ int cmd_kvm(int argc, const char **argv, const char *prefix __used)
 		return cmd_top(argc, argv, NULL);
 	else if (!strncmp(argv[0], "buildid-list", 12))
 		return __cmd_buildid_list(argc, argv);
+	else if (!strncmp(argv[0], "stat", 4))
+		return kvm_cmd_stat(argc, argv);
 	else
 		usage_with_options(kvm_usage, kvm_options);
 
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 9696e64..a3778d2 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -1633,6 +1633,11 @@ static int process_build_id(struct perf_file_section *section,
 	return 0;
 }
 
+static char *read_cpuid(struct perf_header *ph, int fd)
+{
+	return do_read_string(fd, ph);
+}
+
 static struct perf_evsel *
 perf_evlist__find_by_index(struct perf_evlist *evlist, int idx)
 {
@@ -1686,6 +1691,7 @@ process_event_desc(struct perf_file_section *section __unused,
 struct feature_ops {
 	int (*write)(int fd, struct perf_header *h, struct perf_evlist *evlist);
 	void (*print)(struct perf_header *h, int fd, FILE *fp);
+	char *(*read)(struct perf_header *h, int fd);
 	int (*process)(struct perf_file_section *section,
 		       struct perf_header *h, int feat, int fd, void *data);
 	const char *name;
@@ -1700,6 +1706,9 @@ struct feature_ops {
 #define FEAT_OPF(n, func) \
 	[n] = { .name = #n, .write = write_##func, .print = print_##func, \
 		.full_only = true }
+#define FEAT_OPA_R(n, func) \
+	[n] = { .name = #n, .write = write_##func, .print = print_##func, \
+		.read  = read_##func }
 
 /* feature_ops not implemented: */
 #define print_tracing_data	NULL
@@ -1714,7 +1723,7 @@ static const struct feature_ops feat_ops[HEADER_LAST_FEATURE] = {
 	FEAT_OPA(HEADER_ARCH,		arch),
 	FEAT_OPA(HEADER_NRCPUS,		nrcpus),
 	FEAT_OPA(HEADER_CPUDESC,	cpudesc),
-	FEAT_OPA(HEADER_CPUID,		cpuid),
+	FEAT_OPA_R(HEADER_CPUID,	cpuid),
 	FEAT_OPA(HEADER_TOTAL_MEM,	total_mem),
 	FEAT_OPP(HEADER_EVENT_DESC,	event_desc),
 	FEAT_OPA(HEADER_CMDLINE,	cmdline),
@@ -1769,6 +1778,49 @@ int perf_header__fprintf_info(struct perf_session *session, FILE *fp, bool full)
 	return 0;
 }
 
+struct header_read_data {
+	int feat;
+	char *result;
+};
+
+static int perf_file_section__read_feature(struct perf_file_section *section,
+				struct perf_header *ph,
+				int feat, int fd, void *data)
+{
+	struct header_read_data *hd = data;
+
+	if (feat != hd->feat)
+		return 0;
+
+	if (lseek(fd, section->offset, SEEK_SET) == (off_t)-1) {
+		pr_debug("Failed to lseek to %" PRIu64 " offset for feature "
+				"%d, continuing...\n", section->offset, feat);
+	return 0;
+	}
+
+	if (feat >= HEADER_LAST_FEATURE) {
+		pr_warning("unknown feature %d\n", feat);
+		return 0;
+	}
+
+	hd->result = feat_ops[feat].read(ph, fd);
+	return 0;
+}
+
+char *perf_header__read_feature(struct perf_session *session, int feat)
+{
+	struct perf_header *header = &session->header;
+	struct header_read_data hd;
+	int fd = session->fd;
+
+	hd.feat = feat;
+	hd.result = NULL;
+
+	perf_header__process_sections(header, fd, &hd,
+				perf_file_section__read_feature);
+	return hd.result;
+}
+
 static int do_write_feat(int fd, struct perf_header *h, int type,
 			 struct perf_file_section **p,
 			 struct perf_evlist *evlist)
diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
index 9d5eedc..a49436d 100644
--- a/tools/perf/util/header.h
+++ b/tools/perf/util/header.h
@@ -94,6 +94,7 @@ int perf_header__process_sections(struct perf_header *header, int fd,
 				  int feat, int fd, void *data));
 
 int perf_header__fprintf_info(struct perf_session *s, FILE *fp, bool full);
+char *perf_header__read_feature(struct perf_session *session, int feat);
 
 int build_id_cache__add_s(const char *sbuild_id, const char *debugdir,
 			  const char *name, bool is_kallsyms);
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index 70c2c13..f66610b 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -16,6 +16,8 @@ struct thread {
 	bool			comm_set;
 	char			*comm;
 	int			comm_len;
+
+	void			*priv;
 };
 
 struct machine;
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 0/3] KVM: perf: kvm events analysis tool
  2012-08-27  9:51 [PATCH v7 0/3] KVM: perf: kvm events analysis tool Dong Hao
                   ` (2 preceding siblings ...)
  2012-08-27  9:51 ` [PATCH v7 3/3] KVM: perf: kvm events analysis tool Dong Hao
@ 2012-08-27  9:59 ` Xiao Guangrong
  2012-08-27 12:53   ` David Ahern
  3 siblings, 1 reply; 26+ messages in thread
From: Xiao Guangrong @ 2012-08-27  9:59 UTC (permalink / raw)
  To: Dong Hao; +Cc: avi, acme, mtosatti, mingo, linux-kernel, kvm, David Ahern

CC David.

Hi David,

I should apologize to you that Dong forgot to post the patchset
to you. Could you pick these up from the mail list?

On 08/27/2012 05:51 PM, Dong Hao wrote:
> From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> 
> Changelog:
> - rebased it on Arnaldo's newest git tree perf/core branch
> 
> the change from Arnaldo's comments:
> - directly get event from evsel->tp_format
> - remove die() and return the proper error code
> - rename thread->private to thread->priv
> 
> the change from David's comments:
> - use is_valid_tracepoint instead of kvm_events_exist 
> 
> This patchset introduces a perf-based tool (perf kvm stat record/report)
> which can analyze kvm events more smartly. Below is the presentation slice
> on 2012 Japan LinuxCon:
> http://events.linuxfoundation.org/images/stories/pdf/lcjp2012_guangrong.pdf
> You can get more details from it. If any questions/comments, please feel free
> to let us know.
> 
> This patchset is based on Arnaldo's git tree perf/core branch, and patch 2
> is just doing the "improvement" work, which can be picked up independently.
> 
> 
> Usage:
> - kvm stat
>   run a command and gather performance counter statistics, it is the alias of
>   perf stat
> 
> - trace kvm events:
>   perf kvm stat record, or, if other tracepoints are interesting as well, we
>   can append the events like this:
>   perf kvm stat record -e kvm:*
>   If many guests are running, we can track the specified guest by using -p or
>   --pid
> 
> - show the result:
>   perf kvm stat report
> 
> The output example is following:
> # pgrep qemu-kvm
> 26071
> 32253
> 32564
> 
> total 3 guests are running on the host
> 
> Then, track the guest whose pid is 26071:
> # ./perf kvm stat record -p 26071
> ^C[ perf record: Woken up 9 times to write data ]
> [ perf record: Captured and wrote 24.903 MB perf.data.guest (~1088034 samples) ]
> 
> See the vmexit events:
> # ./perf kvm stat report --event=vmexit
> 
> Analyze events for all VCPUs:
> 
>              VM-EXIT    Samples  Samples%     Time%         Avg time
> 
>          APIC_ACCESS      65381    66.58%     5.95%     37.72us ( +-   6.54% )
>   EXTERNAL_INTERRUPT      16031    16.32%     3.06%     79.11us ( +-   7.34% )
>                CPUID       5360     5.46%     0.06%      4.50us ( +-  35.07% )
>                  HLT       4496     4.58%    90.75%   8360.34us ( +-   5.22% )
>        EPT_VIOLATION       2667     2.72%     0.04%      5.49us ( +-   5.05% )
>    PENDING_INTERRUPT       2242     2.28%     0.03%      5.25us ( +-   2.96% )
>        EXCEPTION_NMI       1332     1.36%     0.02%      6.53us ( +-   6.51% )
>       IO_INSTRUCTION        383     0.39%     0.09%     93.39us ( +-  40.92% )
>            CR_ACCESS        310     0.32%     0.00%      6.10us ( +-   3.95% )
> 
> Total Samples:98202, Total events handled time:41419293.63us.
> 
> See the mmio events:
> # ./perf kvm stat report --event=mmio
> 
> Analyze events for all VCPUs:
> 
>          MMIO Access    Samples  Samples%     Time%         Avg time
> 
>         0xfee00380:W      58686    90.21%    15.67%      4.95us ( +-   2.96% )
>         0xfee00300:R       2124     3.26%     1.48%     12.93us ( +-  14.75% )
>         0xfee00310:W       2124     3.26%     0.34%      3.00us ( +-   1.33% )
>         0xfee00300:W       2123     3.26%    82.50%    720.68us ( +-  10.24% )
> 
> Total Samples:65057, Total events handled time:1854470.45us.
> 
> See the ioport event:
> # ./perf kvm stat report --event=ioport
> 
> Analyze events for all VCPUs:
> 
>       IO Port Access    Samples  Samples%     Time%         Avg time
> 
>          0xc090:POUT        383   100.00%   100.00%     89.00us ( +-  42.94% )
> 
> Total Samples:383, Total events handled time:34085.56us.
> 
> And, --vcpu is used to track the specified vcpu and --key is used to sort the
> result:
> # ./perf kvm stat report --event=vmexit --vcpu=0 --key=time
> 
> Analyze events for VCPU 0:
> 
>              VM-EXIT    Samples  Samples%     Time%         Avg time
> 
>                  HLT        551     5.05%    94.81%   9501.72us ( +-  12.52% )
>   EXTERNAL_INTERRUPT       1390    12.74%     2.39%     94.80us ( +-  20.92% )
>          APIC_ACCESS       6186    56.68%     2.62%     23.41us ( +-  23.62% )
>       IO_INSTRUCTION         17     0.16%     0.01%     20.39us ( +-  22.33% )
>        EXCEPTION_NMI         94     0.86%     0.01%      6.07us ( +-   7.13% )
>    PENDING_INTERRUPT        199     1.82%     0.02%      5.48us ( +-   4.36% )
>            CR_ACCESS         52     0.48%     0.00%      4.89us ( +-   4.09% )
>        EPT_VIOLATION       2057    18.85%     0.12%      3.15us ( +-   1.33% )
>                CPUID        368     3.37%     0.02%      2.82us ( +-   2.79% )
> 
> Total Samples:10914, Total events handled time:5521782.02us.
> 
> 
> Dong Hao (3):
>   KVM: x86: export svm/vmx exit code and vector code to userspace
>   KVM: x86: trace mmio begin and complete
>   KVM: perf: kvm events analysis tool
> 
>  arch/x86/include/asm/kvm_host.h       |   36 +-
>  arch/x86/include/asm/svm.h            |  205 +++++---
>  arch/x86/include/asm/vmx.h            |  126 +++--
>  arch/x86/kvm/trace.h                  |   89 ----
>  arch/x86/kvm/x86.c                    |   32 +-
>  include/trace/events/kvm.h            |   37 ++
>  tools/perf/Documentation/perf-kvm.txt |   30 +-
>  tools/perf/MANIFEST                   |    3 +
>  tools/perf/builtin-kvm.c              |  889 ++++++++++++++++++++++++++++++++-
>  tools/perf/util/header.c              |   54 ++-
>  tools/perf/util/header.h              |    1 +
>  tools/perf/util/thread.h              |    2 +
>  12 files changed, 1264 insertions(+), 240 deletions(-)
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 0/3] KVM: perf: kvm events analysis tool
  2012-08-27  9:59 ` [PATCH v7 0/3] " Xiao Guangrong
@ 2012-08-27 12:53   ` David Ahern
  0 siblings, 0 replies; 26+ messages in thread
From: David Ahern @ 2012-08-27 12:53 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Dong Hao, avi, acme, mtosatti, mingo, linux-kernel, kvm

On 8/27/12 3:59 AM, Xiao Guangrong wrote:
> CC David.
>
> Hi David,
>
> I should apologize to you that Dong forgot to post the patchset
> to you. Could you pick these up from the mail list?

Yes, I do catch all perf related emails to LKML. I'll take a look at the 
patches today or tomorrow.

David


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 3/3] KVM: perf: kvm events analysis tool
  2012-08-27  9:51 ` [PATCH v7 3/3] KVM: perf: kvm events analysis tool Dong Hao
@ 2012-08-27 15:53   ` Andrew Jones
  2012-08-27 19:34     ` David Ahern
  2012-09-02 13:51     ` don
  2012-08-30 18:29   ` David Ahern
  1 sibling, 2 replies; 26+ messages in thread
From: Andrew Jones @ 2012-08-27 15:53 UTC (permalink / raw)
  To: Dong Hao; +Cc: avi, acme, mtosatti, mingo, xiaoguangrong, linux-kernel, kvm

On Mon, Aug 27, 2012 at 05:51:46PM +0800, Dong Hao wrote:

<snip>

> +struct event_stats {
> +	u64 count;
> +	u64 time;
> +
> +	/* used to calculate stddev. */
> +	double mean;
> +	double M2;
> +};

How about moving the stats functions from builtin-stat.c to e.g.
util/stats.c, and then reusing them? Then this struct (which I would
rename to kvm_event_stats) would look like this

struct kvm_event_stats {
        u64 time;
        struct stats stats;
};

of course the get_event_ accessor generators would need tweaking

<snip>

> +static void update_event_stats(struct event_stats *stats, u64 time_diff)
> +{
> +	double delta;
> +
> +	stats->count++;
> +	stats->time += time_diff;
> +
> +	delta = time_diff - stats->mean;
> +	stats->mean += delta / stats->count;
> +	stats->M2 += delta*(time_diff - stats->mean);
> +}

Reusing stats would allow this to become just

static void update_event_stats(struct kvm_event_stats *stats, u64 time_diff)
{
	update_stats(&kvm_stats->stats, time_diff);
	kvm_stats->time += time_diff;
}

> +
> +static double event_stats_stddev(int vcpu_id, struct kvm_event *event)
> +{
> +	struct event_stats *stats = &event->total;
> +	double variance, variance_mean, stddev;
> +
> +	if (vcpu_id != -1)
> +		stats = &event->vcpu[vcpu_id];
> +
> +	BUG_ON(!stats->count);
> +
> +	variance = stats->M2 / (stats->count - 1);
> +	variance_mean = variance / stats->count;
> +	stddev = sqrt(variance_mean);
> +
> +	return stddev * 100 / stats->mean;

This function's name implies it returns the stddev, but it returns the
relative stddev instead. Maybe rename it? This would be simplified
with code reuse too to basically just

return stddev_stats(&kvm_stats->stats) * 100 / kvm_stats->stats.mean;

Drew

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 3/3] KVM: perf: kvm events analysis tool
  2012-08-27 15:53   ` Andrew Jones
@ 2012-08-27 19:34     ` David Ahern
  2012-08-28  6:35       ` Andrew Jones
  2012-09-02 13:51     ` don
  1 sibling, 1 reply; 26+ messages in thread
From: David Ahern @ 2012-08-27 19:34 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Dong Hao, avi, acme, mtosatti, mingo, xiaoguangrong, linux-kernel, kvm

On 8/27/12 9:53 AM, Andrew Jones wrote:
> On Mon, Aug 27, 2012 at 05:51:46PM +0800, Dong Hao wrote:
>
> <snip>
>
>> +struct event_stats {
>> +	u64 count;
>> +	u64 time;
>> +
>> +	/* used to calculate stddev. */
>> +	double mean;
>> +	double M2;
>> +};
>
> How about moving the stats functions from builtin-stat.c to e.g.
> util/stats.c, and then reusing them? Then this struct (which I would
> rename to kvm_event_stats) would look like this
>
> struct kvm_event_stats {
>          u64 time;
>          struct stats stats;
> };
>
> of course the get_event_ accessor generators would need tweaking

Given the history of the command (first submitted back in February) code 
refactoring can wait until there is a second user for the stats code.

David

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 3/3] KVM: perf: kvm events analysis tool
  2012-08-27 19:34     ` David Ahern
@ 2012-08-28  6:35       ` Andrew Jones
  2012-08-28 17:19         ` David Ahern
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Jones @ 2012-08-28  6:35 UTC (permalink / raw)
  To: David Ahern
  Cc: Dong Hao, avi, acme, mtosatti, mingo, xiaoguangrong, linux-kernel, kvm

On Mon, Aug 27, 2012 at 01:34:36PM -0600, David Ahern wrote:
> On 8/27/12 9:53 AM, Andrew Jones wrote:
> >On Mon, Aug 27, 2012 at 05:51:46PM +0800, Dong Hao wrote:
> >
> ><snip>
> >
> >>+struct event_stats {
> >>+	u64 count;
> >>+	u64 time;
> >>+
> >>+	/* used to calculate stddev. */
> >>+	double mean;
> >>+	double M2;
> >>+};
> >
> >How about moving the stats functions from builtin-stat.c to e.g.
> >util/stats.c, and then reusing them? Then this struct (which I would
> >rename to kvm_event_stats) would look like this
> >
> >struct kvm_event_stats {
> >         u64 time;
> >         struct stats stats;
> >};
> >
> >of course the get_event_ accessor generators would need tweaking
> 
> Given the history of the command (first submitted back in February)
> code refactoring can wait until there is a second user for the stats
> code.

This is the second user. As I said, move the stats code from
builtin-stat.c. Or are you suggesting to duplicate the code once,
and then refactor the second (really the third) time it's needed?

Drew

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 3/3] KVM: perf: kvm events analysis tool
  2012-08-28  6:35       ` Andrew Jones
@ 2012-08-28 17:19         ` David Ahern
  0 siblings, 0 replies; 26+ messages in thread
From: David Ahern @ 2012-08-28 17:19 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Dong Hao, avi, acme, mtosatti, mingo, xiaoguangrong, linux-kernel, kvm

On 8/28/12 12:35 AM, Andrew Jones wrote:
> This is the second user. As I said, move the stats code from
> builtin-stat.c. Or are you suggesting to duplicate the code once,
> and then refactor the second (really the third) time it's needed?

gah, I misunderstood your comment. I see your point now.

Davi


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 3/3] KVM: perf: kvm events analysis tool
  2012-08-27  9:51 ` [PATCH v7 3/3] KVM: perf: kvm events analysis tool Dong Hao
  2012-08-27 15:53   ` Andrew Jones
@ 2012-08-30 18:29   ` David Ahern
  2012-09-03  8:48     ` don
  2012-09-13  4:56     ` David Ahern
  1 sibling, 2 replies; 26+ messages in thread
From: David Ahern @ 2012-08-30 18:29 UTC (permalink / raw)
  To: Dong Hao; +Cc: avi, acme, mtosatti, mingo, xiaoguangrong, linux-kernel, kvm

In addition to Andrew's comment about making the stats struct and 
functions generic...

On 8/27/12 3:51 AM, Dong Hao wrote:
---8<---

> +static void exit_event_decode_key(struct event_key *key, char decode[20])
> +{
> +       const char *exit_reason = get_exit_reason(key->key);
> +
> +       snprintf(decode, 20, "%s", exit_reason);
> +}

Use scnprintf rather than snprintf.

---8<---

> +static bool kvm_event_expand(struct kvm_event *event, int vcpu_id)
> +{
> +	int old_max_vcpu = event->max_vcpu;
> +
> +	if (vcpu_id < event->max_vcpu)
> +		return true;
> +
> +	while (event->max_vcpu <= vcpu_id)
> +		event->max_vcpu += DEFAULT_VCPU_NUM;
> +
> +	event->vcpu = realloc(event->vcpu,
> +			      event->max_vcpu * sizeof(*event->vcpu));
> +	if (!event->vcpu) {

If realloc fails you leak memory by overwriting the current pointer.

---8<---

> +static double event_stats_stddev(int vcpu_id, struct kvm_event *event)
> +{
> +	struct event_stats *stats = &event->total;
> +	double variance, variance_mean, stddev;
> +
> +	if (vcpu_id != -1)
> +		stats = &event->vcpu[vcpu_id];
> +
> +	BUG_ON(!stats->count);
> +
> +	variance = stats->M2 / (stats->count - 1);
> +	variance_mean = variance / stats->count;
> +	stddev = sqrt(variance_mean);
> +
> +	return stddev * 100 / stats->mean;
> +}

perf should be consistent in the stddev it shows the user. Any reason to 
dump relative stddev versus stddev used by perf-stat?

---8<---

> +/* returns left most element of result, and erase it */
> +static struct kvm_event *pop_from_result(void)
> +{
> +	struct rb_node *node = result.rb_node;
> +
> +	if (!node)
> +		return NULL;
> +
> +	while (node->rb_left)
> +		node = node->rb_left;

Use rb_first().

---8<---

> +	/*
> +	 * Append "-a" only if "-p"/"--pid" is not specified since they
> +	 * are mutually exclusive.
> +	 */
> +	if (!kvm_record_specified_guest(argc, argv))
> +		rec_argv[i++] = STRDUP_FAIL_EXIT("-a");

Other perf-kvm commands rely on perf-record semantics -- i.e., for user 
to add the -a or -p option.

---8<---

> +static const char * const kvm_events_report_usage[] = {
> +	"perf kvm stat report [<options>]",
> +	NULL
> +};
> +
> +static const struct option kvm_events_report_options[] = {
> +	OPT_STRING(0, "event", &report_event, "report event",
> +		    "event for reporting: vmexit, mmio, ioport"),
> +	OPT_INTEGER(0, "vcpu", &trace_vcpu,
> +		    "vcpu id to report"),
> +	OPT_STRING('k', "key", &sort_key, "sort-key",
> +		    "key for sorting: sample(sort by samples number)"
> +		    " time (sort by avg time)"),
> +	OPT_END()
> +};
> +
> +static int kvm_events_report(int argc, const char **argv)
> +{
> +	symbol__init();
> +
> +	if (argc) {
> +		argc = parse_options(argc, argv,
> +				     kvm_events_report_options,
> +				     kvm_events_report_usage, 0);
> +		if (argc)
> +			usage_with_options(kvm_events_report_usage,
> +					   kvm_events_report_options);
> +	}
> +
> +	return kvm_events_report_vcpu(trace_vcpu);
> +}
> +
> +static int kvm_cmd_stat(int argc, const char **argv)
> +{
> +	if (argc > 1) {
> +		if (!strncmp(argv[1], "rec", 3))
> +			return kvm_events_record(argc - 1, argv + 1);
> +
> +		if (!strncmp(argv[1], "rep", 3))
> +			return kvm_events_report(argc - 1 , argv + 1);
> +	}
> +
> +	return cmd_stat(argc, argv, NULL);
> +}
> +
>   static char			name_buffer[256];
>
>   static const char * const kvm_usage[] = {
> -	"perf kvm [<options>] {top|record|report|diff|buildid-list}",
> +	"perf kvm [<options>] {top|record|report|diff|buildid-list|stat}",
>   	NULL
>   };
>

The usage for the report/record sub commands of stat is never shown. e.g.,
$ perf kvm stat
--> shows help for perf-stat

$ perf kvm
--> shows the above and perf-kvm's usage

To get help for the record/report subcommands you have to know that 
record and report are subcommands.

---8<---

> +static int perf_file_section__read_feature(struct perf_file_section *section,
> +				struct perf_header *ph,
> +				int feat, int fd, void *data)
> +{
> +	struct header_read_data *hd = data;
> +
> +	if (feat != hd->feat)
> +		return 0;
> +
> +	if (lseek(fd, section->offset, SEEK_SET) == (off_t)-1) {
> +		pr_debug("Failed to lseek to %" PRIu64 " offset for feature "
> +				"%d, continuing...\n", section->offset, feat);
> +	return 0;
> +	}
> +
> +	if (feat >= HEADER_LAST_FEATURE) {
> +		pr_warning("unknown feature %d\n", feat);
> +		return 0;
> +	}
> +
> +	hd->result = feat_ops[feat].read(ph, fd);

you should verify the read() function is implemented for the requested 
feature.

David

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 3/3] KVM: perf: kvm events analysis tool
  2012-08-27 15:53   ` Andrew Jones
  2012-08-27 19:34     ` David Ahern
@ 2012-09-02 13:51     ` don
  1 sibling, 0 replies; 26+ messages in thread
From: don @ 2012-09-02 13:51 UTC (permalink / raw)
  To: Andrew Jones
  Cc: avi, acme, mtosatti, mingo, xiaoguangrong, linux-kernel, kvm, dsahern

于 2012年08月27日 23:53, Andrew Jones 写道:
> On Mon, Aug 27, 2012 at 05:51:46PM +0800, Dong Hao wrote:
>
> <snip>
>
>> +struct event_stats {
>> +	u64 count;
>> +	u64 time;
>> +
>> +	/* used to calculate stddev. */
>> +	double mean;
>> +	double M2;
>> +};
> How about moving the stats functions from builtin-stat.c to e.g.
> util/stats.c, and then reusing them? Then this struct (which I would
> rename to kvm_event_stats) would look like this
>
> struct kvm_event_stats {
>          u64 time;
>          struct stats stats;
> };
>
> of course the get_event_ accessor generators would need tweaking
>
> <snip>
>
>> +static void update_event_stats(struct event_stats *stats, u64 time_diff)
>> +{
>> +	double delta;
>> +
>> +	stats->count++;
>> +	stats->time += time_diff;
>> +
>> +	delta = time_diff - stats->mean;
>> +	stats->mean += delta / stats->count;
>> +	stats->M2 += delta*(time_diff - stats->mean);
>> +}
> Reusing stats would allow this to become just
>
> static void update_event_stats(struct kvm_event_stats *stats, u64 time_diff)
> {
> 	update_stats(&kvm_stats->stats, time_diff);
> 	kvm_stats->time += time_diff;
> }
>
>> +
>> +static double event_stats_stddev(int vcpu_id, struct kvm_event *event)
>> +{
>> +	struct event_stats *stats = &event->total;
>> +	double variance, variance_mean, stddev;
>> +
>> +	if (vcpu_id != -1)
>> +		stats = &event->vcpu[vcpu_id];
>> +
>> +	BUG_ON(!stats->count);
>> +
>> +	variance = stats->M2 / (stats->count - 1);
>> +	variance_mean = variance / stats->count;
>> +	stddev = sqrt(variance_mean);
>> +
>> +	return stddev * 100 / stats->mean;
> This function's name implies it returns the stddev, but it returns the
> relative stddev instead. Maybe rename it? This would be simplified
> with code reuse too to basically just
>
> return stddev_stats(&kvm_stats->stats) * 100 / kvm_stats->stats.mean;
>
> Drew
>
Sorry for my late response, Andrew and thank you very much for your 
comments. We will try
to realize it in our next version.

Thanks,
Dong


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 3/3] KVM: perf: kvm events analysis tool
  2012-08-30 18:29   ` David Ahern
@ 2012-09-03  8:48     ` don
  2012-09-03 16:04       ` David Ahern
  2012-09-13  4:56     ` David Ahern
  1 sibling, 1 reply; 26+ messages in thread
From: don @ 2012-09-03  8:48 UTC (permalink / raw)
  To: David Ahern; +Cc: avi, acme, mtosatti, mingo, xiaoguangrong, linux-kernel, kvm

于 2012年08月31日 02:29, David Ahern 写道:
> In addition to Andrew's comment about making the stats struct and 
> functions generic...
Yes. :-)
>
> On 8/27/12 3:51 AM, Dong Hao wrote:
> ---8<---
>
>> +static void exit_event_decode_key(struct event_key *key, char 
>> decode[20])
>> +{
>> +       const char *exit_reason = get_exit_reason(key->key);
>> +
>> +       snprintf(decode, 20, "%s", exit_reason);
>> +}
>
> Use scnprintf rather than snprintf.
Why? Since we don't care about the return value, what's the difference?
>
> ---8<---
>
>> +static bool kvm_event_expand(struct kvm_event *event, int vcpu_id)
>> +{
>> +    int old_max_vcpu = event->max_vcpu;
>> +
>> +    if (vcpu_id < event->max_vcpu)
>> +        return true;
>> +
>> +    while (event->max_vcpu <= vcpu_id)
>> +        event->max_vcpu += DEFAULT_VCPU_NUM;
>> +
>> +    event->vcpu = realloc(event->vcpu,
>> +                  event->max_vcpu * sizeof(*event->vcpu));
>> +    if (!event->vcpu) {
>
> If realloc fails you leak memory by overwriting the current pointer.
Thanks for pointing it out, we will terminate the running instance in 
our next
version.
>
> ---8<---
>
>> +static double event_stats_stddev(int vcpu_id, struct kvm_event *event)
>> +{
>> +    struct event_stats *stats = &event->total;
>> +    double variance, variance_mean, stddev;
>> +
>> +    if (vcpu_id != -1)
>> +        stats = &event->vcpu[vcpu_id];
>> +
>> +    BUG_ON(!stats->count);
>> +
>> +    variance = stats->M2 / (stats->count - 1);
>> +    variance_mean = variance / stats->count;
>> +    stddev = sqrt(variance_mean);
>> +
>> +    return stddev * 100 / stats->mean;
>> +}
>
> perf should be consistent in the stddev it shows the user. Any reason 
> to dump relative stddev versus stddev used by perf-stat?
Since 'perf stat' uses relative standard deviation rather than stddev, 
'perf kvm stat'
just follows the style of 'perf stat'.

>
> ---8<---
>
>> +/* returns left most element of result, and erase it */
>> +static struct kvm_event *pop_from_result(void)
>> +{
>> +    struct rb_node *node = result.rb_node;
>> +
>> +    if (!node)
>> +        return NULL;
>> +
>> +    while (node->rb_left)
>> +        node = node->rb_left;
>
> Use rb_first().
OK, we will use it in the next version.
>
> ---8<---
>
>> +    /*
>> +     * Append "-a" only if "-p"/"--pid" is not specified since they
>> +     * are mutually exclusive.
>> +     */
>> +    if (!kvm_record_specified_guest(argc, argv))
>> +        rec_argv[i++] = STRDUP_FAIL_EXIT("-a");
>
> Other perf-kvm commands rely on perf-record semantics -- i.e., for 
> user to add the -a or -p option.
You mean, remove '-a' from the default options, then:
if a user wants to record all guest he will use 'perf stat record -a';
and if a user wants to record the specified guest, he should use
'perf stat record -p xxx'?

Well, as the style of other subcommand, e.g., perf lock/perf sched, the
'perf xxx record' record all events on all cpus, no need to use '-a'.

Based on mentioned above, I prefer the original way. ;)
>
> ---8<---
>
>> +static const char * const kvm_events_report_usage[] = {
>> +    "perf kvm stat report [<options>]",
>> +    NULL
>> +};
>> +
>> +static const struct option kvm_events_report_options[] = {
>> +    OPT_STRING(0, "event", &report_event, "report event",
>> +            "event for reporting: vmexit, mmio, ioport"),
>> +    OPT_INTEGER(0, "vcpu", &trace_vcpu,
>> +            "vcpu id to report"),
>> +    OPT_STRING('k', "key", &sort_key, "sort-key",
>> +            "key for sorting: sample(sort by samples number)"
>> +            " time (sort by avg time)"),
>> +    OPT_END()
>> +};
>> +
>> +static int kvm_events_report(int argc, const char **argv)
>> +{
>> +    symbol__init();
>> +
>> +    if (argc) {
>> +        argc = parse_options(argc, argv,
>> +                     kvm_events_report_options,
>> +                     kvm_events_report_usage, 0);
>> +        if (argc)
>> +            usage_with_options(kvm_events_report_usage,
>> +                       kvm_events_report_options);
>> +    }
>> +
>> +    return kvm_events_report_vcpu(trace_vcpu);
>> +}
>> +
>> +static int kvm_cmd_stat(int argc, const char **argv)
>> +{
>> +    if (argc > 1) {
>> +        if (!strncmp(argv[1], "rec", 3))
>> +            return kvm_events_record(argc - 1, argv + 1);
>> +
>> +        if (!strncmp(argv[1], "rep", 3))
>> +            return kvm_events_report(argc - 1 , argv + 1);
>> +    }
>> +
>> +    return cmd_stat(argc, argv, NULL);
>> +}
>> +
>>   static char            name_buffer[256];
>>
>>   static const char * const kvm_usage[] = {
>> -    "perf kvm [<options>] {top|record|report|diff|buildid-list}",
>> +    "perf kvm [<options>] {top|record|report|diff|buildid-list|stat}",
>>       NULL
>>   };
>>
>
> The usage for the report/record sub commands of stat is never shown. 
> e.g.,
> $ perf kvm stat
> --> shows help for perf-stat
>
> $ perf kvm
> --> shows the above and perf-kvm's usage
>
> To get help for the record/report subcommands you have to know that 
> record and report are subcommands.
Okay, we will improve this.
>
> ---8<---
>
>> +static int perf_file_section__read_feature(struct perf_file_section 
>> *section,
>> +                struct perf_header *ph,
>> +                int feat, int fd, void *data)
>> +{
>> +    struct header_read_data *hd = data;
>> +
>> +    if (feat != hd->feat)
>> +        return 0;
>> +
>> +    if (lseek(fd, section->offset, SEEK_SET) == (off_t)-1) {
>> +        pr_debug("Failed to lseek to %" PRIu64 " offset for feature "
>> +                "%d, continuing...\n", section->offset, feat);
>> +    return 0;
>> +    }
>> +
>> +    if (feat >= HEADER_LAST_FEATURE) {
>> +        pr_warning("unknown feature %d\n", feat);
>> +        return 0;
>> +    }
>> +
>> +    hd->result = feat_ops[feat].read(ph, fd);
>
> you should verify the read() function is implemented for the requested 
> feature.
OK, thank you for your comments, we will correct it. :-)


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 2/3] KVM: x86: trace mmio begin and complete
  2012-08-27  9:51 ` [PATCH v7 2/3] KVM: x86: trace mmio begin and complete Dong Hao
@ 2012-09-03 11:07   ` Avi Kivity
  2012-09-04  4:06     ` Xiao Guangrong
  0 siblings, 1 reply; 26+ messages in thread
From: Avi Kivity @ 2012-09-03 11:07 UTC (permalink / raw)
  To: Dong Hao; +Cc: acme, mtosatti, mingo, xiaoguangrong, linux-kernel, kvm

On 08/27/2012 12:51 PM, Dong Hao wrote:
> From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> 
> 'perf kvm stat record/report' will use kvm_exit and kvm_mmio(read...) to
> calculate mmio read emulated time for the old kernel, in order to trace
> mmio read event more exactly, we add kvm_mmio_begin to trace the time when
> mmio read begins, also, add kvm_io_done to trace the time when mmio/pio is
> completed
> 

Why is this so critical?

If a lot of time is spent in in-kernel mmio, then 'perf top' will report
it.  Otherwise the time between kvm_exit and kvm_entry describes the
time spent in the host.  Not all of it is mmio handling, but it is quite
close.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 1/3] KVM: x86: export svm/vmx exit code and vector code to userspace
  2012-08-27  9:51 ` [PATCH v7 1/3] KVM: x86: export svm/vmx exit code and vector code to userspace Dong Hao
@ 2012-09-03 11:13   ` Avi Kivity
  2012-09-04  3:53     ` Xiao Guangrong
  0 siblings, 1 reply; 26+ messages in thread
From: Avi Kivity @ 2012-09-03 11:13 UTC (permalink / raw)
  To: Dong Hao; +Cc: acme, mtosatti, mingo, xiaoguangrong, linux-kernel, kvm

On 08/27/2012 12:51 PM, Dong Hao wrote:
> From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> 
> Exporting KVM exit information to userspace to be consumed by perf.
> 
> [ Dong Hao <haodong@linux.vnet.ibm.com>: rebase it on acme's git tree ]
> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> Signed-off-by: Dong Hao <haodong@linux.vnet.ibm.com>
> ---
>  arch/x86/include/asm/kvm_host.h |   36 ++++---

Please put ABIs in kvm.h.  But see below.

>  arch/x86/include/asm/svm.h      |  205 +++++++++++++++++++++++++--------------
>  arch/x86/include/asm/vmx.h      |  126 ++++++++++++++++--------
>  arch/x86/kvm/trace.h            |   89 -----------------
>  4 files changed, 234 insertions(+), 222 deletions(-)
> 
>  
> +#define DE_VECTOR 0
> +#define DB_VECTOR 1
> +#define BP_VECTOR 3
> +#define OF_VECTOR 4
> +#define BR_VECTOR 5
> +#define UD_VECTOR 6
> +#define NM_VECTOR 7
> +#define DF_VECTOR 8
> +#define TS_VECTOR 10
> +#define NP_VECTOR 11
> +#define SS_VECTOR 12
> +#define GP_VECTOR 13
> +#define PF_VECTOR 14
> +#define MF_VECTOR 16
> +#define MC_VECTOR 18

This is not a kvm ABI, but an x86 architecture constants.  It should be
put into an existing x86 header.

> +
>  #endif /* _ASM_X86_KVM_HOST_H */
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index f2b83bc..cdf5674 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -1,6 +1,135 @@
>  #ifndef __SVM_H
>  #define __SVM_H
>  
> +
> +#ifdef __KERNEL__
> +

The entire file can be exported; nothing in there is implementation
specific.

> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index 74fcb96..61e04e9 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> +
> +#ifdef __KERNEL__
> +

Ditto.


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 3/3] KVM: perf: kvm events analysis tool
  2012-09-03  8:48     ` don
@ 2012-09-03 16:04       ` David Ahern
  0 siblings, 0 replies; 26+ messages in thread
From: David Ahern @ 2012-09-03 16:04 UTC (permalink / raw)
  To: don; +Cc: avi, acme, mtosatti, mingo, xiaoguangrong, linux-kernel, kvm

On 9/3/12 2:48 AM, don wrote:
> 于 2012年08月31日 02:29, David Ahern 写道:
>> In addition to Andrew's comment about making the stats struct and
>> functions generic...
> Yes. :-)
>>
>> On 8/27/12 3:51 AM, Dong Hao wrote:
>> ---8<---
>>
>>> +static void exit_event_decode_key(struct event_key *key, char
>>> decode[20])
>>> +{
>>> +       const char *exit_reason = get_exit_reason(key->key);
>>> +
>>> +       snprintf(decode, 20, "%s", exit_reason);
>>> +}
>>
>> Use scnprintf rather than snprintf.
> Why? Since we don't care about the return value, what's the difference?

1. consistency (scnprintf is preferred).
2. what if a new exit reason is added in the future that is larger than 
19 characters? Better to be safe now.

>>
>> ---8<---
>>
>>> +static bool kvm_event_expand(struct kvm_event *event, int vcpu_id)
>>> +{
>>> +    int old_max_vcpu = event->max_vcpu;
>>> +
>>> +    if (vcpu_id < event->max_vcpu)
>>> +        return true;
>>> +
>>> +    while (event->max_vcpu <= vcpu_id)
>>> +        event->max_vcpu += DEFAULT_VCPU_NUM;
>>> +
>>> +    event->vcpu = realloc(event->vcpu,
>>> +                  event->max_vcpu * sizeof(*event->vcpu));
>>> +    if (!event->vcpu) {
>>
>> If realloc fails you leak memory by overwriting the current pointer.
> Thanks for pointing it out, we will terminate the running instance in
> our next
> version.

Ok. Make sure to free the memory of the previous pointer on failure 
before cleaning up. The idea is that all allocations are properly freed 
on exit.

>>
>> ---8<---
>>
>>> +static double event_stats_stddev(int vcpu_id, struct kvm_event *event)
>>> +{
>>> +    struct event_stats *stats = &event->total;
>>> +    double variance, variance_mean, stddev;
>>> +
>>> +    if (vcpu_id != -1)
>>> +        stats = &event->vcpu[vcpu_id];
>>> +
>>> +    BUG_ON(!stats->count);
>>> +
>>> +    variance = stats->M2 / (stats->count - 1);
>>> +    variance_mean = variance / stats->count;
>>> +    stddev = sqrt(variance_mean);
>>> +
>>> +    return stddev * 100 / stats->mean;
>>> +}
>>
>> perf should be consistent in the stddev it shows the user. Any reason
>> to dump relative stddev versus stddev used by perf-stat?
> Since 'perf stat' uses relative standard deviation rather than stddev,
> 'perf kvm stat'
> just follows the style of 'perf stat'.

Yes, I've been working on an idea motivated by Andi Kleen. I have 
noticed that perf stat also uses relative stddev just in a non-direct 
way. I suggest moving common stats code from perf-stat to 
utils/stat.[ch], add a rel_stddev_stats function that returns the above 
and have perf-kvm use that.


>> ---8<---
>>
>>> +    /*
>>> +     * Append "-a" only if "-p"/"--pid" is not specified since they
>>> +     * are mutually exclusive.
>>> +     */
>>> +    if (!kvm_record_specified_guest(argc, argv))
>>> +        rec_argv[i++] = STRDUP_FAIL_EXIT("-a");
>>
>> Other perf-kvm commands rely on perf-record semantics -- i.e., for
>> user to add the -a or -p option.
> You mean, remove '-a' from the default options, then:
> if a user wants to record all guest he will use 'perf stat record -a';
> and if a user wants to record the specified guest, he should use
> 'perf stat record -p xxx'?

Yes. Well, 'perf kvm stat record' with a -a or -p. That makes the new 
stat subcommand consistent with just 'perf kvm record'.

>
> Well, as the style of other subcommand, e.g., perf lock/perf sched, the
> 'perf xxx record' record all events on all cpus, no need to use '-a'.
>
> Based on mentioned above, I prefer the original way. ;)

Yes, I noticed that some commands add -a before calling cmd_record(). 
Can't change that but we can keep perf-kvm consistent with its own 
variants which means not automagically adding -a.

David

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 1/3] KVM: x86: export svm/vmx exit code and vector code to userspace
  2012-09-03 11:13   ` Avi Kivity
@ 2012-09-04  3:53     ` Xiao Guangrong
  0 siblings, 0 replies; 26+ messages in thread
From: Xiao Guangrong @ 2012-09-04  3:53 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Dong Hao, acme, mtosatti, mingo, linux-kernel, kvm

On 09/03/2012 07:13 PM, Avi Kivity wrote:
> On 08/27/2012 12:51 PM, Dong Hao wrote:
>> From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>>
>> Exporting KVM exit information to userspace to be consumed by perf.
>>
>> [ Dong Hao <haodong@linux.vnet.ibm.com>: rebase it on acme's git tree ]
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>> Signed-off-by: Dong Hao <haodong@linux.vnet.ibm.com>
>> ---
>>  arch/x86/include/asm/kvm_host.h |   36 ++++---
> 
> Please put ABIs in kvm.h.  But see below.
> 
>>  arch/x86/include/asm/svm.h      |  205 +++++++++++++++++++++++++--------------
>>  arch/x86/include/asm/vmx.h      |  126 ++++++++++++++++--------
>>  arch/x86/kvm/trace.h            |   89 -----------------
>>  4 files changed, 234 insertions(+), 222 deletions(-)
>>
>>  
>> +#define DE_VECTOR 0
>> +#define DB_VECTOR 1
>> +#define BP_VECTOR 3
>> +#define OF_VECTOR 4
>> +#define BR_VECTOR 5
>> +#define UD_VECTOR 6
>> +#define NM_VECTOR 7
>> +#define DF_VECTOR 8
>> +#define TS_VECTOR 10
>> +#define NP_VECTOR 11
>> +#define SS_VECTOR 12
>> +#define GP_VECTOR 13
>> +#define PF_VECTOR 14
>> +#define MF_VECTOR 16
>> +#define MC_VECTOR 18
> 
> This is not a kvm ABI, but an x86 architecture constants.  It should be
> put into an existing x86 header.

Okay, i will move them into asm/kvm.h

> 
>> +
>>  #endif /* _ASM_X86_KVM_HOST_H */
>> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
>> index f2b83bc..cdf5674 100644
>> --- a/arch/x86/include/asm/svm.h
>> +++ b/arch/x86/include/asm/svm.h
>> @@ -1,6 +1,135 @@
>>  #ifndef __SVM_H
>>  #define __SVM_H
>>  
>> +
>> +#ifdef __KERNEL__
>> +
> 
> The entire file can be exported; nothing in there is implementation
> specific.

Unfortunately, i have tied this but failed:

../../arch/x86/include/asm/svm.h:262:1: error: packed attribute is unnecessary for ‘vmcb_seg’ [-Werror=packed]
../../arch/x86/include/asm/svm.h:307:1: error: packed attribute is unnecessary for ‘vmcb_save_area’ [-Werror=packed]
../../arch/x86/include/asm/svm.h:312:1: error: packed attribute is unnecessary for ‘vmcb’ [-Werror=packed]
......

> 
>> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
>> index 74fcb96..61e04e9 100644
>> --- a/arch/x86/include/asm/vmx.h
>> +++ b/arch/x86/include/asm/vmx.h
>> +
>> +#ifdef __KERNEL__
>> +
> 
> Ditto.

Or:

../../arch/x86/include/asm/vmx.h:376:0: error: "REG_R8" redefined [-Werror]
/usr/include/sys/ucontext.h:46:0: note: this is the location of the previous definition
../../arch/x86/include/asm/vmx.h:377:0: error: "REG_R9" redefined [-Werror]
/usr/include/sys/ucontext.h:48:0: note: this is the location of the previous definition
......


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 2/3] KVM: x86: trace mmio begin and complete
  2012-09-03 11:07   ` Avi Kivity
@ 2012-09-04  4:06     ` Xiao Guangrong
  0 siblings, 0 replies; 26+ messages in thread
From: Xiao Guangrong @ 2012-09-04  4:06 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Dong Hao, acme, mtosatti, mingo, linux-kernel, kvm

On 09/03/2012 07:07 PM, Avi Kivity wrote:
> On 08/27/2012 12:51 PM, Dong Hao wrote:
>> From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>>
>> 'perf kvm stat record/report' will use kvm_exit and kvm_mmio(read...) to
>> calculate mmio read emulated time for the old kernel, in order to trace
>> mmio read event more exactly, we add kvm_mmio_begin to trace the time when
>> mmio read begins, also, add kvm_io_done to trace the time when mmio/pio is
>> completed
>>
> 
> Why is this so critical?
> 
> If a lot of time is spent in in-kernel mmio, then 'perf top' will report
> it.  Otherwise the time between kvm_exit and kvm_entry describes the
> time spent in the host.  Not all of it is mmio handling, but it is quite
> close.

I have done some test, the new events can get more exact result, this is a
example in my slides:

the event handled time calculated by old events is: 0.000066(s)
calculated by new events, the result is: 0.000005(s).

So, i think it is worth introducing these new events.

But if you do not care it or think these events are duplicate with current
events or unmaintainable, i do not have strong opinion on that. Please let
me know, i will drop them in the next version.




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 3/3] KVM: perf: kvm events analysis tool
  2012-08-30 18:29   ` David Ahern
  2012-09-03  8:48     ` don
@ 2012-09-13  4:56     ` David Ahern
  2012-09-13 13:45       ` Arnaldo Carvalho de Melo
  2012-09-14  2:56       ` Xiao Guangrong
  1 sibling, 2 replies; 26+ messages in thread
From: David Ahern @ 2012-09-13  4:56 UTC (permalink / raw)
  To: Dong Hao, acme, xiaoguangrong
  Cc: avi, mtosatti, Ingo Molnar, linux-kernel, kvm

>>   static const char * const kvm_usage[] = {
>> -    "perf kvm [<options>] {top|record|report|diff|buildid-list}",
>> +    "perf kvm [<options>] {top|record|report|diff|buildid-list|stat}",
>>       NULL
>>   };
>>
>
> The usage for the report/record sub commands of stat is never shown. e.g.,
> $ perf kvm stat
> --> shows help for perf-stat
>
> $ perf kvm
> --> shows the above and perf-kvm's usage

[I deleted this thread, so having to reply to one of my responses. 
hopefully noone is unduly harmed by this.]

I've been using this command a bit lately -- especially on nested 
virtualization -- and I think the syntax is quirky - meaning wrong. In 
my case I always follow up a record with a report and end up using a 
shell script wrapper that combines the 2 and running it repeatedly. e.g.,

     $PERF kvm stat record -o $FILE -p $pid -- sleep $time
     [ $? -eq 0 ] && $PERF --no-pager kvm  -i $FILE stat report

As my daughter likes to say - awkward.

That suggests what is really needed is a 'live' mode - a continual 
updating of the output like perf top, not a record and analyze later 
mode. Which does come back to why I responded to this email -- the 
syntax is klunky and awkward.

So, I spent a fair amount of time today implementing a live mode. And 
after a lot of swearing at the tracepoint processing code I finally have 
it working. And the format extends easily (meaning < day and the next 
step) to a perf-based kvm_stat replacement. Example syntax is:

    perf kvm stat [-p <pid>|-a|...]

which defaults to an update delay of 1 second, and vmexit analysis.

The guts of the processing logic come from the existing kvm-events code. 
The changes focus on combining the record and report paths into one. The 
display needs some help (Arnaldo?), but it seems to work well.

I'd like to get opinions on what next? IMO, the record/report path 
should not get a foot hold from a backward compatibility perspective and 
having to maintain those options. I am willing to take the existing 
patches into git to maintain authorship and from there apply patches to 
make the live mode work - which includes a bit of refactoring of perf 
code (like the stats changes).

Before I march down this path, any objections, opinions, etc?

David

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 3/3] KVM: perf: kvm events analysis tool
  2012-09-13  4:56     ` David Ahern
@ 2012-09-13 13:45       ` Arnaldo Carvalho de Melo
  2012-09-13 14:14         ` David Ahern
  2012-09-14  2:56       ` Xiao Guangrong
  1 sibling, 1 reply; 26+ messages in thread
From: Arnaldo Carvalho de Melo @ 2012-09-13 13:45 UTC (permalink / raw)
  To: David Ahern
  Cc: Dong Hao, xiaoguangrong, avi, mtosatti, Ingo Molnar, linux-kernel, kvm

Em Wed, Sep 12, 2012 at 10:56:44PM -0600, David Ahern escreveu:
> >>  static const char * const kvm_usage[] = {
> >>+    "perf kvm [<options>] {top|record|report|diff|buildid-list|stat}",

> >The usage for the report/record sub commands of stat is never shown. e.g.,
> >$ perf kvm stat
> >--> shows help for perf-stat

> >$ perf kvm
> >--> shows the above and perf-kvm's usage
> 
> [I deleted this thread, so having to reply to one of my responses.
> hopefully noone is unduly harmed by this.]
> 
> I've been using this command a bit lately -- especially on nested
> virtualization -- and I think the syntax is quirky - meaning wrong.
> In my case I always follow up a record with a report and end up
> using a shell script wrapper that combines the 2 and running it
> repeatedly. e.g.,
> 
>     $PERF kvm stat record -o $FILE -p $pid -- sleep $time
>     [ $? -eq 0 ] && $PERF --no-pager kvm  -i $FILE stat report
> 
> As my daughter likes to say - awkward.
> 
> That suggests what is really needed is a 'live' mode - a continual
> updating of the output like perf top, not a record and analyze later
> mode. Which does come back to why I responded to this email -- the
> syntax is klunky and awkward.
> 
> So, I spent a fair amount of time today implementing a live mode.
> And after a lot of swearing at the tracepoint processing code I

What kind of swearing? I'm working on 'perf test' entries for
tracepoints to make sure we don't regress on the perf/libtraceevent
junction, doing that as prep work for further simplifying tracepoint
tools like sched, kvm, kmem, etc.

> finally have it working. And the format extends easily (meaning <
> day and the next step) to a perf-based kvm_stat replacement. Example
> syntax is:
> 
>    perf kvm stat [-p <pid>|-a|...]
> 
> which defaults to an update delay of 1 second, and vmexit analysis.
> 
> The guts of the processing logic come from the existing kvm-events
> code. The changes focus on combining the record and report paths
> into one. The display needs some help (Arnaldo?), but it seems to
> work well.
> 
> I'd like to get opinions on what next? IMO, the record/report path
> should not get a foot hold from a backward compatibility perspective
> and having to maintain those options. I am willing to take the
> existing patches into git to maintain authorship and from there
> apply patches to make the live mode work - which includes a bit of
> refactoring of perf code (like the stats changes).
> 
> Before I march down this path, any objections, opinions, etc?

Can I see the code?

- Arnaldo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 3/3] KVM: perf: kvm events analysis tool
  2012-09-13 13:45       ` Arnaldo Carvalho de Melo
@ 2012-09-13 14:14         ` David Ahern
  2012-09-13 14:31           ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 26+ messages in thread
From: David Ahern @ 2012-09-13 14:14 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Dong Hao, xiaoguangrong, avi, mtosatti, Ingo Molnar, linux-kernel, kvm

On 9/13/12 7:45 AM, Arnaldo Carvalho de Melo wrote:
> Em Wed, Sep 12, 2012 at 10:56:44PM -0600, David Ahern escreveu:
>>>>   static const char * const kvm_usage[] = {
>>>> +    "perf kvm [<options>] {top|record|report|diff|buildid-list|stat}",
>
>>> The usage for the report/record sub commands of stat is never shown. e.g.,
>>> $ perf kvm stat
>>> --> shows help for perf-stat
>
>>> $ perf kvm
>>> --> shows the above and perf-kvm's usage
>>
>> [I deleted this thread, so having to reply to one of my responses.
>> hopefully noone is unduly harmed by this.]
>>
>> I've been using this command a bit lately -- especially on nested
>> virtualization -- and I think the syntax is quirky - meaning wrong.
>> In my case I always follow up a record with a report and end up
>> using a shell script wrapper that combines the 2 and running it
>> repeatedly. e.g.,
>>
>>      $PERF kvm stat record -o $FILE -p $pid -- sleep $time
>>      [ $? -eq 0 ] && $PERF --no-pager kvm  -i $FILE stat report
>>
>> As my daughter likes to say - awkward.
>>
>> That suggests what is really needed is a 'live' mode - a continual
>> updating of the output like perf top, not a record and analyze later
>> mode. Which does come back to why I responded to this email -- the
>> syntax is klunky and awkward.
>>
>> So, I spent a fair amount of time today implementing a live mode.
>> And after a lot of swearing at the tracepoint processing code I
>
> What kind of swearing? I'm working on 'perf test' entries for
> tracepoints to make sure we don't regress on the perf/libtraceevent
> junction, doing that as prep work for further simplifying tracepoint
> tools like sched, kvm, kmem, etc.

Have you seen how the tracing initialization is done? ugly. record 
generates tracing data events and report uses those to do the init so 
you can access the raw_data. I ended up writing this:

static int perf_kvm__tracing_init(void)
{
     struct tracing_data *tdata;
     char temp_file[] = "/tmp/perf-XXXXXXXX";
     int fd;

     fd = mkstemp(temp_file);
     if (fd < 0) {
         pr_err("mkstemp failed\n");
         return -1;
     }
     unlink(temp_file);

     tdata = tracing_data_get(&kvm_events.evlist->entries, fd, false);
     if (!tdata)
         return -1;
     lseek(fd, 0, SEEK_SET);
     (void) trace_report(fd, &kvm_events.session->pevent, false);
     tracing_data_put(tdata);

     return 0;
}


>
>> finally have it working. And the format extends easily (meaning <
>> day and the next step) to a perf-based kvm_stat replacement. Example
>> syntax is:
>>
>>     perf kvm stat [-p <pid>|-a|...]
>>
>> which defaults to an update delay of 1 second, and vmexit analysis.
>>
>> The guts of the processing logic come from the existing kvm-events
>> code. The changes focus on combining the record and report paths
>> into one. The display needs some help (Arnaldo?), but it seems to
>> work well.
>>
>> I'd like to get opinions on what next? IMO, the record/report path
>> should not get a foot hold from a backward compatibility perspective
>> and having to maintain those options. I am willing to take the
>> existing patches into git to maintain authorship and from there
>> apply patches to make the live mode work - which includes a bit of
>> refactoring of perf code (like the stats changes).
>>
>> Before I march down this path, any objections, opinions, etc?
>
> Can I see the code?

Let me clean it up over the weekend and send out an RFC for it.

David

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 3/3] KVM: perf: kvm events analysis tool
  2012-09-13 14:14         ` David Ahern
@ 2012-09-13 14:31           ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 26+ messages in thread
From: Arnaldo Carvalho de Melo @ 2012-09-13 14:31 UTC (permalink / raw)
  To: David Ahern
  Cc: Dong Hao, xiaoguangrong, avi, mtosatti, Ingo Molnar, linux-kernel, kvm

Em Thu, Sep 13, 2012 at 08:14:37AM -0600, David Ahern escreveu:
> On 9/13/12 7:45 AM, Arnaldo Carvalho de Melo wrote:
> >Em Wed, Sep 12, 2012 at 10:56:44PM -0600, David Ahern escreveu:

> >>So, I spent a fair amount of time today implementing a live mode.
> >>And after a lot of swearing at the tracepoint processing code I

> >What kind of swearing? I'm working on 'perf test' entries for
> >tracepoints to make sure we don't regress on the perf/libtraceevent
> >junction, doing that as prep work for further simplifying tracepoint
> >tools like sched, kvm, kmem, etc.
 
> Have you seen how the tracing initialization is done? ugly. record
> generates tracing data events and report uses those to do the init
> so you can access the raw_data. I ended up writing this:

And all we need is the list of fields so that we can use
perf_evsel__{int,str}val like I did in my 'perf sched' patch series (in
my perf/core branch), and even those accessors I'll tweak some more as
we don't need to check the endianness of the events, its in the same
machine, etc.

I'm trying to get by without using a 'pevent' just using 'event_format',
its doable when everything is local, as a single machine top tool is.

I want to just create the tracepoint events and process them like in
'top', using code more or less like what is in test__PERF_RECORD.

This still needs more work, so I think you can continue in your path and
eventually we'll have infrastructure to do it the way I'm describing,
optimizing the case where the "record" and "top" are in the same
machine, i.e. a short circuited 'live mode' with the top machinery
completely reused for tools, be it written in C, like 'sched', 'kvm',
'kmem', etc, or in perl or python.

- Arnaldo
 
> static int perf_kvm__tracing_init(void)
> {
>     struct tracing_data *tdata;
>     char temp_file[] = "/tmp/perf-XXXXXXXX";
>     int fd;
> 
>     fd = mkstemp(temp_file);
>     if (fd < 0) {
>         pr_err("mkstemp failed\n");
>         return -1;
>     }
>     unlink(temp_file);
> 
>     tdata = tracing_data_get(&kvm_events.evlist->entries, fd, false);
>     if (!tdata)
>         return -1;
>     lseek(fd, 0, SEEK_SET);
>     (void) trace_report(fd, &kvm_events.session->pevent, false);
>     tracing_data_put(tdata);
> 
>     return 0;
> }
> 
> 
> >
> >>finally have it working. And the format extends easily (meaning <
> >>day and the next step) to a perf-based kvm_stat replacement. Example
> >>syntax is:
> >>
> >>    perf kvm stat [-p <pid>|-a|...]
> >>
> >>which defaults to an update delay of 1 second, and vmexit analysis.
> >>
> >>The guts of the processing logic come from the existing kvm-events
> >>code. The changes focus on combining the record and report paths
> >>into one. The display needs some help (Arnaldo?), but it seems to
> >>work well.
> >>
> >>I'd like to get opinions on what next? IMO, the record/report path
> >>should not get a foot hold from a backward compatibility perspective
> >>and having to maintain those options. I am willing to take the
> >>existing patches into git to maintain authorship and from there
> >>apply patches to make the live mode work - which includes a bit of
> >>refactoring of perf code (like the stats changes).
> >>
> >>Before I march down this path, any objections, opinions, etc?
> >
> >Can I see the code?
> 
> Let me clean it up over the weekend and send out an RFC for it.
> 
> David

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 3/3] KVM: perf: kvm events analysis tool
  2012-09-13  4:56     ` David Ahern
  2012-09-13 13:45       ` Arnaldo Carvalho de Melo
@ 2012-09-14  2:56       ` Xiao Guangrong
  2012-09-14 11:51         ` David Ahern
  1 sibling, 1 reply; 26+ messages in thread
From: Xiao Guangrong @ 2012-09-14  2:56 UTC (permalink / raw)
  To: David Ahern; +Cc: Dong Hao, acme, avi, mtosatti, Ingo Molnar, linux-kernel, kvm

On 09/13/2012 12:56 PM, David Ahern wrote:

> 
> That suggests what is really needed is a 'live' mode - a continual updating of the output like perf top, not a record and analyze later mode. Which does come back to why I responded to this email -- the syntax is klunky and awkward.
> 
> So, I spent a fair amount of time today implementing a live mode. And after a lot of swearing at the tracepoint processing code I finally have it working. And the format extends easily (meaning < day and the next step) to a perf-based kvm_stat replacement. Example syntax is:
> 
>    perf kvm stat [-p <pid>|-a|...]
> 
> which defaults to an update delay of 1 second, and vmexit analysis.

Hi David,

I am very glad to see the live mode, it is very similar with kvm_stat(*). I think
kvm guys will like it.

> 
> The guts of the processing logic come from the existing kvm-events code. The changes focus on combining the record and report paths into one. The display needs some help (Arnaldo?), but it seems to work well.
> 
> I'd like to get opinions on what next? IMO, the record/report path should not get a foot hold from a backward compatibility perspective and having to maintain those options. I am willing to take the existing patches into git to maintain authorship and from there apply patches to make the live mode work - which includes a bit of refactoring of perf code (like the stats changes).

We'd better keep the record/report function, sometimes, we can only get perf.data
from the customers whose machine can not be reached for us.

Especially, other tracepoints are also interesting for us when the customers encounter
the performance issue, we always ask costumes to use "perf kvm stat -e xxx" to append
other events, like lock:*. Then, we can get not only the information of kvm events
by using 'perf kvm stat' but also other informations like 'perf lock' or 'perf script'
to get the whole sequences.

> 
> Before I march down this path, any objections, opinions, etc?

And, i think "live mode" is also useful for 'perf lock/sched', could you implement it
in perf core?

By the way, the new version of our patchset is ready, do you want to add your implement
after it is accepted by Arnaldo? Or are you going to post it with our patchset together?

Thanks!

* kvm_stat can be found at scripts/kvm/kvm_stat in the code of Qemu which locate at
  https://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 3/3] KVM: perf: kvm events analysis tool
  2012-09-14  2:56       ` Xiao Guangrong
@ 2012-09-14 11:51         ` David Ahern
  0 siblings, 0 replies; 26+ messages in thread
From: David Ahern @ 2012-09-14 11:51 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Dong Hao, acme, avi, mtosatti, Ingo Molnar, linux-kernel, kvm

On 9/13/12 8:56 PM, Xiao Guangrong wrote:
> We'd better keep the record/report function, sometimes, we can only get perf.data
> from the customers whose machine can not be reached for us.
>
> Especially, other tracepoints are also interesting for us when the customers encounter
> the performance issue, we always ask costumes to use "perf kvm stat -e xxx" to append
> other events, like lock:*. Then, we can get not only the information of kvm events
> by using 'perf kvm stat' but also other informations like 'perf lock' or 'perf script'
> to get the whole sequences.

Ok.

> And, i think "live mode" is also useful for 'perf lock/sched', could you implement it
> in perf core?

The code will be re-usable, so lock and sched should be able to add a 
live mode too.

>
> By the way, the new version of our patchset is ready, do you want to add your implement
> after it is accepted by Arnaldo? Or are you going to post it with our patchset together?

Then go ahead and submit it for inclusion; I'll add my stuff on top.

David

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 3/3] KVM: perf: kvm events analysis tool
  2012-08-24  1:15 ` [PATCH v7 3/3] " Dong Hao
@ 2012-08-24 17:53   ` David Ahern
  0 siblings, 0 replies; 26+ messages in thread
From: David Ahern @ 2012-08-24 17:53 UTC (permalink / raw)
  To: Dong Hao; +Cc: avi, acme, mtosatti, mingo, xiaoguangrong, linux-kernel, kvm

This one does not apply to Arnaldo's current perf/core branch. Please 
rebase.

David

On 8/23/12 7:15 PM, Dong Hao wrote:
> From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>
> Add 'perf kvm stat' support to analyze kvm vmexit/mmio/ioport smartly
>
> Usage:
> - kvm stat
>    run a command and gather performance counter statistics, it is the alias of
>    perf stat
>
> - trace kvm events:
>    perf kvm stat record, or, if other tracepoints are interesting as well, we
>    can append the events like this:
>    perf kvm stat record -e timer:*
>    If many guests are running, we can track the specified guest by using -p or
>    --pid
>
> - show the result:
>    perf kvm stat report
>
> The output example is following:
> # pgrep qemu-kvm
> 26071
> 32253
> 32564
>
> total 3 guests are running on the host
>
> Then, track the guest whose pid is 26071:
> # ./perf kvm stat record -p 26071
> ^C[ perf record: Woken up 9 times to write data ]
> [ perf record: Captured and wrote 24.903 MB perf.data.guest (~1088034 samples) ]
>
> See the vmexit events:
> # ./perf kvm stat report --event=vmexit
>
> Analyze events for all VCPUs:
>
>               VM-EXIT    Samples  Samples%     Time%         Avg time
>
>           APIC_ACCESS      65381    66.58%     5.95%     37.72us ( +-   6.54% )
>    EXTERNAL_INTERRUPT      16031    16.32%     3.06%     79.11us ( +-   7.34% )
>                 CPUID       5360     5.46%     0.06%      4.50us ( +-  35.07% )
>                   HLT       4496     4.58%    90.75%   8360.34us ( +-   5.22% )
>         EPT_VIOLATION       2667     2.72%     0.04%      5.49us ( +-   5.05% )
>     PENDING_INTERRUPT       2242     2.28%     0.03%      5.25us ( +-   2.96% )
>         EXCEPTION_NMI       1332     1.36%     0.02%      6.53us ( +-   6.51% )
>        IO_INSTRUCTION        383     0.39%     0.09%     93.39us ( +-  40.92% )
>             CR_ACCESS        310     0.32%     0.00%      6.10us ( +-   3.95% )
>
> Total Samples:98202, Total events handled time:41419293.63us.
>
> See the mmio events:
> # ./perf kvm stat report --event=mmio
>
> Analyze events for all VCPUs:
>
>           MMIO Access    Samples  Samples%     Time%         Avg time
>
>          0xfee00380:W      58686    90.21%    15.67%      4.95us ( +-   2.96% )
>          0xfee00300:R       2124     3.26%     1.48%     12.93us ( +-  14.75% )
>          0xfee00310:W       2124     3.26%     0.34%      3.00us ( +-   1.33% )
>          0xfee00300:W       2123     3.26%    82.50%    720.68us ( +-  10.24% )
>
> Total Samples:65057, Total events handled time:1854470.45us.
>
> See the ioport event:
> # ./perf kvm stat report --event=ioport
>
> Analyze events for all VCPUs:
>
>        IO Port Access    Samples  Samples%     Time%         Avg time
>
>           0xc090:POUT        383   100.00%   100.00%     89.00us ( +-  42.94% )
>
> Total Samples:383, Total events handled time:34085.56us.
>
> And, --vcpu is used to track the specified vcpu and --key is used to sort the
> result:
> # ./perf kvm stat report --event=vmexit --vcpu=0 --key=time
>
> Analyze events for VCPU 0:
>
>               VM-EXIT    Samples  Samples%     Time%         Avg time
>
>                   HLT        551     5.05%    94.81%   9501.72us ( +-  12.52% )
>    EXTERNAL_INTERRUPT       1390    12.74%     2.39%     94.80us ( +-  20.92% )
>           APIC_ACCESS       6186    56.68%     2.62%     23.41us ( +-  23.62% )
>        IO_INSTRUCTION         17     0.16%     0.01%     20.39us ( +-  22.33% )
>         EXCEPTION_NMI         94     0.86%     0.01%      6.07us ( +-   7.13% )
>     PENDING_INTERRUPT        199     1.82%     0.02%      5.48us ( +-   4.36% )
>             CR_ACCESS         52     0.48%     0.00%      4.89us ( +-   4.09% )
>         EPT_VIOLATION       2057    18.85%     0.12%      3.15us ( +-   1.33% )
>                 CPUID        368     3.37%     0.02%      2.82us ( +-   2.79% )
>
> Total Samples:10914, Total events handled time:5521782.02us.
>
> [ Dong Hao <haodong@linux.vnet.ibm.com>:
>       - rebase it on current acme's tree
>       - fix the compiling-error on i386
> ]
>
> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> Signed-off-by: Dong Hao <haodong@linux.vnet.ibm.com>
> ---
>   tools/perf/Documentation/perf-kvm.txt |   30 +-
>   tools/perf/MANIFEST                   |    3 +
>   tools/perf/builtin-kvm.c              |  889 ++++++++++++++++++++++++++++++++-
>   tools/perf/util/header.c              |   54 ++-
>   tools/perf/util/header.h              |    1 +
>   tools/perf/util/thread.h              |    2 +
>   6 files changed, 973 insertions(+), 6 deletions(-)
>
> diff --git a/tools/perf/Documentation/perf-kvm.txt b/tools/perf/Documentation/perf-kvm.txt
> index dd84cb2..d52feef 100644
> --- a/tools/perf/Documentation/perf-kvm.txt
> +++ b/tools/perf/Documentation/perf-kvm.txt
> @@ -12,7 +12,7 @@ SYNOPSIS
>   	[--guestkallsyms=<path> --guestmodules=<path> | --guestvmlinux=<path>]]
>   	{top|record|report|diff|buildid-list}
>   'perf kvm' [--host] [--guest] [--guestkallsyms=<path> --guestmodules=<path>
> -	| --guestvmlinux=<path>] {top|record|report|diff|buildid-list}
> +	| --guestvmlinux=<path>] {top|record|report|diff|buildid-list|stat}
>
>   DESCRIPTION
>   -----------
> @@ -38,6 +38,18 @@ There are a couple of variants of perf kvm:
>     so that other tools can be used to fetch packages with matching symbol tables
>     for use by perf report.
>
> +  'perf kvm stat <command>' to run a command and gather performance counter
> +   statistics.
> +  Especially, perf 'kvm stat record/report' generates a statistical analysis
> +  of KVM events. Currently, vmexit, mmio and ioport events are supported.
> +    'perf kvm stat record <command>' records kvm events and the events between
> +    start and end <command>.
> +    And this command produces a file which contains tracing results of kvm
> +    events.
> +
> +    'perf kvm stat report' reports statistical data which includes events
> +    handled time, samples, and so on.
> +
>   OPTIONS
>   -------
>   -i::
> @@ -68,7 +80,21 @@ OPTIONS
>   --guestvmlinux=<path>::
>   	Guest os kernel vmlinux.
>
> +STAT REPORT OPTIONS
> +-------------------
> +--vcpu=<value>::
> +	analyze events which occures on this vcpu. (default: all vcpus)
> +
> +--events=<value>::
> +	events to be analyzed. Possible values: vmexit, mmio, ioport.
> +	(default: vmexit)
> +-k::
> +--key=<value>::
> +	Sorting key. Possible values: sample (default, sort by samples
> +	number), time (sort by average time).
> +
>   SEE ALSO
>   --------
>   linkperf:perf-top[1], linkperf:perf-record[1], linkperf:perf-report[1],
> -linkperf:perf-diff[1], linkperf:perf-buildid-list[1]
> +linkperf:perf-diff[1], linkperf:perf-buildid-list[1],
> +linkperf:perf-stat[1]
> diff --git a/tools/perf/MANIFEST b/tools/perf/MANIFEST
> index b4b572e..6896648 100644
> --- a/tools/perf/MANIFEST
> +++ b/tools/perf/MANIFEST
> @@ -15,3 +15,6 @@ arch/*/lib/memset*.S
>   include/linux/poison.h
>   include/linux/magic.h
>   include/linux/hw_breakpoint.h
> +arch/x86/include/asm/svm.h
> +arch/x86/include/asm/vmx.h
> +arch/x86/include/asm/kvm_host.h
> diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
> index 9fc6e0f..5396444 100644
> --- a/tools/perf/builtin-kvm.c
> +++ b/tools/perf/builtin-kvm.c
> @@ -1,6 +1,7 @@
>   #include "builtin.h"
>   #include "perf.h"
>
> +#include "util/evsel.h"
>   #include "util/util.h"
>   #include "util/cache.h"
>   #include "util/symbol.h"
> @@ -10,8 +11,9 @@
>
>   #include "util/parse-options.h"
>   #include "util/trace-event.h"
> -
>   #include "util/debug.h"
> +#include "util/debugfs.h"
> +#include "util/tool.h"
>
>   #include <sys/prctl.h>
>
> @@ -19,11 +21,890 @@
>   #include <pthread.h>
>   #include <math.h>
>
> -static const char		*file_name;
> +#include "../../arch/x86/include/asm/svm.h"
> +#include "../../arch/x86/include/asm/vmx.h"
> +#include "../../arch/x86/include/asm/kvm_host.h"
> +
> +struct event_key {
> +	#define INVALID_KEY	(~0ULL)
> +	u64 key;
> +	int info;
> +};
> +
> +struct kvm_events_ops {
> +	bool (*is_begin_event)(struct event_format *event, void *data,
> +			       struct event_key *key);
> +	bool (*is_end_event)(struct event_format *event, void *data,
> +			     struct event_key *key);
> +	void (*decode_key)(struct event_key *key, char decode[20]);
> +	const char *name;
> +};
> +
> +static void exit_event_get_key(struct event_format *event, void *data,
> +			       struct event_key *key)
> +{
> +	key->info = 0;
> +	key->key = raw_field_value(event, "exit_reason", data);
> +}
> +
> +static bool kvm_exit_event(struct event_format *event)
> +{
> +	return !strcmp(event->name, "kvm_exit");
> +}
> +
> +static bool exit_event_begin(struct event_format *event, void *data,
> +			     struct event_key *key)
> +{
> +	if (kvm_exit_event(event)) {
> +		exit_event_get_key(event, data, key);
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
> +static bool kvm_entry_event(struct event_format *event)
> +{
> +	return !strcmp(event->name, "kvm_entry");
> +}
> +
> +static bool exit_event_end(struct event_format *event, void *data __unused,
> +			   struct event_key *key __unused)
> +{
> +	return kvm_entry_event(event);
> +}
> +
> +struct exit_reasons_table {
> +	unsigned long exit_code;
> +	const char *reason;
> +};
> +
> +struct exit_reasons_table vmx_exit_reasons[] = {
> +	VMX_EXIT_REASONS
> +};
> +
> +struct exit_reasons_table svm_exit_reasons[] = {
> +	SVM_EXIT_REASONS
> +};
> +
> +static int cpu_isa;
> +
> +static const char *get_exit_reason(u64 exit_code)
> +{
> +	int table_size = ARRAY_SIZE(svm_exit_reasons);
> +	struct exit_reasons_table *table = svm_exit_reasons;
> +
> +	if (cpu_isa == 1) {
> +		table = vmx_exit_reasons;
> +		table_size = ARRAY_SIZE(vmx_exit_reasons);
> +	}
> +
> +	while (table_size--) {
> +		if (table->exit_code == exit_code)
> +			return table->reason;
> +		table++;
> +	}
> +
> +	pr_err("unknown kvm exit code:%lld on %s\n",
> +	       (unsigned long long)exit_code, cpu_isa ? "VMX" : "SVM");
> +	return "UNKNOWN";
> +}
> +
> +static void exit_event_decode_key(struct event_key *key, char decode[20])
> +{
> +	const char *exit_reason = get_exit_reason(key->key);
> +
> +	snprintf(decode, 20, "%s", exit_reason);
> +}
> +
> +static struct kvm_events_ops exit_events = {
> +	.is_begin_event = exit_event_begin,
> +	.is_end_event = exit_event_end,
> +	.decode_key = exit_event_decode_key,
> +	.name = "VM-EXIT"
> +};
> +
> +/*
> + * For the old kernel, we treat:
> + * the time of MMIO write: kvm_mmio(KVM_TRACE_MMIO_WRITE...) -> kvm_entry
> + * the time of MMIO read: kvm_exit -> kvm_mmio(KVM_TRACE_MMIO_READ...).
> + *
> + * For the new kernel, we use kvm_mmio_begin and kvm_io_done to make
> + * things better.
> + */
> +static void mmio_event_get_key(struct event_format *event, void *data,
> +			       struct event_key *key)
> +{
> +	key->key = raw_field_value(event, "gpa", data);
> +	key->info = raw_field_value(event, "type", data);
> +}
> +
> +#define KVM_TRACE_MMIO_READ_UNSATISFIED 0
> +#define KVM_TRACE_MMIO_READ 1
> +#define KVM_TRACE_MMIO_WRITE 2
> +
> +static bool kvm_io_done_event(struct event_format *event)
> +{
> +	return !strcmp(event->name, "kvm_io_done");
> +}
> +
> +static bool mmio_event_begin(struct event_format *event, void *data,
> +			     struct event_key *key)
> +{
> +	/* MMIO read begin in old kernel. */
> +	if (kvm_exit_event(event))
> +		return true;
> +
> +	/* MMIO write begin in old kernel. */
> +	if (!strcmp(event->name, "kvm_mmio") &&
> +	      raw_field_value(event, "type", data) == KVM_TRACE_MMIO_WRITE) {
> +		mmio_event_get_key(event, data, key);
> +		return true;
> +	}
> +
> +	/* MMIO read/write begin in new kernel. */
> +	if (!strcmp(event->name, "kvm_mmio_begin")) {
> +		mmio_event_get_key(event, data, key);
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
> +static bool mmio_event_end(struct event_format *event,  void *data,
> +			   struct event_key *key)
> +{
> +	/* MMIO write end in old kernel. */
> +	if (kvm_entry_event(event))
> +		return true;
> +
> +	/* MMIO read end in the old kernel.*/
> +	if (!strcmp(event->name, "kvm_mmio") &&
> +	      raw_field_value(event, "type", data) == KVM_TRACE_MMIO_READ) {
> +		mmio_event_get_key(event, data, key);
> +		return true;
> +	}
> +
> +	/* MMIO read/write end event in the new kernel.*/
> +	return kvm_io_done_event(event);
> +}
> +
> +static void mmio_event_decode_key(struct event_key *key, char decode[20])
> +{
> +	snprintf(decode, 20, "%#lx:%s", (unsigned long)key->key,
> +			      key->info == KVM_TRACE_MMIO_WRITE ? "W" : "R");
> +}
> +
> +static struct kvm_events_ops mmio_events = {
> +	.is_begin_event = mmio_event_begin,
> +	.is_end_event = mmio_event_end,
> +	.decode_key = mmio_event_decode_key,
> +	.name = "MMIO Access"
> +};
> +
> +/*
> + * For the old kernel, the time of emulation pio access is from kvm_pio to
> + * kvm_entry. In the new kernel, the end time is indicated by kvm_io_done.
> + */
> +static void ioport_event_get_key(struct event_format *event, void *data,
> +				 struct event_key *key)
> +{
> +	key->key = raw_field_value(event, "port", data);
> +	key->info = raw_field_value(event, "rw", data);
> +}
> +
> +static bool ioport_event_begin(struct event_format *event, void *data,
> +			       struct event_key *key)
> +{
> +	if (!strcmp(event->name, "kvm_pio")) {
> +		ioport_event_get_key(event, data, key);
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
> +static bool ioport_event_end(struct event_format *event, void *data __unused,
> +			     struct event_key *key __unused)
> +{
> +	if (kvm_entry_event(event))
> +		return true;
> +
> +	return kvm_io_done_event(event);
> +}
> +
> +static void ioport_event_decode_key(struct event_key *key, char decode[20])
> +{
> +	snprintf(decode, 20, "%#llx:%s", (unsigned long long)key->key,
> +		 key->info ? "POUT" : "PIN");
> +}
> +
> +static struct kvm_events_ops ioport_events = {
> +	.is_begin_event = ioport_event_begin,
> +	.is_end_event = ioport_event_end,
> +	.decode_key = ioport_event_decode_key,
> +	.name = "IO Port Access"
> +};
> +
> +static const char *report_event = "vmexit";
> +struct kvm_events_ops *events_ops;
> +
> +static bool register_kvm_events_ops(void)
> +{
> +	bool ret = true;
> +
> +	if (!strcmp(report_event, "vmexit"))
> +		events_ops = &exit_events;
> +	else if (!strcmp(report_event, "mmio"))
> +		events_ops = &mmio_events;
> +	else if (!strcmp(report_event, "ioport"))
> +		events_ops = &ioport_events;
> +	else {
> +		pr_err("Unknown report event:%s\n", report_event);
> +		ret = false;
> +	}
> +
> +	return ret;
> +}
> +
> +struct event_stats {
> +	u64 count;
> +	u64 time;
> +
> +	/* used to calculate stddev. */
> +	double mean;
> +	double M2;
> +};
> +
> +struct kvm_event {
> +	struct list_head hash_entry;
> +	struct rb_node rb;
> +
> +	struct event_key key;
> +
> +	struct event_stats total;
> +
> +	#define DEFAULT_VCPU_NUM 8
> +	int max_vcpu;
> +	struct event_stats *vcpu;
> +};
> +
> +struct vcpu_event_record {
> +	int vcpu_id;
> +	u64 start_time;
> +	struct kvm_event *last_event;
> +};
> +
> +#define EVENTS_BITS			12
> +#define EVENTS_CACHE_SIZE	(1UL << EVENTS_BITS)
> +
> +static u64 total_time;
> +static u64 total_count;
> +static struct list_head kvm_events_cache[EVENTS_CACHE_SIZE];
> +
> +static void init_kvm_event_record(void)
> +{
> +	int i;
> +
> +	for (i = 0; i < (int)EVENTS_CACHE_SIZE; i++)
> +		INIT_LIST_HEAD(&kvm_events_cache[i]);
> +}
> +
> +static int kvm_events_hash_fn(u64 key)
> +{
> +	return key & (EVENTS_CACHE_SIZE - 1);
> +}
> +
> +static bool kvm_event_expand(struct kvm_event *event, int vcpu_id)
> +{
> +	int old_max_vcpu = event->max_vcpu;
> +
> +	if (vcpu_id < event->max_vcpu)
> +		return true;
> +
> +	while (event->max_vcpu <= vcpu_id)
> +		event->max_vcpu += DEFAULT_VCPU_NUM;
> +
> +	event->vcpu = realloc(event->vcpu,
> +			      event->max_vcpu * sizeof(*event->vcpu));
> +	if (!event->vcpu) {
> +		pr_err("Not enough memory\n");
> +		return false;
> +	}
> +
> +	memset(event->vcpu + old_max_vcpu, 0,
> +	       (event->max_vcpu - old_max_vcpu) * sizeof(*event->vcpu));
> +	return true;
> +}
> +
> +static struct kvm_event *kvm_alloc_init_event(struct event_key *key)
> +{
> +	struct kvm_event *event;
> +
> +	event = zalloc(sizeof(*event));
> +	if (!event) {
> +		pr_err("Not enough memory\n");
> +		return NULL;
> +	}
> +
> +	event->key = *key;
> +	return event;
> +}
> +
> +static struct kvm_event *find_create_kvm_event(struct event_key *key)
> +{
> +	struct kvm_event *event;
> +	struct list_head *head;
> +
> +	BUG_ON(key->key == INVALID_KEY);
> +
> +	head = &kvm_events_cache[kvm_events_hash_fn(key->key)];
> +	list_for_each_entry(event, head, hash_entry)
> +		if (event->key.key == key->key && event->key.info == key->info)
> +			return event;
> +
> +	event = kvm_alloc_init_event(key);
> +	if (!event)
> +		return NULL;
> +
> +	list_add(&event->hash_entry, head);
> +	return event;
> +}
> +
> +static void handle_begin_event(struct vcpu_event_record *vcpu_record,
> +			       struct event_key *key, u64 timestamp)
> +{
> +	struct kvm_event *event = NULL;
> +
> +	if (key->key != INVALID_KEY)
> +		event = find_create_kvm_event(key);
> +
> +	vcpu_record->last_event = event;
> +	vcpu_record->start_time = timestamp;
> +}
> +
> +static void update_event_stats(struct event_stats *stats, u64 time_diff)
> +{
> +	double delta;
> +
> +	stats->count++;
> +	stats->time += time_diff;
> +
> +	delta = time_diff - stats->mean;
> +	stats->mean += delta / stats->count;
> +	stats->M2 += delta*(time_diff - stats->mean);
> +}
> +
> +static double event_stats_stddev(int vcpu_id, struct kvm_event *event)
> +{
> +	struct event_stats *stats = &event->total;
> +	double variance, variance_mean, stddev;
> +
> +	if (vcpu_id != -1)
> +		stats = &event->vcpu[vcpu_id];
> +
> +	BUG_ON(!stats->count);
> +
> +	variance = stats->M2 / (stats->count - 1);
> +	variance_mean = variance / stats->count;
> +	stddev = sqrt(variance_mean);
> +
> +	return stddev * 100 / stats->mean;
> +}
> +
> +static void update_kvm_event(struct kvm_event *event, int vcpu_id,
> +			     u64 time_diff)
> +{
> +	update_event_stats(&event->total, time_diff);
> +
> +	if (!kvm_event_expand(event, vcpu_id))
> +		return;
> +
> +	update_event_stats(&event->vcpu[vcpu_id], time_diff);
> +}
> +
> +static void handle_end_event(struct vcpu_event_record *vcpu_record,
> +			     struct event_key *key, u64 timestamp)
> +{
> +	struct kvm_event *event;
> +	u64 time_begin, time_diff;
> +
> +	event = vcpu_record->last_event;
> +	time_begin = vcpu_record->start_time;
> +
> +	/* The begin event is not caught. */
> +	if (!time_begin)
> +		return;
> +
> +	/*
> +	 * In some case, the 'begin event' only records the start timestamp,
> +	 * the actual event is recognized in the 'end event' (e.g. mmio-event
> +	 * in the old kernel).
> +	 */
> +
> +	/* Both begin and end events did not get the key. */
> +	if (!event && key->key == INVALID_KEY)
> +		return;
> +
> +	if (!event)
> +		event = find_create_kvm_event(key);
> +
> +	if (!event)
> +		return;
> +
> +	vcpu_record->last_event = NULL;
> +	vcpu_record->start_time = 0;
> +
> +	BUG_ON(timestamp < time_begin);
> +
> +	time_diff = timestamp - time_begin;
> +	update_kvm_event(event, vcpu_record->vcpu_id, time_diff);
> +}
> +
> +static struct vcpu_event_record
> +*per_vcpu_record(struct thread *thread, struct event_format *event, void *data)
> +{
> +	/* Only kvm_entry records vcpu id. */
> +	if (!thread->priv && kvm_entry_event(event)) {
> +		struct vcpu_event_record *vcpu_record;
> +
> +		vcpu_record = zalloc(sizeof(struct vcpu_event_record));
> +		if (!vcpu_record) {
> +			pr_err("Not enough memory\n");
> +			return NULL;
> +		}
> +
> +		vcpu_record->vcpu_id = raw_field_value(event, "vcpu_id", data);
> +		thread->priv = vcpu_record;
> +	}
> +
> +	return (struct vcpu_event_record *)thread->priv;
> +}
> +
> +static void handle_kvm_event(struct thread *thread, struct event_format *event,
> +			     void *data, u64 timestamp)
> +{
> +	struct vcpu_event_record *vcpu_record;
> +	struct event_key key = {.key = INVALID_KEY};
> +
> +	vcpu_record = per_vcpu_record(thread, event, data);
> +	if (!vcpu_record)
> +		return;
> +
> +	if (events_ops->is_begin_event(event, data, &key))
> +		return handle_begin_event(vcpu_record, &key, timestamp);
> +
> +	if (events_ops->is_end_event(event, data, &key))
> +		return handle_end_event(vcpu_record, &key, timestamp);
> +}
> +
> +typedef int (*key_cmp_fun)(struct kvm_event*, struct kvm_event*, int);
> +struct kvm_event_key {
> +	const char *name;
> +	key_cmp_fun key;
> +};
> +
> +static int trace_vcpu = -1;
> +#define GET_EVENT_KEY(member)						\
> +static u64 get_event_ ##member(struct kvm_event *event, int vcpu)	\
> +{									\
> +	if (vcpu == -1)							\
> +		return event->total.member;				\
> +									\
> +	if (vcpu >= event->max_vcpu)					\
> +		return 0;						\
> +									\
> +	return event->vcpu[vcpu].member;				\
> +}
> +
> +#define COMPARE_EVENT_KEY(member)					\
> +GET_EVENT_KEY(member)							\
> +static int compare_kvm_event_ ## member(struct kvm_event *one,		\
> +					struct kvm_event *two, int vcpu)\
> +{									\
> +	return get_event_ ##member(one, vcpu) >				\
> +				get_event_ ##member(two, vcpu);		\
> +}
> +
> +GET_EVENT_KEY(time);
> +COMPARE_EVENT_KEY(count);
> +COMPARE_EVENT_KEY(mean);
> +
> +#define DEF_SORT_NAME_KEY(name, compare_key)	\
> +	{ #name, compare_kvm_event_ ## compare_key }
> +
> +static struct kvm_event_key keys[] = {
> +	DEF_SORT_NAME_KEY(sample, count),
> +	DEF_SORT_NAME_KEY(time, mean),
> +	{ NULL, NULL }
> +};
> +
> +static const char *sort_key = "sample";
> +static key_cmp_fun compare;
> +
> +static bool select_key(void)
> +{
> +	int i;
> +
> +	for (i = 0; keys[i].name; i++) {
> +		if (!strcmp(keys[i].name, sort_key)) {
> +			compare = keys[i].key;
> +			return true;
> +		}
> +	}
> +
> +	pr_err("Unknown compare key:%s\n", sort_key);
> +	return false;
> +}
> +
> +static struct rb_root result;
> +static void insert_to_result(struct kvm_event *event, key_cmp_fun bigger,
> +			     int vcpu)
> +{
> +	struct rb_node **rb = &result.rb_node;
> +	struct rb_node *parent = NULL;
> +	struct kvm_event *p;
> +
> +	while (*rb) {
> +		p = container_of(*rb, struct kvm_event, rb);
> +		parent = *rb;
> +
> +		if (bigger(event, p, vcpu))
> +			rb = &(*rb)->rb_left;
> +		else
> +			rb = &(*rb)->rb_right;
> +	}
> +
> +	rb_link_node(&event->rb, parent, rb);
> +	rb_insert_color(&event->rb, &result);
> +}
> +
> +static void update_total_count(struct kvm_event *event, int vcpu)
> +{
> +	total_count += get_event_count(event, vcpu);
> +	total_time += get_event_time(event, vcpu);
> +}
> +
> +static bool event_is_valid(struct kvm_event *event, int vcpu)
> +{
> +	return !!get_event_count(event, vcpu);
> +}
> +
> +static void sort_result(int vcpu)
> +{
> +	unsigned int i;
> +	struct kvm_event *event;
> +
> +	for (i = 0; i < EVENTS_CACHE_SIZE; i++)
> +		list_for_each_entry(event, &kvm_events_cache[i], hash_entry)
> +			if (event_is_valid(event, vcpu)) {
> +				update_total_count(event, vcpu);
> +				insert_to_result(event, compare, vcpu);
> +			}
> +}
> +
> +/* returns left most element of result, and erase it */
> +static struct kvm_event *pop_from_result(void)
> +{
> +	struct rb_node *node = result.rb_node;
> +
> +	if (!node)
> +		return NULL;
> +
> +	while (node->rb_left)
> +		node = node->rb_left;
> +
> +	rb_erase(node, &result);
> +	return container_of(node, struct kvm_event, rb);
> +}
> +
> +static void print_vcpu_info(int vcpu)
> +{
> +	pr_info("Analyze events for ");
> +
> +	if (vcpu == -1)
> +		pr_info("all VCPUs:\n\n");
> +	else
> +		pr_info("VCPU %d:\n\n", vcpu);
> +}
> +
> +static void print_result(int vcpu)
> +{
> +	char decode[20];
> +	struct kvm_event *event;
> +
> +	pr_info("\n\n");
> +	print_vcpu_info(vcpu);
> +	pr_info("%20s ", events_ops->name);
> +	pr_info("%10s ", "Samples");
> +	pr_info("%9s ", "Samples%");
> +
> +	pr_info("%9s ", "Time%");
> +	pr_info("%16s ", "Avg time");
> +	pr_info("\n\n");
> +
> +	while ((event = pop_from_result())) {
> +		u64 ecount, etime;
> +
> +		ecount = get_event_count(event, vcpu);
> +		etime = get_event_time(event, vcpu);
> +
> +		events_ops->decode_key(&event->key, decode);
> +		pr_info("%20s ", decode);
> +		pr_info("%10llu ", (unsigned long long)ecount);
> +		pr_info("%8.2f%% ", (double)ecount / total_count * 100);
> +		pr_info("%8.2f%% ", (double)etime / total_time * 100);
> +		pr_info("%9.2fus ( +-%7.2f%% )", (double)etime / ecount/1e3,
> +			event_stats_stddev(trace_vcpu, event));
> +		pr_info("\n");
> +	}
> +
> +	pr_info("\nTotal Samples:%lld, Total events handled time:%.2fus.\n\n",
> +		(unsigned long long)total_count, total_time / 1e3);
> +}
> +
> +static int process_sample_event(struct perf_tool *tool __used,
> +				union perf_event *event,
> +				struct perf_sample *sample,
> +				struct perf_evsel *evsel,
> +				struct machine *machine)
> +{
> +	struct thread *thread = machine__findnew_thread(machine, sample->tid);
> +
> +	if (thread == NULL) {
> +		pr_debug("problem processing %d event, skipping it.\n",
> +			event->header.type);
> +		return -1;
> +	}
> +
> +	handle_kvm_event(thread, evsel->tp_format, sample->raw_data,
> +			 sample->time);
> +	return 0;
> +}
> +
> +static struct perf_tool eops = {
> +	.sample			= process_sample_event,
> +	.comm			= perf_event__process_comm,
> +	.ordered_samples	= true,
> +};
> +
> +static int get_cpu_isa(struct perf_session *session)
> +{
> +	char *cpuid;
> +	int isa;
> +
> +	cpuid = perf_header__read_feature(session, HEADER_CPUID);
> +
> +	if (!cpuid) {
> +		pr_err("read HEADER_CPUID failed.\n");
> +		return -ENOTSUP;
> +	}
> +
> +	if (strstr(cpuid, "Intel"))
> +		isa = 1;
> +	else if (strstr(cpuid, "AMD"))
> +		isa = 0;
> +	else {
> +		pr_err("CPU %s is not supported.\n", cpuid);
> +		isa = -ENOTSUP;
> +	}
> +
> +	free(cpuid);
> +	return isa;
> +}
> +
> +static const char *file_name;
> +
> +static int read_events(void)
> +{
> +	struct perf_session *kvm_session;
> +	int ret;
> +
> +	kvm_session = perf_session__new(file_name, O_RDONLY, 0, false, &eops);
> +	if (!kvm_session) {
> +		pr_err("Initializing perf session failed\n");
> +		return -EINVAL;
> +	}
> +
> +	if (!perf_session__has_traces(kvm_session, "kvm record"))
> +		return -EINVAL;
> +
> +	/*
> +	 * Do not use 'isa' recorded in kvm_exit tracepoint since it is not
> +	 * traced in the old kernel.
> +	 */
> +	ret = get_cpu_isa(kvm_session);
> +
> +	if (ret < 0)
> +		return ret;
> +
> +	cpu_isa = ret;
> +
> +	return perf_session__process_events(kvm_session, &eops);
> +}
> +
> +static bool verify_vcpu(int vcpu)
> +{
> +	if (vcpu != -1 && vcpu < 0) {
> +		pr_err("Invalid vcpu:%d.\n", vcpu);
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
> +static int kvm_events_report_vcpu(int vcpu)
> +{
> +	int ret = -EINVAL;
> +
> +	if (!verify_vcpu(vcpu))
> +		goto exit;
> +
> +	if (!select_key())
> +		goto exit;
> +
> +	if (!register_kvm_events_ops())
> +		goto exit;
> +
> +	init_kvm_event_record();
> +	setup_pager();
> +
> +	ret = read_events();
> +	if (ret)
> +		goto exit;
> +
> +	sort_result(vcpu);
> +	print_result(vcpu);
> +exit:
> +	return ret;
> +}
> +
> +static const char * const record_args[] = {
> +	"record",
> +	"-R",
> +	"-f",
> +	"-m", "1024",
> +	"-c", "1",
> +	"-e", "kvm:kvm_entry",
> +	"-e", "kvm:kvm_exit",
> +	"-e", "kvm:kvm_mmio",
> +	"-e", "kvm:kvm_pio",
> +};
> +
> +static const char * const new_event[] = {
> +	"kvm_mmio_begin",
> +	"kvm_io_done"
> +};
> +
> +static bool kvm_record_specified_guest(int argc, const char **argv)
> +{
> +	int i;
> +
> +	for (i = 0; i < argc; i++)
> +		if (!strcmp(argv[i], "-p") || !strcmp(argv[i], "--pid"))
> +			return true;
> +
> +	return false;
> +}
> +
> +#define STRDUP_FAIL_EXIT(s)		\
> +	({	char *_p;		\
> +		_p = strdup(s);		\
> +		if (!_p)		\
> +			return -ENOMEM;	\
> +		_p;			\
> +	})
> +
> +static int kvm_events_record(int argc, const char **argv)
> +{
> +	unsigned int rec_argc, i, j;
> +	const char **rec_argv;
> +
> +	rec_argc = ARRAY_SIZE(record_args) + argc + 2;
> +	rec_argc += ARRAY_SIZE(new_event) * 2;
> +	rec_argv = calloc(rec_argc + 1, sizeof(char *));
> +
> +	if (rec_argv == NULL)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < ARRAY_SIZE(record_args); i++)
> +		rec_argv[i] = STRDUP_FAIL_EXIT(record_args[i]);
> +
> +	/*
> +	 * Append "-a" only if "-p"/"--pid" is not specified since they
> +	 * are mutually exclusive.
> +	 */
> +	if (!kvm_record_specified_guest(argc, argv))
> +		rec_argv[i++] = STRDUP_FAIL_EXIT("-a");
> +
> +	rec_argv[i++] = STRDUP_FAIL_EXIT("-o");
> +	rec_argv[i++] = STRDUP_FAIL_EXIT(file_name);
> +
> +	for (j = 0; j < ARRAY_SIZE(new_event); j++)
> +		if (is_valid_tracepoint(new_event[j])) {
> +			char event[256];
> +
> +			sprintf(event, "kvm:%s", new_event[j]);
> +
> +			rec_argv[i++] = STRDUP_FAIL_EXIT("-e");
> +			rec_argv[i++] = STRDUP_FAIL_EXIT(event);
> +		}
> +
> +	for (j = 1; j < (unsigned int)argc; j++, i++)
> +		rec_argv[i] = argv[j];
> +
> +	return cmd_record(i, rec_argv, NULL);
> +}
> +
> +static const char * const kvm_events_report_usage[] = {
> +	"perf kvm stat report [<options>]",
> +	NULL
> +};
> +
> +static const struct option kvm_events_report_options[] = {
> +	OPT_STRING(0, "event", &report_event, "report event",
> +		    "event for reporting: vmexit, mmio, ioport"),
> +	OPT_INTEGER(0, "vcpu", &trace_vcpu,
> +		    "vcpu id to report"),
> +	OPT_STRING('k', "key", &sort_key, "sort-key",
> +		    "key for sorting: sample(sort by samples number)"
> +		    " time (sort by avg time)"),
> +	OPT_END()
> +};
> +
> +static int kvm_events_report(int argc, const char **argv)
> +{
> +	symbol__init();
> +
> +	if (argc) {
> +		argc = parse_options(argc, argv,
> +				     kvm_events_report_options,
> +				     kvm_events_report_usage, 0);
> +		if (argc)
> +			usage_with_options(kvm_events_report_usage,
> +					   kvm_events_report_options);
> +	}
> +
> +	return kvm_events_report_vcpu(trace_vcpu);
> +}
> +
> +static int kvm_cmd_stat(int argc, const char **argv)
> +{
> +	if (argc > 1) {
> +		if (!strncmp(argv[1], "rec", 3))
> +			return kvm_events_record(argc - 1, argv + 1);
> +
> +		if (!strncmp(argv[1], "rep", 3))
> +			return kvm_events_report(argc - 1 , argv + 1);
> +	}
> +
> +	return cmd_stat(argc, argv, NULL);
> +}
> +
>   static char			name_buffer[256];
>
>   static const char * const kvm_usage[] = {
> -	"perf kvm [<options>] {top|record|report|diff|buildid-list}",
> +	"perf kvm [<options>] {top|record|report|diff|buildid-list|stat}",
>   	NULL
>   };
>
> @@ -135,6 +1016,8 @@ int cmd_kvm(int argc, const char **argv, const char *prefix __used)
>   		return cmd_top(argc, argv, NULL);
>   	else if (!strncmp(argv[0], "buildid-list", 12))
>   		return __cmd_buildid_list(argc, argv);
> +	else if (!strncmp(argv[0], "stat", 4))
> +		return kvm_cmd_stat(argc, argv);
>   	else
>   		usage_with_options(kvm_usage, kvm_options);
>
> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
> index 1e5b6aa..c684aba 100644
> --- a/tools/perf/util/header.c
> +++ b/tools/perf/util/header.c
> @@ -1504,9 +1504,15 @@ static int process_build_id(struct perf_file_section *section,
>   	return 0;
>   }
>
> +static char *read_cpuid(struct perf_header *ph, int fd)
> +{
> +	return do_read_string(fd, ph);
> +}
> +
>   struct feature_ops {
>   	int (*write)(int fd, struct perf_header *h, struct perf_evlist *evlist);
>   	void (*print)(struct perf_header *h, int fd, FILE *fp);
> +	char *(*read)(struct perf_header *h, int fd);
>   	int (*process)(struct perf_file_section *section,
>   		       struct perf_header *h, int feat, int fd, void *data);
>   	const char *name;
> @@ -1521,6 +1527,9 @@ struct feature_ops {
>   #define FEAT_OPF(n, func) \
>   	[n] = { .name = #n, .write = write_##func, .print = print_##func, \
>   		.full_only = true }
> +#define FEAT_OPA_R(n, func) \
> +	[n] = { .name = #n, .write = write_##func, .print = print_##func, \
> +		.read  = read_##func }
>
>   /* feature_ops not implemented: */
>   #define print_tracing_data	NULL
> @@ -1535,7 +1544,7 @@ static const struct feature_ops feat_ops[HEADER_LAST_FEATURE] = {
>   	FEAT_OPA(HEADER_ARCH,		arch),
>   	FEAT_OPA(HEADER_NRCPUS,		nrcpus),
>   	FEAT_OPA(HEADER_CPUDESC,	cpudesc),
> -	FEAT_OPA(HEADER_CPUID,		cpuid),
> +	FEAT_OPA_R(HEADER_CPUID,	cpuid),
>   	FEAT_OPA(HEADER_TOTAL_MEM,	total_mem),
>   	FEAT_OPA(HEADER_EVENT_DESC,	event_desc),
>   	FEAT_OPA(HEADER_CMDLINE,	cmdline),
> @@ -1589,6 +1598,49 @@ int perf_header__fprintf_info(struct perf_session *session, FILE *fp, bool full)
>   	return 0;
>   }
>
> +struct header_read_data {
> +	int feat;
> +	char *result;
> +};
> +
> +static int perf_file_section__read_feature(struct perf_file_section *section,
> +					   struct perf_header *ph,
> +					   int feat, int fd, void *data)
> +{
> +	struct header_read_data *hd = data;
> +
> +	if (feat != hd->feat)
> +		return 0;
> +
> +	if (lseek(fd, section->offset, SEEK_SET) == (off_t)-1) {
> +		pr_debug("Failed to lseek to %" PRIu64 " offset for feature "
> +				"%d, continuing...\n", section->offset, feat);
> +		return 0;
> +	}
> +
> +	if (feat >= HEADER_LAST_FEATURE) {
> +		pr_warning("unknown feature %d\n", feat);
> +		return 0;
> +	}
> +
> +	hd->result = feat_ops[feat].read(ph, fd);
> +	return 0;
> +}
> +
> +char *perf_header__read_feature(struct perf_session *session, int feat)
> +{
> +	struct perf_header *header = &session->header;
> +	struct header_read_data hd;
> +	int fd = session->fd;
> +
> +	hd.feat = feat;
> +	hd.result = NULL;
> +
> +	perf_header__process_sections(header, fd, &hd,
> +				  perf_file_section__read_feature);
> +	return hd.result;
> +}
> +
>   static int do_write_feat(int fd, struct perf_header *h, int type,
>   			 struct perf_file_section **p,
>   			 struct perf_evlist *evlist)
> diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
> index 24962e7..3b65aca 100644
> --- a/tools/perf/util/header.h
> +++ b/tools/perf/util/header.h
> @@ -93,6 +93,7 @@ int perf_header__process_sections(struct perf_header *header, int fd,
>   				  int feat, int fd, void *data));
>
>   int perf_header__fprintf_info(struct perf_session *s, FILE *fp, bool full);
> +char *perf_header__read_feature(struct perf_session *session, int feat);
>
>   int build_id_cache__add_s(const char *sbuild_id, const char *debugdir,
>   			  const char *name, bool is_kallsyms);
> diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
> index 70c2c13..f66610b 100644
> --- a/tools/perf/util/thread.h
> +++ b/tools/perf/util/thread.h
> @@ -16,6 +16,8 @@ struct thread {
>   	bool			comm_set;
>   	char			*comm;
>   	int			comm_len;
> +
> +	void			*priv;
>   };
>
>   struct machine;
>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v7 3/3] KVM: perf: kvm events analysis tool
  2012-08-24  1:15 Dong Hao
@ 2012-08-24  1:15 ` Dong Hao
  2012-08-24 17:53   ` David Ahern
  0 siblings, 1 reply; 26+ messages in thread
From: Dong Hao @ 2012-08-24  1:15 UTC (permalink / raw)
  To: avi, acme, mtosatti, mingo; +Cc: xiaoguangrong, linux-kernel, kvm

From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>

Add 'perf kvm stat' support to analyze kvm vmexit/mmio/ioport smartly

Usage:
- kvm stat
  run a command and gather performance counter statistics, it is the alias of
  perf stat

- trace kvm events:
  perf kvm stat record, or, if other tracepoints are interesting as well, we
  can append the events like this:
  perf kvm stat record -e timer:*
  If many guests are running, we can track the specified guest by using -p or
  --pid

- show the result:
  perf kvm stat report

The output example is following:
# pgrep qemu-kvm
26071
32253
32564

total 3 guests are running on the host

Then, track the guest whose pid is 26071:
# ./perf kvm stat record -p 26071
^C[ perf record: Woken up 9 times to write data ]
[ perf record: Captured and wrote 24.903 MB perf.data.guest (~1088034 samples) ]

See the vmexit events:
# ./perf kvm stat report --event=vmexit

Analyze events for all VCPUs:

             VM-EXIT    Samples  Samples%     Time%         Avg time

         APIC_ACCESS      65381    66.58%     5.95%     37.72us ( +-   6.54% )
  EXTERNAL_INTERRUPT      16031    16.32%     3.06%     79.11us ( +-   7.34% )
               CPUID       5360     5.46%     0.06%      4.50us ( +-  35.07% )
                 HLT       4496     4.58%    90.75%   8360.34us ( +-   5.22% )
       EPT_VIOLATION       2667     2.72%     0.04%      5.49us ( +-   5.05% )
   PENDING_INTERRUPT       2242     2.28%     0.03%      5.25us ( +-   2.96% )
       EXCEPTION_NMI       1332     1.36%     0.02%      6.53us ( +-   6.51% )
      IO_INSTRUCTION        383     0.39%     0.09%     93.39us ( +-  40.92% )
           CR_ACCESS        310     0.32%     0.00%      6.10us ( +-   3.95% )

Total Samples:98202, Total events handled time:41419293.63us.

See the mmio events:
# ./perf kvm stat report --event=mmio

Analyze events for all VCPUs:

         MMIO Access    Samples  Samples%     Time%         Avg time

        0xfee00380:W      58686    90.21%    15.67%      4.95us ( +-   2.96% )
        0xfee00300:R       2124     3.26%     1.48%     12.93us ( +-  14.75% )
        0xfee00310:W       2124     3.26%     0.34%      3.00us ( +-   1.33% )
        0xfee00300:W       2123     3.26%    82.50%    720.68us ( +-  10.24% )

Total Samples:65057, Total events handled time:1854470.45us.

See the ioport event:
# ./perf kvm stat report --event=ioport

Analyze events for all VCPUs:

      IO Port Access    Samples  Samples%     Time%         Avg time

         0xc090:POUT        383   100.00%   100.00%     89.00us ( +-  42.94% )

Total Samples:383, Total events handled time:34085.56us.

And, --vcpu is used to track the specified vcpu and --key is used to sort the
result:
# ./perf kvm stat report --event=vmexit --vcpu=0 --key=time

Analyze events for VCPU 0:

             VM-EXIT    Samples  Samples%     Time%         Avg time

                 HLT        551     5.05%    94.81%   9501.72us ( +-  12.52% )
  EXTERNAL_INTERRUPT       1390    12.74%     2.39%     94.80us ( +-  20.92% )
         APIC_ACCESS       6186    56.68%     2.62%     23.41us ( +-  23.62% )
      IO_INSTRUCTION         17     0.16%     0.01%     20.39us ( +-  22.33% )
       EXCEPTION_NMI         94     0.86%     0.01%      6.07us ( +-   7.13% )
   PENDING_INTERRUPT        199     1.82%     0.02%      5.48us ( +-   4.36% )
           CR_ACCESS         52     0.48%     0.00%      4.89us ( +-   4.09% )
       EPT_VIOLATION       2057    18.85%     0.12%      3.15us ( +-   1.33% )
               CPUID        368     3.37%     0.02%      2.82us ( +-   2.79% )

Total Samples:10914, Total events handled time:5521782.02us.

[ Dong Hao <haodong@linux.vnet.ibm.com>:
     - rebase it on current acme's tree
     - fix the compiling-error on i386
]

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Dong Hao <haodong@linux.vnet.ibm.com>
---
 tools/perf/Documentation/perf-kvm.txt |   30 +-
 tools/perf/MANIFEST                   |    3 +
 tools/perf/builtin-kvm.c              |  889 ++++++++++++++++++++++++++++++++-
 tools/perf/util/header.c              |   54 ++-
 tools/perf/util/header.h              |    1 +
 tools/perf/util/thread.h              |    2 +
 6 files changed, 973 insertions(+), 6 deletions(-)

diff --git a/tools/perf/Documentation/perf-kvm.txt b/tools/perf/Documentation/perf-kvm.txt
index dd84cb2..d52feef 100644
--- a/tools/perf/Documentation/perf-kvm.txt
+++ b/tools/perf/Documentation/perf-kvm.txt
@@ -12,7 +12,7 @@ SYNOPSIS
 	[--guestkallsyms=<path> --guestmodules=<path> | --guestvmlinux=<path>]]
 	{top|record|report|diff|buildid-list}
 'perf kvm' [--host] [--guest] [--guestkallsyms=<path> --guestmodules=<path>
-	| --guestvmlinux=<path>] {top|record|report|diff|buildid-list}
+	| --guestvmlinux=<path>] {top|record|report|diff|buildid-list|stat}
 
 DESCRIPTION
 -----------
@@ -38,6 +38,18 @@ There are a couple of variants of perf kvm:
   so that other tools can be used to fetch packages with matching symbol tables
   for use by perf report.
 
+  'perf kvm stat <command>' to run a command and gather performance counter
+   statistics.
+  Especially, perf 'kvm stat record/report' generates a statistical analysis
+  of KVM events. Currently, vmexit, mmio and ioport events are supported.
+    'perf kvm stat record <command>' records kvm events and the events between
+    start and end <command>.
+    And this command produces a file which contains tracing results of kvm
+    events.
+
+    'perf kvm stat report' reports statistical data which includes events
+    handled time, samples, and so on.
+
 OPTIONS
 -------
 -i::
@@ -68,7 +80,21 @@ OPTIONS
 --guestvmlinux=<path>::
 	Guest os kernel vmlinux.
 
+STAT REPORT OPTIONS
+-------------------
+--vcpu=<value>::
+	analyze events which occures on this vcpu. (default: all vcpus)
+
+--events=<value>::
+	events to be analyzed. Possible values: vmexit, mmio, ioport.
+	(default: vmexit)
+-k::
+--key=<value>::
+	Sorting key. Possible values: sample (default, sort by samples
+	number), time (sort by average time).
+
 SEE ALSO
 --------
 linkperf:perf-top[1], linkperf:perf-record[1], linkperf:perf-report[1],
-linkperf:perf-diff[1], linkperf:perf-buildid-list[1]
+linkperf:perf-diff[1], linkperf:perf-buildid-list[1],
+linkperf:perf-stat[1]
diff --git a/tools/perf/MANIFEST b/tools/perf/MANIFEST
index b4b572e..6896648 100644
--- a/tools/perf/MANIFEST
+++ b/tools/perf/MANIFEST
@@ -15,3 +15,6 @@ arch/*/lib/memset*.S
 include/linux/poison.h
 include/linux/magic.h
 include/linux/hw_breakpoint.h
+arch/x86/include/asm/svm.h
+arch/x86/include/asm/vmx.h
+arch/x86/include/asm/kvm_host.h
diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index 9fc6e0f..5396444 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -1,6 +1,7 @@
 #include "builtin.h"
 #include "perf.h"
 
+#include "util/evsel.h"
 #include "util/util.h"
 #include "util/cache.h"
 #include "util/symbol.h"
@@ -10,8 +11,9 @@
 
 #include "util/parse-options.h"
 #include "util/trace-event.h"
-
 #include "util/debug.h"
+#include "util/debugfs.h"
+#include "util/tool.h"
 
 #include <sys/prctl.h>
 
@@ -19,11 +21,890 @@
 #include <pthread.h>
 #include <math.h>
 
-static const char		*file_name;
+#include "../../arch/x86/include/asm/svm.h"
+#include "../../arch/x86/include/asm/vmx.h"
+#include "../../arch/x86/include/asm/kvm_host.h"
+
+struct event_key {
+	#define INVALID_KEY	(~0ULL)
+	u64 key;
+	int info;
+};
+
+struct kvm_events_ops {
+	bool (*is_begin_event)(struct event_format *event, void *data,
+			       struct event_key *key);
+	bool (*is_end_event)(struct event_format *event, void *data,
+			     struct event_key *key);
+	void (*decode_key)(struct event_key *key, char decode[20]);
+	const char *name;
+};
+
+static void exit_event_get_key(struct event_format *event, void *data,
+			       struct event_key *key)
+{
+	key->info = 0;
+	key->key = raw_field_value(event, "exit_reason", data);
+}
+
+static bool kvm_exit_event(struct event_format *event)
+{
+	return !strcmp(event->name, "kvm_exit");
+}
+
+static bool exit_event_begin(struct event_format *event, void *data,
+			     struct event_key *key)
+{
+	if (kvm_exit_event(event)) {
+		exit_event_get_key(event, data, key);
+		return true;
+	}
+
+	return false;
+}
+
+static bool kvm_entry_event(struct event_format *event)
+{
+	return !strcmp(event->name, "kvm_entry");
+}
+
+static bool exit_event_end(struct event_format *event, void *data __unused,
+			   struct event_key *key __unused)
+{
+	return kvm_entry_event(event);
+}
+
+struct exit_reasons_table {
+	unsigned long exit_code;
+	const char *reason;
+};
+
+struct exit_reasons_table vmx_exit_reasons[] = {
+	VMX_EXIT_REASONS
+};
+
+struct exit_reasons_table svm_exit_reasons[] = {
+	SVM_EXIT_REASONS
+};
+
+static int cpu_isa;
+
+static const char *get_exit_reason(u64 exit_code)
+{
+	int table_size = ARRAY_SIZE(svm_exit_reasons);
+	struct exit_reasons_table *table = svm_exit_reasons;
+
+	if (cpu_isa == 1) {
+		table = vmx_exit_reasons;
+		table_size = ARRAY_SIZE(vmx_exit_reasons);
+	}
+
+	while (table_size--) {
+		if (table->exit_code == exit_code)
+			return table->reason;
+		table++;
+	}
+
+	pr_err("unknown kvm exit code:%lld on %s\n",
+	       (unsigned long long)exit_code, cpu_isa ? "VMX" : "SVM");
+	return "UNKNOWN";
+}
+
+static void exit_event_decode_key(struct event_key *key, char decode[20])
+{
+	const char *exit_reason = get_exit_reason(key->key);
+
+	snprintf(decode, 20, "%s", exit_reason);
+}
+
+static struct kvm_events_ops exit_events = {
+	.is_begin_event = exit_event_begin,
+	.is_end_event = exit_event_end,
+	.decode_key = exit_event_decode_key,
+	.name = "VM-EXIT"
+};
+
+/*
+ * For the old kernel, we treat:
+ * the time of MMIO write: kvm_mmio(KVM_TRACE_MMIO_WRITE...) -> kvm_entry
+ * the time of MMIO read: kvm_exit -> kvm_mmio(KVM_TRACE_MMIO_READ...).
+ *
+ * For the new kernel, we use kvm_mmio_begin and kvm_io_done to make
+ * things better.
+ */
+static void mmio_event_get_key(struct event_format *event, void *data,
+			       struct event_key *key)
+{
+	key->key = raw_field_value(event, "gpa", data);
+	key->info = raw_field_value(event, "type", data);
+}
+
+#define KVM_TRACE_MMIO_READ_UNSATISFIED 0
+#define KVM_TRACE_MMIO_READ 1
+#define KVM_TRACE_MMIO_WRITE 2
+
+static bool kvm_io_done_event(struct event_format *event)
+{
+	return !strcmp(event->name, "kvm_io_done");
+}
+
+static bool mmio_event_begin(struct event_format *event, void *data,
+			     struct event_key *key)
+{
+	/* MMIO read begin in old kernel. */
+	if (kvm_exit_event(event))
+		return true;
+
+	/* MMIO write begin in old kernel. */
+	if (!strcmp(event->name, "kvm_mmio") &&
+	      raw_field_value(event, "type", data) == KVM_TRACE_MMIO_WRITE) {
+		mmio_event_get_key(event, data, key);
+		return true;
+	}
+
+	/* MMIO read/write begin in new kernel. */
+	if (!strcmp(event->name, "kvm_mmio_begin")) {
+		mmio_event_get_key(event, data, key);
+		return true;
+	}
+
+	return false;
+}
+
+static bool mmio_event_end(struct event_format *event,  void *data,
+			   struct event_key *key)
+{
+	/* MMIO write end in old kernel. */
+	if (kvm_entry_event(event))
+		return true;
+
+	/* MMIO read end in the old kernel.*/
+	if (!strcmp(event->name, "kvm_mmio") &&
+	      raw_field_value(event, "type", data) == KVM_TRACE_MMIO_READ) {
+		mmio_event_get_key(event, data, key);
+		return true;
+	}
+
+	/* MMIO read/write end event in the new kernel.*/
+	return kvm_io_done_event(event);
+}
+
+static void mmio_event_decode_key(struct event_key *key, char decode[20])
+{
+	snprintf(decode, 20, "%#lx:%s", (unsigned long)key->key,
+			      key->info == KVM_TRACE_MMIO_WRITE ? "W" : "R");
+}
+
+static struct kvm_events_ops mmio_events = {
+	.is_begin_event = mmio_event_begin,
+	.is_end_event = mmio_event_end,
+	.decode_key = mmio_event_decode_key,
+	.name = "MMIO Access"
+};
+
+/*
+ * For the old kernel, the time of emulation pio access is from kvm_pio to
+ * kvm_entry. In the new kernel, the end time is indicated by kvm_io_done.
+ */
+static void ioport_event_get_key(struct event_format *event, void *data,
+				 struct event_key *key)
+{
+	key->key = raw_field_value(event, "port", data);
+	key->info = raw_field_value(event, "rw", data);
+}
+
+static bool ioport_event_begin(struct event_format *event, void *data,
+			       struct event_key *key)
+{
+	if (!strcmp(event->name, "kvm_pio")) {
+		ioport_event_get_key(event, data, key);
+		return true;
+	}
+
+	return false;
+}
+
+static bool ioport_event_end(struct event_format *event, void *data __unused,
+			     struct event_key *key __unused)
+{
+	if (kvm_entry_event(event))
+		return true;
+
+	return kvm_io_done_event(event);
+}
+
+static void ioport_event_decode_key(struct event_key *key, char decode[20])
+{
+	snprintf(decode, 20, "%#llx:%s", (unsigned long long)key->key,
+		 key->info ? "POUT" : "PIN");
+}
+
+static struct kvm_events_ops ioport_events = {
+	.is_begin_event = ioport_event_begin,
+	.is_end_event = ioport_event_end,
+	.decode_key = ioport_event_decode_key,
+	.name = "IO Port Access"
+};
+
+static const char *report_event = "vmexit";
+struct kvm_events_ops *events_ops;
+
+static bool register_kvm_events_ops(void)
+{
+	bool ret = true;
+
+	if (!strcmp(report_event, "vmexit"))
+		events_ops = &exit_events;
+	else if (!strcmp(report_event, "mmio"))
+		events_ops = &mmio_events;
+	else if (!strcmp(report_event, "ioport"))
+		events_ops = &ioport_events;
+	else {
+		pr_err("Unknown report event:%s\n", report_event);
+		ret = false;
+	}
+
+	return ret;
+}
+
+struct event_stats {
+	u64 count;
+	u64 time;
+
+	/* used to calculate stddev. */
+	double mean;
+	double M2;
+};
+
+struct kvm_event {
+	struct list_head hash_entry;
+	struct rb_node rb;
+
+	struct event_key key;
+
+	struct event_stats total;
+
+	#define DEFAULT_VCPU_NUM 8
+	int max_vcpu;
+	struct event_stats *vcpu;
+};
+
+struct vcpu_event_record {
+	int vcpu_id;
+	u64 start_time;
+	struct kvm_event *last_event;
+};
+
+#define EVENTS_BITS			12
+#define EVENTS_CACHE_SIZE	(1UL << EVENTS_BITS)
+
+static u64 total_time;
+static u64 total_count;
+static struct list_head kvm_events_cache[EVENTS_CACHE_SIZE];
+
+static void init_kvm_event_record(void)
+{
+	int i;
+
+	for (i = 0; i < (int)EVENTS_CACHE_SIZE; i++)
+		INIT_LIST_HEAD(&kvm_events_cache[i]);
+}
+
+static int kvm_events_hash_fn(u64 key)
+{
+	return key & (EVENTS_CACHE_SIZE - 1);
+}
+
+static bool kvm_event_expand(struct kvm_event *event, int vcpu_id)
+{
+	int old_max_vcpu = event->max_vcpu;
+
+	if (vcpu_id < event->max_vcpu)
+		return true;
+
+	while (event->max_vcpu <= vcpu_id)
+		event->max_vcpu += DEFAULT_VCPU_NUM;
+
+	event->vcpu = realloc(event->vcpu,
+			      event->max_vcpu * sizeof(*event->vcpu));
+	if (!event->vcpu) {
+		pr_err("Not enough memory\n");
+		return false;
+	}
+
+	memset(event->vcpu + old_max_vcpu, 0,
+	       (event->max_vcpu - old_max_vcpu) * sizeof(*event->vcpu));
+	return true;
+}
+
+static struct kvm_event *kvm_alloc_init_event(struct event_key *key)
+{
+	struct kvm_event *event;
+
+	event = zalloc(sizeof(*event));
+	if (!event) {
+		pr_err("Not enough memory\n");
+		return NULL;
+	}
+
+	event->key = *key;
+	return event;
+}
+
+static struct kvm_event *find_create_kvm_event(struct event_key *key)
+{
+	struct kvm_event *event;
+	struct list_head *head;
+
+	BUG_ON(key->key == INVALID_KEY);
+
+	head = &kvm_events_cache[kvm_events_hash_fn(key->key)];
+	list_for_each_entry(event, head, hash_entry)
+		if (event->key.key == key->key && event->key.info == key->info)
+			return event;
+
+	event = kvm_alloc_init_event(key);
+	if (!event)
+		return NULL;
+
+	list_add(&event->hash_entry, head);
+	return event;
+}
+
+static void handle_begin_event(struct vcpu_event_record *vcpu_record,
+			       struct event_key *key, u64 timestamp)
+{
+	struct kvm_event *event = NULL;
+
+	if (key->key != INVALID_KEY)
+		event = find_create_kvm_event(key);
+
+	vcpu_record->last_event = event;
+	vcpu_record->start_time = timestamp;
+}
+
+static void update_event_stats(struct event_stats *stats, u64 time_diff)
+{
+	double delta;
+
+	stats->count++;
+	stats->time += time_diff;
+
+	delta = time_diff - stats->mean;
+	stats->mean += delta / stats->count;
+	stats->M2 += delta*(time_diff - stats->mean);
+}
+
+static double event_stats_stddev(int vcpu_id, struct kvm_event *event)
+{
+	struct event_stats *stats = &event->total;
+	double variance, variance_mean, stddev;
+
+	if (vcpu_id != -1)
+		stats = &event->vcpu[vcpu_id];
+
+	BUG_ON(!stats->count);
+
+	variance = stats->M2 / (stats->count - 1);
+	variance_mean = variance / stats->count;
+	stddev = sqrt(variance_mean);
+
+	return stddev * 100 / stats->mean;
+}
+
+static void update_kvm_event(struct kvm_event *event, int vcpu_id,
+			     u64 time_diff)
+{
+	update_event_stats(&event->total, time_diff);
+
+	if (!kvm_event_expand(event, vcpu_id))
+		return;
+
+	update_event_stats(&event->vcpu[vcpu_id], time_diff);
+}
+
+static void handle_end_event(struct vcpu_event_record *vcpu_record,
+			     struct event_key *key, u64 timestamp)
+{
+	struct kvm_event *event;
+	u64 time_begin, time_diff;
+
+	event = vcpu_record->last_event;
+	time_begin = vcpu_record->start_time;
+
+	/* The begin event is not caught. */
+	if (!time_begin)
+		return;
+
+	/*
+	 * In some case, the 'begin event' only records the start timestamp,
+	 * the actual event is recognized in the 'end event' (e.g. mmio-event
+	 * in the old kernel).
+	 */
+
+	/* Both begin and end events did not get the key. */
+	if (!event && key->key == INVALID_KEY)
+		return;
+
+	if (!event)
+		event = find_create_kvm_event(key);
+
+	if (!event)
+		return;
+
+	vcpu_record->last_event = NULL;
+	vcpu_record->start_time = 0;
+
+	BUG_ON(timestamp < time_begin);
+
+	time_diff = timestamp - time_begin;
+	update_kvm_event(event, vcpu_record->vcpu_id, time_diff);
+}
+
+static struct vcpu_event_record
+*per_vcpu_record(struct thread *thread, struct event_format *event, void *data)
+{
+	/* Only kvm_entry records vcpu id. */
+	if (!thread->priv && kvm_entry_event(event)) {
+		struct vcpu_event_record *vcpu_record;
+
+		vcpu_record = zalloc(sizeof(struct vcpu_event_record));
+		if (!vcpu_record) {
+			pr_err("Not enough memory\n");
+			return NULL;
+		}
+
+		vcpu_record->vcpu_id = raw_field_value(event, "vcpu_id", data);
+		thread->priv = vcpu_record;
+	}
+
+	return (struct vcpu_event_record *)thread->priv;
+}
+
+static void handle_kvm_event(struct thread *thread, struct event_format *event,
+			     void *data, u64 timestamp)
+{
+	struct vcpu_event_record *vcpu_record;
+	struct event_key key = {.key = INVALID_KEY};
+
+	vcpu_record = per_vcpu_record(thread, event, data);
+	if (!vcpu_record)
+		return;
+
+	if (events_ops->is_begin_event(event, data, &key))
+		return handle_begin_event(vcpu_record, &key, timestamp);
+
+	if (events_ops->is_end_event(event, data, &key))
+		return handle_end_event(vcpu_record, &key, timestamp);
+}
+
+typedef int (*key_cmp_fun)(struct kvm_event*, struct kvm_event*, int);
+struct kvm_event_key {
+	const char *name;
+	key_cmp_fun key;
+};
+
+static int trace_vcpu = -1;
+#define GET_EVENT_KEY(member)						\
+static u64 get_event_ ##member(struct kvm_event *event, int vcpu)	\
+{									\
+	if (vcpu == -1)							\
+		return event->total.member;				\
+									\
+	if (vcpu >= event->max_vcpu)					\
+		return 0;						\
+									\
+	return event->vcpu[vcpu].member;				\
+}
+
+#define COMPARE_EVENT_KEY(member)					\
+GET_EVENT_KEY(member)							\
+static int compare_kvm_event_ ## member(struct kvm_event *one,		\
+					struct kvm_event *two, int vcpu)\
+{									\
+	return get_event_ ##member(one, vcpu) >				\
+				get_event_ ##member(two, vcpu);		\
+}
+
+GET_EVENT_KEY(time);
+COMPARE_EVENT_KEY(count);
+COMPARE_EVENT_KEY(mean);
+
+#define DEF_SORT_NAME_KEY(name, compare_key)	\
+	{ #name, compare_kvm_event_ ## compare_key }
+
+static struct kvm_event_key keys[] = {
+	DEF_SORT_NAME_KEY(sample, count),
+	DEF_SORT_NAME_KEY(time, mean),
+	{ NULL, NULL }
+};
+
+static const char *sort_key = "sample";
+static key_cmp_fun compare;
+
+static bool select_key(void)
+{
+	int i;
+
+	for (i = 0; keys[i].name; i++) {
+		if (!strcmp(keys[i].name, sort_key)) {
+			compare = keys[i].key;
+			return true;
+		}
+	}
+
+	pr_err("Unknown compare key:%s\n", sort_key);
+	return false;
+}
+
+static struct rb_root result;
+static void insert_to_result(struct kvm_event *event, key_cmp_fun bigger,
+			     int vcpu)
+{
+	struct rb_node **rb = &result.rb_node;
+	struct rb_node *parent = NULL;
+	struct kvm_event *p;
+
+	while (*rb) {
+		p = container_of(*rb, struct kvm_event, rb);
+		parent = *rb;
+
+		if (bigger(event, p, vcpu))
+			rb = &(*rb)->rb_left;
+		else
+			rb = &(*rb)->rb_right;
+	}
+
+	rb_link_node(&event->rb, parent, rb);
+	rb_insert_color(&event->rb, &result);
+}
+
+static void update_total_count(struct kvm_event *event, int vcpu)
+{
+	total_count += get_event_count(event, vcpu);
+	total_time += get_event_time(event, vcpu);
+}
+
+static bool event_is_valid(struct kvm_event *event, int vcpu)
+{
+	return !!get_event_count(event, vcpu);
+}
+
+static void sort_result(int vcpu)
+{
+	unsigned int i;
+	struct kvm_event *event;
+
+	for (i = 0; i < EVENTS_CACHE_SIZE; i++)
+		list_for_each_entry(event, &kvm_events_cache[i], hash_entry)
+			if (event_is_valid(event, vcpu)) {
+				update_total_count(event, vcpu);
+				insert_to_result(event, compare, vcpu);
+			}
+}
+
+/* returns left most element of result, and erase it */
+static struct kvm_event *pop_from_result(void)
+{
+	struct rb_node *node = result.rb_node;
+
+	if (!node)
+		return NULL;
+
+	while (node->rb_left)
+		node = node->rb_left;
+
+	rb_erase(node, &result);
+	return container_of(node, struct kvm_event, rb);
+}
+
+static void print_vcpu_info(int vcpu)
+{
+	pr_info("Analyze events for ");
+
+	if (vcpu == -1)
+		pr_info("all VCPUs:\n\n");
+	else
+		pr_info("VCPU %d:\n\n", vcpu);
+}
+
+static void print_result(int vcpu)
+{
+	char decode[20];
+	struct kvm_event *event;
+
+	pr_info("\n\n");
+	print_vcpu_info(vcpu);
+	pr_info("%20s ", events_ops->name);
+	pr_info("%10s ", "Samples");
+	pr_info("%9s ", "Samples%");
+
+	pr_info("%9s ", "Time%");
+	pr_info("%16s ", "Avg time");
+	pr_info("\n\n");
+
+	while ((event = pop_from_result())) {
+		u64 ecount, etime;
+
+		ecount = get_event_count(event, vcpu);
+		etime = get_event_time(event, vcpu);
+
+		events_ops->decode_key(&event->key, decode);
+		pr_info("%20s ", decode);
+		pr_info("%10llu ", (unsigned long long)ecount);
+		pr_info("%8.2f%% ", (double)ecount / total_count * 100);
+		pr_info("%8.2f%% ", (double)etime / total_time * 100);
+		pr_info("%9.2fus ( +-%7.2f%% )", (double)etime / ecount/1e3,
+			event_stats_stddev(trace_vcpu, event));
+		pr_info("\n");
+	}
+
+	pr_info("\nTotal Samples:%lld, Total events handled time:%.2fus.\n\n",
+		(unsigned long long)total_count, total_time / 1e3);
+}
+
+static int process_sample_event(struct perf_tool *tool __used,
+				union perf_event *event,
+				struct perf_sample *sample,
+				struct perf_evsel *evsel,
+				struct machine *machine)
+{
+	struct thread *thread = machine__findnew_thread(machine, sample->tid);
+
+	if (thread == NULL) {
+		pr_debug("problem processing %d event, skipping it.\n",
+			event->header.type);
+		return -1;
+	}
+
+	handle_kvm_event(thread, evsel->tp_format, sample->raw_data,
+			 sample->time);
+	return 0;
+}
+
+static struct perf_tool eops = {
+	.sample			= process_sample_event,
+	.comm			= perf_event__process_comm,
+	.ordered_samples	= true,
+};
+
+static int get_cpu_isa(struct perf_session *session)
+{
+	char *cpuid;
+	int isa;
+
+	cpuid = perf_header__read_feature(session, HEADER_CPUID);
+
+	if (!cpuid) {
+		pr_err("read HEADER_CPUID failed.\n");
+		return -ENOTSUP;
+	}
+
+	if (strstr(cpuid, "Intel"))
+		isa = 1;
+	else if (strstr(cpuid, "AMD"))
+		isa = 0;
+	else {
+		pr_err("CPU %s is not supported.\n", cpuid);
+		isa = -ENOTSUP;
+	}
+
+	free(cpuid);
+	return isa;
+}
+
+static const char *file_name;
+
+static int read_events(void)
+{
+	struct perf_session *kvm_session;
+	int ret;
+
+	kvm_session = perf_session__new(file_name, O_RDONLY, 0, false, &eops);
+	if (!kvm_session) {
+		pr_err("Initializing perf session failed\n");
+		return -EINVAL;
+	}
+
+	if (!perf_session__has_traces(kvm_session, "kvm record"))
+		return -EINVAL;
+
+	/*
+	 * Do not use 'isa' recorded in kvm_exit tracepoint since it is not
+	 * traced in the old kernel.
+	 */
+	ret = get_cpu_isa(kvm_session);
+
+	if (ret < 0)
+		return ret;
+
+	cpu_isa = ret;
+
+	return perf_session__process_events(kvm_session, &eops);
+}
+
+static bool verify_vcpu(int vcpu)
+{
+	if (vcpu != -1 && vcpu < 0) {
+		pr_err("Invalid vcpu:%d.\n", vcpu);
+		return false;
+	}
+
+	return true;
+}
+
+static int kvm_events_report_vcpu(int vcpu)
+{
+	int ret = -EINVAL;
+
+	if (!verify_vcpu(vcpu))
+		goto exit;
+
+	if (!select_key())
+		goto exit;
+
+	if (!register_kvm_events_ops())
+		goto exit;
+
+	init_kvm_event_record();
+	setup_pager();
+
+	ret = read_events();
+	if (ret)
+		goto exit;
+
+	sort_result(vcpu);
+	print_result(vcpu);
+exit:
+	return ret;
+}
+
+static const char * const record_args[] = {
+	"record",
+	"-R",
+	"-f",
+	"-m", "1024",
+	"-c", "1",
+	"-e", "kvm:kvm_entry",
+	"-e", "kvm:kvm_exit",
+	"-e", "kvm:kvm_mmio",
+	"-e", "kvm:kvm_pio",
+};
+
+static const char * const new_event[] = {
+	"kvm_mmio_begin",
+	"kvm_io_done"
+};
+
+static bool kvm_record_specified_guest(int argc, const char **argv)
+{
+	int i;
+
+	for (i = 0; i < argc; i++)
+		if (!strcmp(argv[i], "-p") || !strcmp(argv[i], "--pid"))
+			return true;
+
+	return false;
+}
+
+#define STRDUP_FAIL_EXIT(s)		\
+	({	char *_p;		\
+		_p = strdup(s);		\
+		if (!_p)		\
+			return -ENOMEM;	\
+		_p;			\
+	})
+
+static int kvm_events_record(int argc, const char **argv)
+{
+	unsigned int rec_argc, i, j;
+	const char **rec_argv;
+
+	rec_argc = ARRAY_SIZE(record_args) + argc + 2;
+	rec_argc += ARRAY_SIZE(new_event) * 2;
+	rec_argv = calloc(rec_argc + 1, sizeof(char *));
+
+	if (rec_argv == NULL)
+		return -ENOMEM;
+
+	for (i = 0; i < ARRAY_SIZE(record_args); i++)
+		rec_argv[i] = STRDUP_FAIL_EXIT(record_args[i]);
+
+	/*
+	 * Append "-a" only if "-p"/"--pid" is not specified since they
+	 * are mutually exclusive.
+	 */
+	if (!kvm_record_specified_guest(argc, argv))
+		rec_argv[i++] = STRDUP_FAIL_EXIT("-a");
+
+	rec_argv[i++] = STRDUP_FAIL_EXIT("-o");
+	rec_argv[i++] = STRDUP_FAIL_EXIT(file_name);
+
+	for (j = 0; j < ARRAY_SIZE(new_event); j++)
+		if (is_valid_tracepoint(new_event[j])) {
+			char event[256];
+
+			sprintf(event, "kvm:%s", new_event[j]);
+
+			rec_argv[i++] = STRDUP_FAIL_EXIT("-e");
+			rec_argv[i++] = STRDUP_FAIL_EXIT(event);
+		}
+
+	for (j = 1; j < (unsigned int)argc; j++, i++)
+		rec_argv[i] = argv[j];
+
+	return cmd_record(i, rec_argv, NULL);
+}
+
+static const char * const kvm_events_report_usage[] = {
+	"perf kvm stat report [<options>]",
+	NULL
+};
+
+static const struct option kvm_events_report_options[] = {
+	OPT_STRING(0, "event", &report_event, "report event",
+		    "event for reporting: vmexit, mmio, ioport"),
+	OPT_INTEGER(0, "vcpu", &trace_vcpu,
+		    "vcpu id to report"),
+	OPT_STRING('k', "key", &sort_key, "sort-key",
+		    "key for sorting: sample(sort by samples number)"
+		    " time (sort by avg time)"),
+	OPT_END()
+};
+
+static int kvm_events_report(int argc, const char **argv)
+{
+	symbol__init();
+
+	if (argc) {
+		argc = parse_options(argc, argv,
+				     kvm_events_report_options,
+				     kvm_events_report_usage, 0);
+		if (argc)
+			usage_with_options(kvm_events_report_usage,
+					   kvm_events_report_options);
+	}
+
+	return kvm_events_report_vcpu(trace_vcpu);
+}
+
+static int kvm_cmd_stat(int argc, const char **argv)
+{
+	if (argc > 1) {
+		if (!strncmp(argv[1], "rec", 3))
+			return kvm_events_record(argc - 1, argv + 1);
+
+		if (!strncmp(argv[1], "rep", 3))
+			return kvm_events_report(argc - 1 , argv + 1);
+	}
+
+	return cmd_stat(argc, argv, NULL);
+}
+
 static char			name_buffer[256];
 
 static const char * const kvm_usage[] = {
-	"perf kvm [<options>] {top|record|report|diff|buildid-list}",
+	"perf kvm [<options>] {top|record|report|diff|buildid-list|stat}",
 	NULL
 };
 
@@ -135,6 +1016,8 @@ int cmd_kvm(int argc, const char **argv, const char *prefix __used)
 		return cmd_top(argc, argv, NULL);
 	else if (!strncmp(argv[0], "buildid-list", 12))
 		return __cmd_buildid_list(argc, argv);
+	else if (!strncmp(argv[0], "stat", 4))
+		return kvm_cmd_stat(argc, argv);
 	else
 		usage_with_options(kvm_usage, kvm_options);
 
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 1e5b6aa..c684aba 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -1504,9 +1504,15 @@ static int process_build_id(struct perf_file_section *section,
 	return 0;
 }
 
+static char *read_cpuid(struct perf_header *ph, int fd)
+{
+	return do_read_string(fd, ph);
+}
+
 struct feature_ops {
 	int (*write)(int fd, struct perf_header *h, struct perf_evlist *evlist);
 	void (*print)(struct perf_header *h, int fd, FILE *fp);
+	char *(*read)(struct perf_header *h, int fd);
 	int (*process)(struct perf_file_section *section,
 		       struct perf_header *h, int feat, int fd, void *data);
 	const char *name;
@@ -1521,6 +1527,9 @@ struct feature_ops {
 #define FEAT_OPF(n, func) \
 	[n] = { .name = #n, .write = write_##func, .print = print_##func, \
 		.full_only = true }
+#define FEAT_OPA_R(n, func) \
+	[n] = { .name = #n, .write = write_##func, .print = print_##func, \
+		.read  = read_##func }
 
 /* feature_ops not implemented: */
 #define print_tracing_data	NULL
@@ -1535,7 +1544,7 @@ static const struct feature_ops feat_ops[HEADER_LAST_FEATURE] = {
 	FEAT_OPA(HEADER_ARCH,		arch),
 	FEAT_OPA(HEADER_NRCPUS,		nrcpus),
 	FEAT_OPA(HEADER_CPUDESC,	cpudesc),
-	FEAT_OPA(HEADER_CPUID,		cpuid),
+	FEAT_OPA_R(HEADER_CPUID,	cpuid),
 	FEAT_OPA(HEADER_TOTAL_MEM,	total_mem),
 	FEAT_OPA(HEADER_EVENT_DESC,	event_desc),
 	FEAT_OPA(HEADER_CMDLINE,	cmdline),
@@ -1589,6 +1598,49 @@ int perf_header__fprintf_info(struct perf_session *session, FILE *fp, bool full)
 	return 0;
 }
 
+struct header_read_data {
+	int feat;
+	char *result;
+};
+
+static int perf_file_section__read_feature(struct perf_file_section *section,
+					   struct perf_header *ph,
+					   int feat, int fd, void *data)
+{
+	struct header_read_data *hd = data;
+
+	if (feat != hd->feat)
+		return 0;
+
+	if (lseek(fd, section->offset, SEEK_SET) == (off_t)-1) {
+		pr_debug("Failed to lseek to %" PRIu64 " offset for feature "
+				"%d, continuing...\n", section->offset, feat);
+		return 0;
+	}
+
+	if (feat >= HEADER_LAST_FEATURE) {
+		pr_warning("unknown feature %d\n", feat);
+		return 0;
+	}
+
+	hd->result = feat_ops[feat].read(ph, fd);
+	return 0;
+}
+
+char *perf_header__read_feature(struct perf_session *session, int feat)
+{
+	struct perf_header *header = &session->header;
+	struct header_read_data hd;
+	int fd = session->fd;
+
+	hd.feat = feat;
+	hd.result = NULL;
+
+	perf_header__process_sections(header, fd, &hd,
+				  perf_file_section__read_feature);
+	return hd.result;
+}
+
 static int do_write_feat(int fd, struct perf_header *h, int type,
 			 struct perf_file_section **p,
 			 struct perf_evlist *evlist)
diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
index 24962e7..3b65aca 100644
--- a/tools/perf/util/header.h
+++ b/tools/perf/util/header.h
@@ -93,6 +93,7 @@ int perf_header__process_sections(struct perf_header *header, int fd,
 				  int feat, int fd, void *data));
 
 int perf_header__fprintf_info(struct perf_session *s, FILE *fp, bool full);
+char *perf_header__read_feature(struct perf_session *session, int feat);
 
 int build_id_cache__add_s(const char *sbuild_id, const char *debugdir,
 			  const char *name, bool is_kallsyms);
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index 70c2c13..f66610b 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -16,6 +16,8 @@ struct thread {
 	bool			comm_set;
 	char			*comm;
 	int			comm_len;
+
+	void			*priv;
 };
 
 struct machine;
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2012-09-14 11:52 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-27  9:51 [PATCH v7 0/3] KVM: perf: kvm events analysis tool Dong Hao
2012-08-27  9:51 ` [PATCH v7 1/3] KVM: x86: export svm/vmx exit code and vector code to userspace Dong Hao
2012-09-03 11:13   ` Avi Kivity
2012-09-04  3:53     ` Xiao Guangrong
2012-08-27  9:51 ` [PATCH v7 2/3] KVM: x86: trace mmio begin and complete Dong Hao
2012-09-03 11:07   ` Avi Kivity
2012-09-04  4:06     ` Xiao Guangrong
2012-08-27  9:51 ` [PATCH v7 3/3] KVM: perf: kvm events analysis tool Dong Hao
2012-08-27 15:53   ` Andrew Jones
2012-08-27 19:34     ` David Ahern
2012-08-28  6:35       ` Andrew Jones
2012-08-28 17:19         ` David Ahern
2012-09-02 13:51     ` don
2012-08-30 18:29   ` David Ahern
2012-09-03  8:48     ` don
2012-09-03 16:04       ` David Ahern
2012-09-13  4:56     ` David Ahern
2012-09-13 13:45       ` Arnaldo Carvalho de Melo
2012-09-13 14:14         ` David Ahern
2012-09-13 14:31           ` Arnaldo Carvalho de Melo
2012-09-14  2:56       ` Xiao Guangrong
2012-09-14 11:51         ` David Ahern
2012-08-27  9:59 ` [PATCH v7 0/3] " Xiao Guangrong
2012-08-27 12:53   ` David Ahern
  -- strict thread matches above, loose matches on Subject: below --
2012-08-24  1:15 Dong Hao
2012-08-24  1:15 ` [PATCH v7 3/3] " Dong Hao
2012-08-24 17:53   ` David Ahern

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).