linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCH] KVM: perf: a smart tool to analyse kvm events
@ 2012-01-16  9:30 Xiao Guangrong
  2012-01-16  9:31 ` [PATCH 1/3] KVM: trace mmio read event properly Xiao Guangrong
                   ` (4 more replies)
  0 siblings, 5 replies; 25+ messages in thread
From: Xiao Guangrong @ 2012-01-16  9:30 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, LKML, KVM

This tool is very like xenoprof(if i remember correctly), and traces kvm events
smartly. currently, it supports vmexit/mmio/ioport events.

Usage:
- to trace kvm events:
# ./perf kvm-events record

- show the result
# ./perf kvm-events report

Some output are as follow:
# ./perf kvm-events report
  Warning: Error: expected type 5 but read 4
  Warning: Error: expected type 5 but read 0
  Warning: unknown op '}'


Analyze events for all VCPUs:

             VM-EXIT    Samples      Samples%         Time%        Avg time

         APIC_ACCESS     438107        44.89%         6.20%        17.91us
  EXTERNAL_INTERRUPT     219226        22.46%         8.01%        46.20us
      IO_INSTRUCTION     122651        12.57%         1.88%        19.44us
       EPT_VIOLATION      83110         8.52%         1.36%        20.75us
   PENDING_INTERRUPT      37055         3.80%         0.16%         5.38us
               CPUID      32718         3.35%         0.08%         3.15us
       EXCEPTION_NMI      23601         2.42%         0.17%         8.87us
                 HLT      15424         1.58%        82.12%      6735.06us
           CR_ACCESS       4089         0.42%         0.02%         6.08us

Total Samples:975981, Total events handled time:126502464.88us.

The default event to be analysed is vmexit, we can use --event to specify it,
for example, if we want to trace mmio event:
# ./perf kvm-events report --event mmio
  Warning: Error: expected type 5 but read 4
  Warning: Error: expected type 5 but read 0
  Warning: unknown op '}'


Analyze events for all VCPUs:

         MMIO Access    Samples      Samples%         Time%        Avg time

        0xfee00380:W     196589        64.95%        70.01%         3.83us
        0xfee00310:W      35356        11.68%         6.48%         1.97us
        0xfee00300:W      35356        11.68%        16.37%         4.97us
        0xfee00300:R      35356        11.68%         7.14%         2.17us

Total Samples:302657, Total events handled time:1074746.01us.

We can use --vcpu to specify which vcpu is traced:
root@localhost perf]# ./perf kvm-events report --event mmio --vcpu 1
  Warning: Error: expected type 5 but read 4
  Warning: Error: expected type 5 but read 0
  Warning: unknown op '}'


Analyze events for VCPU 1:

         MMIO Access    Samples      Samples%         Time%        Avg time

        0xfee00380:W      58041        71.20%        74.90%         3.70us
        0xfee00310:W       7826         9.60%         5.28%         1.93us
        0xfee00300:W       7826         9.60%        13.82%         5.06us
        0xfee00300:R       7826         9.60%         6.01%         2.20us

Total Samples:81519, Total events handled time:286577.81us.

And, '--key' is used to sort the result, the possible value sample (default,
the result is sorted by samples number), time(the result is sorted by time%):
# ./perf kvm-events report --key time
  Warning: Error: expected type 5 but read 4
  Warning: Error: expected type 5 but read 0
  Warning: unknown op '}'


Analyze events for all VCPUs:

             VM-EXIT    Samples      Samples%         Time%        Avg time

                 HLT      15424         1.58%        82.12%      6735.06us
  EXTERNAL_INTERRUPT     219226        22.46%         8.01%        46.20us
         APIC_ACCESS     438107        44.89%         6.20%        17.91us
      IO_INSTRUCTION     122651        12.57%         1.88%        19.44us
       EPT_VIOLATION      83110         8.52%         1.36%        20.75us
       EXCEPTION_NMI      23601         2.42%         0.17%         8.87us
   PENDING_INTERRUPT      37055         3.80%         0.16%         5.38us
               CPUID      32718         3.35%         0.08%         3.15us
           CR_ACCESS       4089         0.42%         0.02%         6.08us

Total Samples:975981, Total events handled time:126502464.88us.

I hope guys will like it and any comments are welcome! :)


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/3] KVM: trace mmio read event properly
  2012-01-16  9:30 [RFC][PATCH] KVM: perf: a smart tool to analyse kvm events Xiao Guangrong
@ 2012-01-16  9:31 ` Xiao Guangrong
  2012-01-16 10:18   ` Avi Kivity
  2012-01-16  9:32 ` [PATCH 2/3] KVM: improve trace events of vmexit/mmio/ioport Xiao Guangrong
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 25+ messages in thread
From: Xiao Guangrong @ 2012-01-16  9:31 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

In current code, we use KVM_TRACE_MMIO_READ to trace mmio read event which
only can be completed immediately, instead of it, we trace the time when
read event occur, then cooperate with then later patch, we can know the time
of mmio read emulation

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/x86.c |    6 ++----
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c95ca2d..0d41cfc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3432,7 +3432,6 @@ static int vcpu_mmio_read(struct kvm_vcpu *vcpu, gpa_t addr, int len, void *v)
 		      !kvm_iodevice_read(&vcpu->arch.apic->dev, addr, n, v))
 		    && kvm_io_bus_read(vcpu->kvm, KVM_MMIO_BUS, addr, n, v))
 			break;
-		trace_kvm_mmio(KVM_TRACE_MMIO_READ, n, addr, *(u64 *)v);
 		handled += n;
 		addr += n;
 		len -= n;
@@ -3658,8 +3657,6 @@ static int read_prepare(struct kvm_vcpu *vcpu, void *val, int bytes)
 {
 	if (vcpu->mmio_read_completed) {
 		memcpy(val, vcpu->mmio_data, bytes);
-		trace_kvm_mmio(KVM_TRACE_MMIO_READ, bytes,
-			       vcpu->mmio_phys_addr, *(u64 *)val);
 		vcpu->mmio_read_completed = 0;
 		return 1;
 	}
@@ -3681,7 +3678,6 @@ static int write_emulate(struct kvm_vcpu *vcpu, gpa_t gpa,

 static int write_mmio(struct kvm_vcpu *vcpu, gpa_t gpa, int bytes, void *val)
 {
-	trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, bytes, gpa, *(u64 *)val);
 	return vcpu_mmio_write(vcpu, gpa, bytes, val);
 }

@@ -3744,6 +3740,8 @@ mmio:
 	/*
 	 * Is this MMIO handled locally?
 	 */
+	trace_kvm_mmio(write ? KVM_TRACE_MMIO_WRITE : KVM_TRACE_MMIO_READ,
+			bytes, gpa, *(u64 *)val);
 	handled = ops->read_write_mmio(vcpu, gpa, bytes, val);
 	if (handled == bytes)
 		return X86EMUL_CONTINUE;
-- 
1.7.7.5


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/3] KVM: improve trace events of vmexit/mmio/ioport
  2012-01-16  9:30 [RFC][PATCH] KVM: perf: a smart tool to analyse kvm events Xiao Guangrong
  2012-01-16  9:31 ` [PATCH 1/3] KVM: trace mmio read event properly Xiao Guangrong
@ 2012-01-16  9:32 ` Xiao Guangrong
  2012-01-16  9:38   ` Avi Kivity
  2012-01-16  9:32 ` [PATCH 3/3] KVM: perf: kvm events analysis tool Xiao Guangrong
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 25+ messages in thread
From: Xiao Guangrong @ 2012-01-16  9:32 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

- trace vcpu_id for these events
- add kvm_mmio_done to trace the time when mmio/ioport emulation is completed

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/trace.h       |   33 ++++++++++++++++++++++++++-------
 arch/x86/kvm/x86.c         |   19 +++++++++++++------
 include/trace/events/kvm.h |   32 +++++++++++++++++++++++++-------
 3 files changed, 64 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 911d264..e556458 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -91,11 +91,12 @@ TRACE_EVENT(kvm_hv_hypercall,
  * Tracepoint for PIO.
  */
 TRACE_EVENT(kvm_pio,
-	TP_PROTO(unsigned int rw, unsigned int port, unsigned int size,
-		 unsigned int count),
-	TP_ARGS(rw, port, size, count),
+	TP_PROTO(unsigned int vcpu_id, unsigned int rw, unsigned int port,
+		 unsigned int size, unsigned int count),
+	TP_ARGS(vcpu_id, rw, port, size, count),

 	TP_STRUCT__entry(
+		__field(	unsigned int,	vcpu_id		)
 		__field(	unsigned int, 	rw		)
 		__field(	unsigned int, 	port		)
 		__field(	unsigned int, 	size		)
@@ -103,17 +104,33 @@ TRACE_EVENT(kvm_pio,
 	),

 	TP_fast_assign(
+		__entry->vcpu_id	= vcpu_id;
 		__entry->rw		= rw;
 		__entry->port		= port;
 		__entry->size		= size;
 		__entry->count		= count;
 	),

-	TP_printk("pio_%s at 0x%x size %d count %d",
-		  __entry->rw ? "write" : "read",
+	TP_printk("vcpu %u pio_%s at 0x%x size %d count %d",
+		  __entry->vcpu_id, __entry->rw ? "write" : "read",
 		  __entry->port, __entry->size, __entry->count)
 );

+TRACE_EVENT(kvm_pio_done,
+	TP_PROTO(unsigned int vcpu_id),
+	TP_ARGS(vcpu_id),
+
+	TP_STRUCT__entry(
+		__field(	unsigned int,	vcpu_id		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_id	= vcpu_id;
+	),
+
+	TP_printk("vcpu %u", __entry->vcpu_id)
+);
+
 /*
  * Tracepoint for cpuid.
  */
@@ -280,6 +297,7 @@ TRACE_EVENT(kvm_exit,
 	TP_ARGS(exit_reason, vcpu, isa),

 	TP_STRUCT__entry(
+		__field(	unsigned int,	vcpu_id		)
 		__field(	unsigned int,	exit_reason	)
 		__field(	unsigned long,	guest_rip	)
 		__field(	u32,	        isa             )
@@ -288,6 +306,7 @@ TRACE_EVENT(kvm_exit,
 	),

 	TP_fast_assign(
+		__entry->vcpu_id	= vcpu->vcpu_id;
 		__entry->exit_reason	= exit_reason;
 		__entry->guest_rip	= kvm_rip_read(vcpu);
 		__entry->isa            = isa;
@@ -295,8 +314,8 @@ TRACE_EVENT(kvm_exit,
 					   &__entry->info2);
 	),

-	TP_printk("reason %s rip 0x%lx info %llx %llx",
-		 (__entry->isa == KVM_ISA_VMX) ?
+	TP_printk("vcpu %u reason %s rip 0x%lx info %llx %llx",
+		 __entry->vcpu_id, (__entry->isa == KVM_ISA_VMX) ?
 		 __print_symbolic(__entry->exit_reason, VMX_EXIT_REASONS) :
 		 __print_symbolic(__entry->exit_reason, SVM_EXIT_REASONS),
 		 __entry->guest_rip, __entry->info1, __entry->info2)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0d41cfc..cf54478 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3684,7 +3684,8 @@ static int write_mmio(struct kvm_vcpu *vcpu, gpa_t gpa, int bytes, void *val)
 static int read_exit_mmio(struct kvm_vcpu *vcpu, gpa_t gpa,
 			  void *val, int bytes)
 {
-	trace_kvm_mmio(KVM_TRACE_MMIO_READ_UNSATISFIED, bytes, gpa, 0);
+	trace_kvm_mmio(vcpu->vcpu_id, KVM_TRACE_MMIO_READ_UNSATISFIED,
+		       bytes, gpa, 0);
 	return X86EMUL_IO_NEEDED;
 }

@@ -3740,11 +3741,14 @@ mmio:
 	/*
 	 * Is this MMIO handled locally?
 	 */
-	trace_kvm_mmio(write ? KVM_TRACE_MMIO_WRITE : KVM_TRACE_MMIO_READ,
-			bytes, gpa, *(u64 *)val);
+	trace_kvm_mmio(vcpu->vcpu_id,
+		       write ? KVM_TRACE_MMIO_WRITE : KVM_TRACE_MMIO_READ,
+		       bytes, gpa, *(u64 *)val);
 	handled = ops->read_write_mmio(vcpu, gpa, bytes, val);
-	if (handled == bytes)
+	if (handled == bytes) {
+		trace_kvm_mmio_done(vcpu->vcpu_id);
 		return X86EMUL_CONTINUE;
+	}

 	gpa += handled;
 	bytes -= handled;
@@ -3902,7 +3906,7 @@ static int emulator_pio_in_out(struct kvm_vcpu *vcpu, int size,
 			       unsigned short port, void *val,
 			       unsigned int count, bool in)
 {
-	trace_kvm_pio(!in, port, size, count);
+	trace_kvm_pio(vcpu->vcpu_id, !in, port, size, count);

 	vcpu->arch.pio.port = port;
 	vcpu->arch.pio.in = in;
@@ -3910,6 +3914,7 @@ static int emulator_pio_in_out(struct kvm_vcpu *vcpu, int size,
 	vcpu->arch.pio.size = size;

 	if (!kernel_pio(vcpu, vcpu->arch.pio_data)) {
+		trace_kvm_mmio_done(vcpu->vcpu_id);
 		vcpu->arch.pio.count = 0;
 		return 1;
 	}
@@ -5399,7 +5404,7 @@ static int complete_mmio(struct kvm_vcpu *vcpu)
 			return 0;
 		}
 		if (vcpu->mmio_is_write)
-			return 1;
+			goto exit;
 		vcpu->mmio_read_completed = 1;
 	}
 	vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
@@ -5407,6 +5412,8 @@ static int complete_mmio(struct kvm_vcpu *vcpu)
 	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
 	if (r != EMULATE_DONE)
 		return 0;
+exit:
+	trace_kvm_mmio_done(vcpu->vcpu_id);
 	return 1;
 }

diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index 46e3cd8..20b9f17 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -152,28 +152,46 @@ TRACE_EVENT(kvm_ack_irq,
 	{ KVM_TRACE_MMIO_WRITE, "write" }

 TRACE_EVENT(kvm_mmio,
-	TP_PROTO(int type, int len, u64 gpa, u64 val),
-	TP_ARGS(type, len, gpa, val),
+	TP_PROTO(unsigned int vcpu_id, int type, int len, u64 gpa, u64 val),
+	TP_ARGS(vcpu_id, type, len, gpa, val),

 	TP_STRUCT__entry(
-		__field(	u32,	type		)
-		__field(	u32,	len		)
-		__field(	u64,	gpa		)
-		__field(	u64,	val		)
+		__field(	unsigned int,	vcpu_id	)
+		__field(	u32,		type	)
+		__field(	u32,		len	)
+		__field(	u64,		gpa	)
+		__field(	u64,		val	)
 	),

 	TP_fast_assign(
+		__entry->vcpu_id	= vcpu_id;
 		__entry->type		= type;
 		__entry->len		= len;
 		__entry->gpa		= gpa;
 		__entry->val		= val;
 	),

-	TP_printk("mmio %s len %u gpa 0x%llx val 0x%llx",
+	TP_printk("vcpu %u mmio %s len %u gpa 0x%llx val 0x%llx",
+		  __entry->vcpu_id,
 		  __print_symbolic(__entry->type, kvm_trace_symbol_mmio),
 		  __entry->len, __entry->gpa, __entry->val)
 );

+TRACE_EVENT(kvm_mmio_done,
+	TP_PROTO(unsigned int vcpu_id),
+	TP_ARGS(vcpu_id),
+
+	TP_STRUCT__entry(
+		__field(	unsigned int,	vcpu_id		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_id	= vcpu_id;
+	),
+
+	TP_printk("vcpu %u", __entry->vcpu_id)
+);
+
 #define kvm_fpu_load_symbol	\
 	{0, "unload"},		\
 	{1, "load"}
-- 
1.7.7.5


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 3/3] KVM: perf: kvm events analysis tool
  2012-01-16  9:30 [RFC][PATCH] KVM: perf: a smart tool to analyse kvm events Xiao Guangrong
  2012-01-16  9:31 ` [PATCH 1/3] KVM: trace mmio read event properly Xiao Guangrong
  2012-01-16  9:32 ` [PATCH 2/3] KVM: improve trace events of vmexit/mmio/ioport Xiao Guangrong
@ 2012-01-16  9:32 ` Xiao Guangrong
  2012-01-16 10:04   ` Avi Kivity
  2012-01-16 10:08   ` Stefan Hajnoczi
  2012-01-16 10:11 ` [RFC][PATCH] KVM: perf: a smart tool to analyse kvm events Avi Kivity
  2012-01-16 22:53 ` David Ahern
  4 siblings, 2 replies; 25+ messages in thread
From: Xiao Guangrong @ 2012-01-16  9:32 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

Add 'perf kvm-events' support to analyze kvm vmexit/mmio/ioport smartly

Usage:
	perf kvm-events record
	perf kvm-events report

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 tools/perf/Documentation/perf-kvm-events.txt |   54 ++
 tools/perf/Makefile                          |    1 +
 tools/perf/builtin-kvm-events.c              |  860 ++++++++++++++++++++++++++
 tools/perf/builtin.h                         |    1 +
 tools/perf/perf.c                            |    1 +
 5 files changed, 917 insertions(+), 0 deletions(-)
 create mode 100644 tools/perf/Documentation/perf-kvm-events.txt
 create mode 100644 tools/perf/builtin-kvm-events.c

diff --git a/tools/perf/Documentation/perf-kvm-events.txt b/tools/perf/Documentation/perf-kvm-events.txt
new file mode 100644
index 0000000..73bcb82
--- /dev/null
+++ b/tools/perf/Documentation/perf-kvm-events.txt
@@ -0,0 +1,54 @@
+perf-kvm-events(1)
+============
+
+NAME
+----
+perf-kvm-events - Analyze kvm events
+
+SYNOPSIS
+--------
+[verse]
+'perf kvm-events' {record|report}
+
+DESCRIPTION
+-----------
+You can analyze some crucial events and statistics with this
+'perf kvm-events' command.
+
+  'perf kvm-events record <command>' records kvm events
+  between start and end <command>. And this command
+  produces the file "perf.data" which contains tracing
+  results of kvm events.
+
+  'perf kvm-events report' reports statistical data.
+
+COMMON OPTIONS
+--------------
+
+-i::
+--input=<file>::
+        Input file name. (default: perf.data unless stdin is a fifo)
+
+-v::
+--verbose::
+        Be more verbose (show symbol address, etc).
+
+-D::
+--dump-raw-trace::
+        Dump raw trace in ASCII.
+
+REPORT OPTIONS
+--------------
+--vcpu=<value>::
+	analyze events which occures on this vcpu
+
+--events=<value>::
+	events to be analyzed. Possible values: vmexit, mmio, ioport.
+-k::
+--key=<value>::
+        Sorting key. Possible values: sample(default, sort by samples number),
+time(sort by time%).
+
+SEE ALSO
+--------
+linkperf:perf[1]
diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index ac86d67..ee43451 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -382,6 +382,7 @@ BUILTIN_OBJS += $(OUTPUT)builtin-probe.o
 BUILTIN_OBJS += $(OUTPUT)builtin-kmem.o
 BUILTIN_OBJS += $(OUTPUT)builtin-lock.o
 BUILTIN_OBJS += $(OUTPUT)builtin-kvm.o
+BUILTIN_OBJS += $(OUTPUT)builtin-kvm-events.o
 BUILTIN_OBJS += $(OUTPUT)builtin-test.o
 BUILTIN_OBJS += $(OUTPUT)builtin-inject.o

diff --git a/tools/perf/builtin-kvm-events.c b/tools/perf/builtin-kvm-events.c
new file mode 100644
index 0000000..55dc680
--- /dev/null
+++ b/tools/perf/builtin-kvm-events.c
@@ -0,0 +1,860 @@
+#include "builtin.h"
+#include "perf.h"
+#include "util/util.h"
+#include "util/cache.h"
+#include "util/symbol.h"
+#include "util/thread.h"
+#include "util/header.h"
+#include "util/parse-options.h"
+#include "util/trace-event.h"
+#include "util/debug.h"
+#include "util/session.h"
+#include "util/tool.h"
+
+#include <linux/hash.h>
+
+/*
+ * Todo: improve the print format of kvm_exit to let it is easier
+ * parsed by perf, then we can get the exit reason from print
+ * format directly.
+ */
+#define EXIT_REASON_EXCEPTION_NMI       0
+#define EXIT_REASON_EXTERNAL_INTERRUPT  1
+#define EXIT_REASON_TRIPLE_FAULT        2
+
+#define EXIT_REASON_PENDING_INTERRUPT   7
+#define EXIT_REASON_NMI_WINDOW		8
+#define EXIT_REASON_TASK_SWITCH         9
+#define EXIT_REASON_CPUID               10
+#define EXIT_REASON_HLT                 12
+#define EXIT_REASON_INVD                13
+#define EXIT_REASON_INVLPG              14
+#define EXIT_REASON_RDPMC               15
+#define EXIT_REASON_RDTSC               16
+#define EXIT_REASON_VMCALL              18
+#define EXIT_REASON_VMCLEAR             19
+#define EXIT_REASON_VMLAUNCH            20
+#define EXIT_REASON_VMPTRLD             21
+#define EXIT_REASON_VMPTRST             22
+#define EXIT_REASON_VMREAD              23
+#define EXIT_REASON_VMRESUME            24
+#define EXIT_REASON_VMWRITE             25
+#define EXIT_REASON_VMOFF               26
+#define EXIT_REASON_VMON                27
+#define EXIT_REASON_CR_ACCESS           28
+#define EXIT_REASON_DR_ACCESS           29
+#define EXIT_REASON_IO_INSTRUCTION      30
+#define EXIT_REASON_MSR_READ            31
+#define EXIT_REASON_MSR_WRITE           32
+#define EXIT_REASON_INVALID_STATE	33
+#define EXIT_REASON_MWAIT_INSTRUCTION   36
+#define EXIT_REASON_MONITOR_INSTRUCTION 39
+#define EXIT_REASON_PAUSE_INSTRUCTION   40
+#define EXIT_REASON_MCE_DURING_VMENTRY	 41
+#define EXIT_REASON_TPR_BELOW_THRESHOLD 43
+#define EXIT_REASON_APIC_ACCESS         44
+#define EXIT_REASON_EPT_VIOLATION       48
+#define EXIT_REASON_EPT_MISCONFIG       49
+#define EXIT_REASON_WBINVD		54
+#define EXIT_REASON_XSETBV		55
+
+#define SVM_EXIT_READ_CR0	0x000
+#define SVM_EXIT_READ_CR3	0x003
+#define SVM_EXIT_READ_CR4	0x004
+#define SVM_EXIT_READ_CR8	0x008
+#define SVM_EXIT_WRITE_CR0	0x010
+#define SVM_EXIT_WRITE_CR3	0x013
+#define SVM_EXIT_WRITE_CR4	0x014
+#define SVM_EXIT_WRITE_CR8	0x018
+#define SVM_EXIT_READ_DR0	0x020
+#define SVM_EXIT_READ_DR1	0x021
+#define SVM_EXIT_READ_DR2	0x022
+#define SVM_EXIT_READ_DR3	0x023
+#define SVM_EXIT_READ_DR4	0x024
+#define SVM_EXIT_READ_DR5	0x025
+#define SVM_EXIT_READ_DR6	0x026
+#define SVM_EXIT_READ_DR7	0x027
+#define SVM_EXIT_WRITE_DR0	0x030
+#define SVM_EXIT_WRITE_DR1	0x031
+#define SVM_EXIT_WRITE_DR2	0x032
+#define SVM_EXIT_WRITE_DR3	0x033
+#define SVM_EXIT_WRITE_DR4	0x034
+#define SVM_EXIT_WRITE_DR5	0x035
+#define SVM_EXIT_WRITE_DR6	0x036
+#define SVM_EXIT_WRITE_DR7	0x037
+#define SVM_EXIT_EXCP_BASE	0x040
+#define SVM_EXIT_INTR		0x060
+#define SVM_EXIT_NMI		0x061
+#define SVM_EXIT_SMI		0x062
+#define SVM_EXIT_INIT		0x063
+#define SVM_EXIT_VINTR		0x064
+#define SVM_EXIT_CR0_SEL_WRITE	0x065
+#define SVM_EXIT_IDTR_READ	0x066
+#define SVM_EXIT_GDTR_READ	0x067
+#define SVM_EXIT_LDTR_READ	0x068
+#define SVM_EXIT_TR_READ	0x069
+#define SVM_EXIT_IDTR_WRITE	0x06a
+#define SVM_EXIT_GDTR_WRITE	0x06b
+#define SVM_EXIT_LDTR_WRITE	0x06c
+#define SVM_EXIT_TR_WRITE	0x06d
+#define SVM_EXIT_RDTSC	0x06e
+#define SVM_EXIT_RDPMC	0x06f
+#define SVM_EXIT_PUSHF	0x070
+#define SVM_EXIT_POPF	0x071
+#define SVM_EXIT_CPUID	0x072
+#define SVM_EXIT_RSM	0x073
+#define SVM_EXIT_IRET	0x074
+#define SVM_EXIT_SWINT	0x075
+#define SVM_EXIT_INVD	0x076
+#define SVM_EXIT_PAUSE	0x077
+#define SVM_EXIT_HLT	0x078
+#define SVM_EXIT_INVLPG	0x079
+#define SVM_EXIT_INVLPGA	0x07a
+#define SVM_EXIT_IOIO	0x07b
+#define SVM_EXIT_MSR	0x07c
+#define SVM_EXIT_TASK_SWITCH	0x07d
+#define SVM_EXIT_FERR_FREEZE	0x07e
+#define SVM_EXIT_SHUTDOWN	0x07f
+#define SVM_EXIT_VMRUN	0x080
+#define SVM_EXIT_VMMCALL	0x081
+#define SVM_EXIT_VMLOAD	0x082
+#define SVM_EXIT_VMSAVE	0x083
+#define SVM_EXIT_STGI	0x084
+#define SVM_EXIT_CLGI	0x085
+#define SVM_EXIT_SKINIT	0x086
+#define SVM_EXIT_RDTSCP	0x087
+#define SVM_EXIT_ICEBP	0x088
+#define SVM_EXIT_WBINVD	0x089
+#define SVM_EXIT_MONITOR	0x08a
+#define SVM_EXIT_MWAIT	0x08b
+#define SVM_EXIT_MWAIT_COND	0x08c
+#define SVM_EXIT_XSETBV	0x08d
+#define SVM_EXIT_NPF	0x400
+
+#define DE_VECTOR 0
+#define DB_VECTOR 1
+#define BP_VECTOR 3
+#define OF_VECTOR 4
+#define BR_VECTOR 5
+#define UD_VECTOR 6
+#define NM_VECTOR 7
+#define DF_VECTOR 8
+#define TS_VECTOR 10
+#define NP_VECTOR 11
+#define SS_VECTOR 12
+#define GP_VECTOR 13
+#define PF_VECTOR 14
+#define MF_VECTOR 16
+#define MC_VECTOR 18
+
+
+#define VMX_EXIT_REASONS \
+	{ EXIT_REASON_EXCEPTION_NMI,		"EXCEPTION_NMI" }, \
+	{ EXIT_REASON_EXTERNAL_INTERRUPT,	"EXTERNAL_INTERRUPT" }, \
+	{ EXIT_REASON_TRIPLE_FAULT,		"TRIPLE_FAULT" }, \
+	{ EXIT_REASON_PENDING_INTERRUPT,	"PENDING_INTERRUPT" }, \
+	{ EXIT_REASON_NMI_WINDOW,		"NMI_WINDOW" }, \
+	{ EXIT_REASON_TASK_SWITCH,		"TASK_SWITCH" }, \
+	{ EXIT_REASON_CPUID,			"CPUID" }, \
+	{ EXIT_REASON_HLT,			"HLT" }, \
+	{ EXIT_REASON_INVLPG,			"INVLPG" }, \
+	{ EXIT_REASON_RDPMC,			"RDPMC" }, \
+	{ EXIT_REASON_RDTSC,			"RDTSC" }, \
+	{ EXIT_REASON_VMCALL,			"VMCALL" }, \
+	{ EXIT_REASON_VMCLEAR,			"VMCLEAR" }, \
+	{ EXIT_REASON_VMLAUNCH,			"VMLAUNCH" }, \
+	{ EXIT_REASON_VMPTRLD,			"VMPTRLD" }, \
+	{ EXIT_REASON_VMPTRST,			"VMPTRST" }, \
+	{ EXIT_REASON_VMREAD,			"VMREAD" }, \
+	{ EXIT_REASON_VMRESUME,			"VMRESUME" }, \
+	{ EXIT_REASON_VMWRITE,			"VMWRITE" }, \
+	{ EXIT_REASON_VMOFF,			"VMOFF" }, \
+	{ EXIT_REASON_VMON,			"VMON" }, \
+	{ EXIT_REASON_CR_ACCESS,		"CR_ACCESS" }, \
+	{ EXIT_REASON_DR_ACCESS,		"DR_ACCESS" }, \
+	{ EXIT_REASON_IO_INSTRUCTION,		"IO_INSTRUCTION" }, \
+	{ EXIT_REASON_MSR_READ,			"MSR_READ" }, \
+	{ EXIT_REASON_MSR_WRITE,		"MSR_WRITE" }, \
+	{ EXIT_REASON_MWAIT_INSTRUCTION,	"MWAIT_INSTRUCTION" }, \
+	{ EXIT_REASON_MONITOR_INSTRUCTION,	"MONITOR_INSTRUCTION" }, \
+	{ EXIT_REASON_PAUSE_INSTRUCTION,	"PAUSE_INSTRUCTION" }, \
+	{ EXIT_REASON_MCE_DURING_VMENTRY,	"MCE_DURING_VMENTRY" }, \
+	{ EXIT_REASON_TPR_BELOW_THRESHOLD,	"TPR_BELOW_THRESHOLD" },\
+	{ EXIT_REASON_APIC_ACCESS,		"APIC_ACCESS" }, \
+	{ EXIT_REASON_EPT_VIOLATION,		"EPT_VIOLATION" }, \
+	{ EXIT_REASON_EPT_MISCONFIG,		"EPT_MISCONFIG" }, \
+	{ EXIT_REASON_WBINVD,			"WBINVD" }
+
+#define SVM_EXIT_REASONS \
+	{ SVM_EXIT_READ_CR0,			"read_cr0" }, \
+	{ SVM_EXIT_READ_CR3,			"read_cr3" }, \
+	{ SVM_EXIT_READ_CR4,			"read_cr4" }, \
+	{ SVM_EXIT_READ_CR8,			"read_cr8" }, \
+	{ SVM_EXIT_WRITE_CR0,			"write_cr0" }, \
+	{ SVM_EXIT_WRITE_CR3,			"write_cr3" }, \
+	{ SVM_EXIT_WRITE_CR4,			"write_cr4" }, \
+	{ SVM_EXIT_WRITE_CR8,			"write_cr8" }, \
+	{ SVM_EXIT_READ_DR0,			"read_dr0" }, \
+	{ SVM_EXIT_READ_DR1,			"read_dr1" }, \
+	{ SVM_EXIT_READ_DR2,			"read_dr2" }, \
+	{ SVM_EXIT_READ_DR3,			"read_dr3" }, \
+	{ SVM_EXIT_WRITE_DR0,			"write_dr0" }, \
+	{ SVM_EXIT_WRITE_DR1,			"write_dr1" }, \
+	{ SVM_EXIT_WRITE_DR2,			"write_dr2" }, \
+	{ SVM_EXIT_WRITE_DR3,			"write_dr3" }, \
+	{ SVM_EXIT_WRITE_DR5,			"write_dr5" }, \
+	{ SVM_EXIT_WRITE_DR7,			"write_dr7" }, \
+	{ SVM_EXIT_EXCP_BASE + DB_VECTOR,	"DB excp" }, \
+	{ SVM_EXIT_EXCP_BASE + BP_VECTOR,	"BP excp" }, \
+	{ SVM_EXIT_EXCP_BASE + UD_VECTOR,	"UD excp" }, \
+	{ SVM_EXIT_EXCP_BASE + PF_VECTOR,	"PF excp" }, \
+	{ SVM_EXIT_EXCP_BASE + NM_VECTOR,	"NM excp" }, \
+	{ SVM_EXIT_EXCP_BASE + MC_VECTOR,	"MC excp" }, \
+	{ SVM_EXIT_INTR,			"interrupt" }, \
+	{ SVM_EXIT_NMI,				"nmi" }, \
+	{ SVM_EXIT_SMI,				"smi" }, \
+	{ SVM_EXIT_INIT,			"init" }, \
+	{ SVM_EXIT_VINTR,			"vintr" }, \
+	{ SVM_EXIT_CPUID,			"cpuid" }, \
+	{ SVM_EXIT_INVD,			"invd" }, \
+	{ SVM_EXIT_HLT,				"hlt" }, \
+	{ SVM_EXIT_INVLPG,			"invlpg" }, \
+	{ SVM_EXIT_INVLPGA,			"invlpga" }, \
+	{ SVM_EXIT_IOIO,			"io" }, \
+	{ SVM_EXIT_MSR,				"msr" }, \
+	{ SVM_EXIT_TASK_SWITCH,			"task_switch" }, \
+	{ SVM_EXIT_SHUTDOWN,			"shutdown" }, \
+	{ SVM_EXIT_VMRUN,			"vmrun" }, \
+	{ SVM_EXIT_VMMCALL,			"hypercall" }, \
+	{ SVM_EXIT_VMLOAD,			"vmload" }, \
+	{ SVM_EXIT_VMSAVE,			"vmsave" }, \
+	{ SVM_EXIT_STGI,			"stgi" }, \
+	{ SVM_EXIT_CLGI,			"clgi" }, \
+	{ SVM_EXIT_SKINIT,			"skinit" }, \
+	{ SVM_EXIT_WBINVD,			"wbinvd" }, \
+	{ SVM_EXIT_MONITOR,			"monitor" }, \
+	{ SVM_EXIT_MWAIT,			"mwait" }, \
+	{ SVM_EXIT_XSETBV,			"xsetbv" }, \
+	{ SVM_EXIT_NPF,				"npf" }
+
+struct exit_reasons_table {
+	unsigned long exit_code;
+	const char *reason;
+};
+
+struct exit_reasons_table vmx_exit_reasons[] = {
+	VMX_EXIT_REASONS
+};
+
+struct exit_reasons_table svm_exit_reasons[] = {
+	SVM_EXIT_REASONS
+};
+
+static const char *get_exit_reason(long isa, u64 exit_code)
+{
+	int table_size = ARRAY_SIZE(svm_exit_reasons);
+	struct exit_reasons_table *table = svm_exit_reasons;
+
+
+	if (isa == 1) {
+		table = vmx_exit_reasons;
+		table_size = ARRAY_SIZE(vmx_exit_reasons);
+	}
+
+	while (table_size--) {
+		if (table->exit_code == exit_code)
+			return table->reason;
+		table++;
+	}
+
+	die("unkonw kvm exit code:%ld on %s\n", exit_code, isa == 1 ?
+						"VMX" : "SVM");
+}
+
+struct event_key {
+	u64 key;
+	int info;
+};
+
+struct kvm_events_ops {
+	bool (*is_begain_event)(struct event *event, void *data);
+	bool (*is_end_event)(struct event *event);
+	struct event_key (*get_key)(struct event *event, void *data);
+	void (*decode_key)(struct event_key *key, char decode[20]);
+	const char *name;
+};
+
+static bool exit_event_begain(struct event *event, void *data __unused)
+{
+	return !strcmp(event->name, "kvm_exit");
+}
+
+static bool exit_event_end(struct event *event)
+{
+	return !strcmp(event->name, "kvm_entry");
+}
+
+static struct event_key exit_event_get_key(struct event *event, void *data)
+{
+	struct event_key key;
+
+	key.key = raw_field_value(event, "exit_reason", data);
+	key.info = raw_field_value(event, "isa", data);
+	return key;
+}
+
+static void exit_event_decode_key(struct event_key *key, char decode[20])
+{
+	const char *exit_reason = get_exit_reason(key->info, key->key);
+
+	memset(decode, 0, 20);
+	strncpy(decode, exit_reason, 20);
+}
+
+static struct kvm_events_ops exit_events = {
+	.is_begain_event = exit_event_begain,
+	.is_end_event = exit_event_end,
+	.get_key = exit_event_get_key,
+	.decode_key = exit_event_decode_key,
+	.name = "VM-EXIT"
+};
+
+#define KVM_TRACE_MMIO_READ_UNSATISFIED 0
+#define KVM_TRACE_MMIO_READ 1
+#define KVM_TRACE_MMIO_WRITE 2
+static bool mmio_event_begain(struct event *event, void *data)
+{
+	if (!strcmp(event->name, "kvm_mmio")) {
+		long type = raw_field_value(event, "type", data);
+
+		if (type != KVM_TRACE_MMIO_READ_UNSATISFIED)
+			return true;
+	};
+
+	return false;
+}
+
+static bool mmio_event_end(struct event *event)
+{
+	return !strcmp(event->name, "kvm_mmio_done");
+}
+
+static struct event_key mmio_event_get_key(struct event *event, void *data)
+{
+	struct event_key key;
+
+	key.key = raw_field_value(event, "gpa", data);
+	key.info = raw_field_value(event, "type", data);
+
+	return key;
+}
+
+static void mmio_event_decode_key(struct event_key *key, char decode[20])
+{
+	memset(decode, 0, 20);
+	sprintf(decode, "%#lx:%s", key->key,
+		key->info == KVM_TRACE_MMIO_READ ? "R" : "W");
+}
+
+static struct kvm_events_ops mmio_events = {
+	.is_begain_event = mmio_event_begain,
+	.is_end_event = mmio_event_end,
+	.get_key = mmio_event_get_key,
+	.decode_key = mmio_event_decode_key,
+	.name = "MMIO Access"
+};
+
+static bool ioport_event_begain(struct event *event, void *data __unused)
+{
+	return !strcmp(event->name, "kvm_pio");
+}
+
+static struct event_key ioport_event_get_key(struct event *event, void *data)
+{
+	struct event_key key;
+
+	key.key = raw_field_value(event, "port", data);
+	key.info = raw_field_value(event, "rw", data);
+
+	return key;
+}
+
+static void ioport_event_decode_key(struct event_key *key, char decode[20])
+{
+	memset(decode, 0, 20);
+	sprintf(decode, "%#lx:%s", key->key, key->info ? "POUT" : "PIN");
+}
+
+static struct kvm_events_ops ioport_events = {
+	.is_begain_event = ioport_event_begain,
+	.is_end_event = mmio_event_end,
+	.get_key = ioport_event_get_key,
+	.decode_key = ioport_event_decode_key,
+	.name = "IO Port Access"
+};
+
+static const char *report_event = "vmexit";
+struct kvm_events_ops *events_ops;
+
+static void register_kvm_events_ops(void)
+{
+	if (!strcmp(report_event, "vmexit"))
+		events_ops = &exit_events;
+	else if (!strcmp(report_event, "mmio"))
+		events_ops = &mmio_events;
+	else if (!strcmp(report_event, "ioport"))
+		events_ops = &ioport_events;
+	else
+		die("Unknown report event:%s\n", report_event);
+}
+
+#define KVM_MAX_VCPUS 254
+
+struct kvm_event {
+	struct list_head hash_entry;
+	struct rb_node rb;
+
+	u64 count;
+	u64 time;
+	struct {
+		u64 count;
+		u64 time;
+	} vcpu[KVM_MAX_VCPUS];
+
+	struct event_key key;
+};
+
+struct {
+	u64 start_time;
+	struct kvm_event *last_event;
+} vcpu_event_record[KVM_MAX_VCPUS] ;
+
+#define EVENTS_BITS			12
+#define EVENTS_CACHE_SIZE	(1UL << EVENTS_BITS)
+
+static u64 total_time;
+static u64 total_count;
+static struct list_head kvm_events_cache[EVENTS_CACHE_SIZE];
+
+static void init_kvm_event_record(void)
+{
+	int i;
+
+	for (i = 0; i < KVM_MAX_VCPUS; i++)
+		vcpu_event_record[i].last_event = NULL;
+
+	for (i = 0; i < (int)EVENTS_CACHE_SIZE; i++)
+		INIT_LIST_HEAD(&kvm_events_cache[i]);
+}
+
+static int kvm_events_hash_fn(u64 key)
+{
+	return hash_long((unsigned long)key, EVENTS_BITS);
+}
+
+static void kvm_event_init(struct kvm_event *event, struct event_key *key)
+{
+	event->key = *key;
+}
+
+static struct kvm_event *find_create_kvm_event(struct event_key *key)
+{
+	struct kvm_event *event;
+	struct list_head *head;
+
+	head = &kvm_events_cache[kvm_events_hash_fn(key->key)];
+	list_for_each_entry(event, head, hash_entry)
+		if (event->key.key == key->key && event->key.info == key->info)
+			return event;
+
+	event = zalloc(sizeof(*event));
+	if (!event)
+		die("Not enough memory\n");
+
+	kvm_event_init(event, key);
+	list_add(&event->hash_entry, head);
+	return event;
+}
+
+static void handle_begain_event(struct event_key *key, int vcpu_id,
+				u64 timestamp)
+{
+	struct kvm_event *event = find_create_kvm_event(key);
+
+	vcpu_event_record[vcpu_id].last_event = event;
+	vcpu_event_record[vcpu_id].start_time = timestamp;
+}
+
+static void update_kvm_event(struct kvm_event *event, int vcpu_id,
+			     u64 time_diff)
+{
+	event->count++;
+	event->time += time_diff;
+	event->vcpu[vcpu_id].count++;
+	event->vcpu[vcpu_id].time += time_diff;
+}
+
+static void handle_end_event(int vcpu_id, u64 timestamp)
+{
+	u64 time_begain, time_diff;
+	struct kvm_event *event;
+
+	event = vcpu_event_record[vcpu_id].last_event;
+	if (!event)
+		return;
+
+	time_begain = vcpu_event_record[vcpu_id].start_time;
+
+	vcpu_event_record[vcpu_id].last_event = NULL;
+	vcpu_event_record[vcpu_id].start_time = 0;
+
+	BUG_ON(!time_begain || timestamp < time_begain);
+
+	time_diff = timestamp - time_begain;
+	update_kvm_event(event, vcpu_id, time_diff);
+}
+
+static int get_vcpu_id(struct event *event, void *data)
+{
+	return raw_field_value(event, "vcpu_id", data);
+}
+
+static void handle_kvm_event(struct event *event, void *data, u64 timestamp)
+{
+	int vcpu_id;
+
+	if (events_ops->is_begain_event(event, data)) {
+		struct event_key key = events_ops->get_key(event, data);
+
+		vcpu_id = get_vcpu_id(event, data);
+		return handle_begain_event(&key, vcpu_id, timestamp);
+	}
+
+	if (events_ops->is_end_event(event)) {
+		vcpu_id = get_vcpu_id(event, data);
+		return handle_end_event(vcpu_id, timestamp);
+	}
+}
+
+typedef int (*key_cmp_fun)(struct kvm_event*, struct kvm_event*, int);
+struct kvm_event_key {
+	const char *name;
+	key_cmp_fun key;
+};
+
+static int trace_vcpu = -1;
+#define COMPARE_EVENT_KEY(member)					\
+static u64 get_event_ ##member(struct kvm_event *event, int vcpu)	\
+{									\
+	if (vcpu == -1)							\
+		return event->member;					\
+									\
+	return event->vcpu[vcpu].member;				\
+}									\
+									\
+static int compare_kvm_event_ ## member(struct kvm_event *one,		\
+					struct kvm_event *two, int vcpu)\
+{									\
+	return get_event_ ##member(one, vcpu) > get_event_ ##member(two, vcpu);\
+}
+
+COMPARE_EVENT_KEY(count);
+COMPARE_EVENT_KEY(time);
+
+#define DEF_SORT_NAME_KEY(name, compare_key)	\
+	{ #name, compare_kvm_event_ ## compare_key }
+
+static struct kvm_event_key keys[] = {
+	DEF_SORT_NAME_KEY(sample, count),
+	DEF_SORT_NAME_KEY(time, time),
+	{ NULL, NULL }
+};
+
+static const char *sort_key = "sample";
+static key_cmp_fun compare;
+
+static void select_key(void)
+{
+	int i;
+
+	for (i = 0; keys[i].name; i++) {
+		if (!strcmp(keys[i].name, sort_key)) {
+			compare = keys[i].key;
+			return;
+		}
+	}
+
+	die("Unknown compare key:%s\n", sort_key);
+}
+
+static struct rb_root result;
+static void insert_to_result(struct kvm_event *event, key_cmp_fun bigger,
+			     int vcpu)
+{
+	struct rb_node **rb = &result.rb_node;
+	struct rb_node *parent = NULL;
+	struct kvm_event *p;
+
+	while (*rb) {
+		p = container_of(*rb, struct kvm_event, rb);
+		parent = *rb;
+
+		if (bigger(event, p, vcpu))
+			rb = &(*rb)->rb_left;
+		else
+			rb = &(*rb)->rb_right;
+	}
+
+	rb_link_node(&event->rb, parent, rb);
+	rb_insert_color(&event->rb, &result);
+}
+
+static void update_total_count(struct kvm_event *event, int vcpu)
+{
+	total_count += get_event_count(event, vcpu);
+	total_time += get_event_time(event, vcpu);
+}
+
+static bool event_is_valid(struct kvm_event *event, int vcpu)
+{
+	return get_event_count(event, vcpu);
+}
+
+static void sort_result(int vcpu)
+{
+	unsigned int i;
+	struct kvm_event *event;
+
+	for (i = 0; i < EVENTS_CACHE_SIZE; i++)
+		list_for_each_entry(event, &kvm_events_cache[i], hash_entry)
+			if (event_is_valid(event, vcpu)) {
+				update_total_count(event, vcpu);
+				insert_to_result(event, compare, vcpu);
+			}
+}
+
+/* returns left most element of result, and erase it */
+static struct kvm_event *pop_from_result(void)
+{
+	struct rb_node *node = result.rb_node;
+
+	if (!node)
+		return NULL;
+
+	while (node->rb_left)
+		node = node->rb_left;
+
+	rb_erase(node, &result);
+	return container_of(node, struct kvm_event, rb);
+}
+
+static void print_vcpu_info(int vcpu)
+{
+	pr_info("Analyze events for ");
+
+	if (vcpu == -1)
+		pr_info("all VCPUs:\n\n");
+	else
+		pr_info("VCPU %d:\n\n", vcpu);
+}
+
+static void print_result(int vcpu)
+{
+	char decode[20];
+	struct kvm_event *event;
+
+	pr_info("\n\n");
+	print_vcpu_info(vcpu);
+	pr_info("%20s ", events_ops->name);
+	pr_info("%10s ", "Samples");
+	pr_info("%13s ", "Samples%");
+
+	pr_info("%13s ", "Time%");
+	pr_info("%15s ", "Avg time");
+	pr_info("\n\n");
+
+	while ((event = pop_from_result())) {
+		u64 ecount, etime;
+
+		ecount = get_event_count(event, vcpu);
+		etime = get_event_time(event, vcpu);
+
+		events_ops->decode_key(&event->key, decode);
+		pr_info("%20s ", decode);
+		pr_info("%10lu ", ecount);
+		pr_info("%12.2f%% ", (double)ecount / total_count * 100);
+		pr_info("%12.2f%%", (double)etime / total_time * 100);
+		pr_info("%13.2fus ", (double)etime / ecount/1e3);
+		pr_info("\n");
+	}
+
+	pr_info("\nTotal Samples:%ld, Total events handled time:%.2fus.\n\n",
+		total_count, total_time / 1e3);
+}
+
+static void process_raw_event(void *data, int __unused cpu, u64 timestamp,
+			      struct thread *thread __unused)
+{
+	struct event *event;
+	int type;
+
+	type = trace_parse_common_type(data);
+	event = trace_find_event(type);
+
+	return handle_kvm_event(event, data, timestamp);
+}
+
+static int process_sample_event(struct perf_tool *tool __used,
+				union perf_event *event,
+				struct perf_sample *sample,
+				struct perf_evsel *evsel __used,
+				struct machine *machine)
+{
+	struct thread *thread = machine__findnew_thread(machine, sample->tid);
+
+	if (thread == NULL) {
+		pr_debug("problem processing %d event, skipping it.\n",
+			event->header.type);
+		return -1;
+	}
+
+	process_raw_event(sample->raw_data, sample->cpu, sample->time, thread);
+
+	return 0;
+}
+
+static struct perf_tool eops = {
+	.sample			= process_sample_event,
+	.comm			= perf_event__process_comm,
+	.ordered_samples	= true,
+};
+
+static char const *input_name = "perf.data";
+
+static int read_events(void)
+{
+	struct perf_session *session;
+
+	session = perf_session__new(input_name, O_RDONLY, 0, false, &eops);
+	if (!session)
+		die("Initializing perf session failed\n");
+
+	if (!perf_session__has_traces(session, "kvm record"))
+		return -1;
+
+	return perf_session__process_events(session, &eops);
+}
+
+static void verify_vcpu(int vcpu)
+{
+	if (vcpu != -1 && (vcpu > KVM_MAX_VCPUS || vcpu < 0))
+		die("Unknow vcpu:%d.\n", vcpu);
+
+}
+
+static int kvm_events_report(int vcpu)
+{
+	verify_vcpu(vcpu);
+	select_key();
+	register_kvm_events_ops();
+	init_kvm_event_record();
+	setup_pager();
+	read_events();
+	sort_result(vcpu);
+	print_result(vcpu);
+	return 0;
+}
+
+static const char * const record_args[] = {
+	"record",
+	"-a",
+	"-R",
+	"-f",
+	"-m", "1024",
+	"-c", "1",
+	"-e", "kvm:kvm_entry",
+	"-e", "kvm:kvm_exit",
+	"-e", "kvm:kvm_mmio",
+	"-e", "kvm:kvm_mmio_done",
+	"-e", "kvm:kvm_pio"
+};
+
+static int kvm_events_record(int argc, const char **argv)
+{
+	unsigned int rec_argc, i, j;
+	const char **rec_argv;
+
+	rec_argc = ARRAY_SIZE(record_args) + argc - 1;
+	rec_argv = calloc(rec_argc + 1, sizeof(char *));
+
+	if (rec_argv == NULL)
+		return -ENOMEM;
+
+	for (i = 0; i < ARRAY_SIZE(record_args); i++)
+		rec_argv[i] = strdup(record_args[i]);
+
+	for (j = 1; j < (unsigned int)argc; j++, i++)
+		rec_argv[i] = argv[j];
+
+	BUG_ON(i != rec_argc);
+
+	return cmd_record(i, rec_argv, NULL);
+}
+
+static const char * const kvm_events_report_usage[] = {
+	"perf kvm events report [<options>]",
+	NULL
+};
+
+static const struct option kvm_events_report_options[] = {
+	OPT_STRING(0, "event", &report_event, "reprot event",
+		    "event for reporting: vmexit, mmio, ioport"),
+	OPT_INTEGER(0, "vcpu", &trace_vcpu,
+		    "vcpu id to report"),
+	OPT_STRING('k', "key", &sort_key, "sort-key",
+		    "key for sorting: sample(sort by samples number)"
+		    " time (sort by time%)"),
+	OPT_END()
+};
+
+static const char * const kvm_events_usage[] = {
+	"perf kvm events [<options>] {record|report}",
+	NULL
+};
+
+static const struct option kvm_events_options[] = {
+	OPT_STRING('i', "input", &input_name, "file", "input file name"),
+	OPT_INCR('v', "verbose", &verbose,
+		 "be more verbose (show symbol address, etc)"),
+	OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
+		 "dump raw trace in ASCII"),
+	OPT_END()
+};
+
+int cmd_kvm_events(int argc, const char **argv, const char *prefix __used)
+{
+	argc = parse_options(argc, argv, kvm_events_options, kvm_events_usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
+	if (!argc)
+		usage_with_options(kvm_events_usage, kvm_events_options);
+
+	symbol__init();
+
+	if (!strncmp(argv[0], "rec", 3))
+		return kvm_events_record(argc, argv);
+
+	if (!strncmp(argv[0], "report", 6)) {
+		if (argc) {
+			argc = parse_options(argc, argv,
+					     kvm_events_report_options,
+					     kvm_events_report_usage, 0);
+			if (argc)
+				usage_with_options(kvm_events_report_usage,
+						   kvm_events_report_options);
+		}
+		return kvm_events_report(trace_vcpu);
+	}
+
+	usage_with_options(kvm_events_usage, kvm_events_options);
+	return 0;
+}
diff --git a/tools/perf/builtin.h b/tools/perf/builtin.h
index b382bd5..fb19e3d 100644
--- a/tools/perf/builtin.h
+++ b/tools/perf/builtin.h
@@ -33,6 +33,7 @@ extern int cmd_probe(int argc, const char **argv, const char *prefix);
 extern int cmd_kmem(int argc, const char **argv, const char *prefix);
 extern int cmd_lock(int argc, const char **argv, const char *prefix);
 extern int cmd_kvm(int argc, const char **argv, const char *prefix);
+extern int cmd_kvm_events(int argc, const char **argv, const char *prefix);
 extern int cmd_test(int argc, const char **argv, const char *prefix);
 extern int cmd_inject(int argc, const char **argv, const char *prefix);

diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index 2b2e225..ab85ea5 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -317,6 +317,7 @@ static void handle_internal_command(int argc, const char **argv)
 		{ "kmem",	cmd_kmem,	0 },
 		{ "lock",	cmd_lock,	0 },
 		{ "kvm",	cmd_kvm,	0 },
+		{ "kvm-events", cmd_kvm_events, 0},
 		{ "test",	cmd_test,	0 },
 		{ "inject",	cmd_inject,	0 },
 	};
-- 
1.7.7.5


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/3] KVM: improve trace events of vmexit/mmio/ioport
  2012-01-16  9:32 ` [PATCH 2/3] KVM: improve trace events of vmexit/mmio/ioport Xiao Guangrong
@ 2012-01-16  9:38   ` Avi Kivity
  2012-01-17  2:28     ` Xiao Guangrong
  0 siblings, 1 reply; 25+ messages in thread
From: Avi Kivity @ 2012-01-16  9:38 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, LKML, KVM

On 01/16/2012 11:32 AM, Xiao Guangrong wrote:
> - trace vcpu_id for these events

We can infer the vcpu id from the kvm_entry tracepoints, no?

> - add kvm_mmio_done to trace the time when mmio/ioport emulation is completed

ditto?


Relying on the existing tracepoints will make the tool work on older
kernels.


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3] KVM: perf: kvm events analysis tool
  2012-01-16  9:32 ` [PATCH 3/3] KVM: perf: kvm events analysis tool Xiao Guangrong
@ 2012-01-16 10:04   ` Avi Kivity
  2012-01-17  2:30     ` Xiao Guangrong
  2012-01-16 10:08   ` Stefan Hajnoczi
  1 sibling, 1 reply; 25+ messages in thread
From: Avi Kivity @ 2012-01-16 10:04 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, LKML, KVM

On 01/16/2012 11:32 AM, Xiao Guangrong wrote:
> Add 'perf kvm-events' support to analyze kvm vmexit/mmio/ioport smartly
>
> Usage:
> 	perf kvm-events record

Why not 'perf record -e kvm'?

> 	perf kvm-events report



> +static const char *get_exit_reason(long isa, u64 exit_code)
> +{
> +	int table_size = ARRAY_SIZE(svm_exit_reasons);
> +	struct exit_reasons_table *table = svm_exit_reasons;
> +
> +
> +	if (isa == 1) {
> +		table = vmx_exit_reasons;
> +		table_size = ARRAY_SIZE(vmx_exit_reasons);
> +	}
> +
> +	while (table_size--) {
> +		if (table->exit_code == exit_code)
> +			return table->reason;

... table[exit_code] ...

> +		table++;
> +	}
> +
> +	die("unkonw kvm exit code:%ld on %s\n", exit_code, isa == 1 ?
> +						"VMX" : "SVM");

"unknown"

> +
> +struct kvm_events_ops {
> +	bool (*is_begain_event)(struct event *event, void *data);

begin

> +	bool (*is_end_event)(struct event *event);
> +	struct event_key (*get_key)(struct event *event, void *data);
> +	void (*decode_key)(struct event_key *key, char decode[20]);
> +	const char *name;
> +};
> +
> +
> +static struct event_key exit_event_get_key(struct event *event, void *data)
> +{
> +	struct event_key key;
> +
> +	key.key = raw_field_value(event, "exit_reason", data);
> +	key.info = raw_field_value(event, "isa", data);

"isa" is not available on all kernel versions; need to fall back to
/proc/cpuid detection.

> +	return key;
> +}
>

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3] KVM: perf: kvm events analysis tool
  2012-01-16  9:32 ` [PATCH 3/3] KVM: perf: kvm events analysis tool Xiao Guangrong
  2012-01-16 10:04   ` Avi Kivity
@ 2012-01-16 10:08   ` Stefan Hajnoczi
  2012-01-17  2:37     ` Xiao Guangrong
  2012-01-17 11:59     ` Marcelo Tosatti
  1 sibling, 2 replies; 25+ messages in thread
From: Stefan Hajnoczi @ 2012-01-16 10:08 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

On Mon, Jan 16, 2012 at 9:32 AM, Xiao Guangrong
<xiaoguangrong@linux.vnet.ibm.com> wrote:
> +DESCRIPTION
> +-----------
> +You can analyze some crucial events and statistics with this
> +'perf kvm-events' command.

This line is very general and does not explain which events/statistics
can be collected or how you can use that information.  I suggest
making this description more specific.  Explain that this subcommand
observers kvm.ko tracepoints and annotates/decodes them with
additional information (this is why I would use this command and not
raw perf record -e kvm:\*).

> +       { SVM_EXIT_MONITOR,                     "monitor" }, \
> +       { SVM_EXIT_MWAIT,                       "mwait" }, \
> +       { SVM_EXIT_XSETBV,                      "xsetbv" }, \
> +       { SVM_EXIT_NPF,                         "npf" }

All this copy-paste could be avoided by sharing this stuff with the
arch/x86/kvm/ code.

> +static void exit_event_decode_key(struct event_key *key, char decode[20])
> +{
> +       const char *exit_reason = get_exit_reason(key->info, key->key);
> +
> +       memset(decode, 0, 20);
> +       strncpy(decode, exit_reason, 20);

This is a bad pattern to follow when using strncpy(3) because if there
was a strlen(exit_reason) == 20 string then decode[] would not be
NUL-terminated.  Right now it's safe but it's better to just use
strlcpy() and drop the memset(3).

> +}
> +
> +static struct kvm_events_ops exit_events = {
> +       .is_begain_event = exit_event_begain,
> +       .is_end_event = exit_event_end,
> +       .get_key = exit_event_get_key,
> +       .decode_key = exit_event_decode_key,
> +       .name = "VM-EXIT"
> +};
> +
> +#define KVM_TRACE_MMIO_READ_UNSATISFIED 0
> +#define KVM_TRACE_MMIO_READ 1
> +#define KVM_TRACE_MMIO_WRITE 2
> +static bool mmio_event_begain(struct event *event, void *data)
> +{
> +       if (!strcmp(event->name, "kvm_mmio")) {
> +               long type = raw_field_value(event, "type", data);
> +
> +               if (type != KVM_TRACE_MMIO_READ_UNSATISFIED)
> +                       return true;
> +       };
> +
> +       return false;
> +}
> +
> +static bool mmio_event_end(struct event *event)
> +{
> +       return !strcmp(event->name, "kvm_mmio_done");
> +}
> +
> +static struct event_key mmio_event_get_key(struct event *event, void *data)
> +{
> +       struct event_key key;
> +
> +       key.key = raw_field_value(event, "gpa", data);
> +       key.info = raw_field_value(event, "type", data);
> +
> +       return key;
> +}
> +
> +static void mmio_event_decode_key(struct event_key *key, char decode[20])
> +{
> +       memset(decode, 0, 20);
> +       sprintf(decode, "%#lx:%s", key->key,
> +               key->info == KVM_TRACE_MMIO_READ ? "R" : "W");

Please drop the memset and use snprintf(3) instead of sprintf(3).  It
places the NUL-terminator and ensures you don't exceed the buffer
size.

Same pattern below.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH] KVM: perf: a smart tool to analyse kvm events
  2012-01-16  9:30 [RFC][PATCH] KVM: perf: a smart tool to analyse kvm events Xiao Guangrong
                   ` (2 preceding siblings ...)
  2012-01-16  9:32 ` [PATCH 3/3] KVM: perf: kvm events analysis tool Xiao Guangrong
@ 2012-01-16 10:11 ` Avi Kivity
  2012-01-17  2:30   ` Xiao Guangrong
  2012-01-16 22:53 ` David Ahern
  4 siblings, 1 reply; 25+ messages in thread
From: Avi Kivity @ 2012-01-16 10:11 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, LKML, KVM

On 01/16/2012 11:30 AM, Xiao Guangrong wrote:
> This tool is very like xenoprof(if i remember correctly), and traces kvm events
> smartly. currently, it supports vmexit/mmio/ioport events.
>
> Usage:
> - to trace kvm events:
> # ./perf kvm-events record
>
> - show the result
> # ./perf kvm-events report
>
> Some output are as follow:
> # ./perf kvm-events report
>   Warning: Error: expected type 5 but read 4
>   Warning: Error: expected type 5 but read 0
>   Warning: unknown op '}'
>
>
> Analyze events for all VCPUs:
>
>              VM-EXIT    Samples      Samples%         Time%        Avg time
>
>          APIC_ACCESS     438107        44.89%         6.20%        17.91us
>   EXTERNAL_INTERRUPT     219226        22.46%         8.01%        46.20us
>       IO_INSTRUCTION     122651        12.57%         1.88%        19.44us
>        EPT_VIOLATION      83110         8.52%         1.36%        20.75us
>    PENDING_INTERRUPT      37055         3.80%         0.16%         5.38us
>                CPUID      32718         3.35%         0.08%         3.15us
>        EXCEPTION_NMI      23601         2.42%         0.17%         8.87us
>                  HLT      15424         1.58%        82.12%      6735.06us
>            CR_ACCESS       4089         0.42%         0.02%         6.08us
>
> Total Samples:975981, Total events handled time:126502464.88us.

Nice!  If we can have a live version as well, this can replace kvm_stat.

The average numbers are really high.  Like a factor of 3x-4x off.  Would
be good to print the standard deviation and see why.  Maybe it's due to
the tracing overhead.

> The default event to be analysed is vmexit, we can use --event to specify it,
> for example, if we want to trace mmio event:
> # ./perf kvm-events report --event mmio
>   Warning: Error: expected type 5 but read 4
>   Warning: Error: expected type 5 but read 0
>   Warning: unknown op '}'
>
>
> Analyze events for all VCPUs:
>
>          MMIO Access    Samples      Samples%         Time%        Avg time
>
>         0xfee00380:W     196589        64.95%        70.01%         3.83us
>         0xfee00310:W      35356        11.68%         6.48%         1.97us
>         0xfee00300:W      35356        11.68%        16.37%         4.97us
>         0xfee00300:R      35356        11.68%         7.14%         2.17us

These are more reasonable (though still high - 5us for an ICR write?)

>
> Total Samples:302657, Total events handled time:1074746.01us.
>
> We can use --vcpu to specify which vcpu is traced:
> root@localhost perf]# ./perf kvm-events report --event mmio --vcpu 1
>   Warning: Error: expected type 5 but read 4
>   Warning: Error: expected type 5 but read 0
>   Warning: unknown op '}'
>
>
> Analyze events for VCPU 1:
>
>          MMIO Access    Samples      Samples%         Time%        Avg time
>
>         0xfee00380:W      58041        71.20%        74.90%         3.70us
>         0xfee00310:W       7826         9.60%         5.28%         1.93us
>         0xfee00300:W       7826         9.60%        13.82%         5.06us
>         0xfee00300:R       7826         9.60%         6.01%         2.20us
>
> Total Samples:81519, Total events handled time:286577.81us.
>
> And, '--key' is used to sort the result, the possible value sample (default,
> the result is sorted by samples number), time(the result is sorted by time%):
> # ./perf kvm-events report --key time
>   Warning: Error: expected type 5 but read 4
>   Warning: Error: expected type 5 but read 0
>   Warning: unknown op '}'
>
>
> Analyze events for all VCPUs:
>
>              VM-EXIT    Samples      Samples%         Time%        Avg time
>
>                  HLT      15424         1.58%        82.12%      6735.06us
>   EXTERNAL_INTERRUPT     219226        22.46%         8.01%        46.20us
>          APIC_ACCESS     438107        44.89%         6.20%        17.91us
>       IO_INSTRUCTION     122651        12.57%         1.88%        19.44us
>        EPT_VIOLATION      83110         8.52%         1.36%        20.75us
>        EXCEPTION_NMI      23601         2.42%         0.17%         8.87us
>    PENDING_INTERRUPT      37055         3.80%         0.16%         5.38us
>                CPUID      32718         3.35%         0.08%         3.15us
>            CR_ACCESS       4089         0.42%         0.02%         6.08us
>
> Total Samples:975981, Total events handled time:126502464.88us.
>
> I hope guys will like it and any comments are welcome! :)

I think it's great!  A live version would be a nice addition too.

Please copy the perf userspace maintainers to get more detailed review
in the next version.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3] KVM: trace mmio read event properly
  2012-01-16  9:31 ` [PATCH 1/3] KVM: trace mmio read event properly Xiao Guangrong
@ 2012-01-16 10:18   ` Avi Kivity
  2012-01-17  2:36     ` Xiao Guangrong
  0 siblings, 1 reply; 25+ messages in thread
From: Avi Kivity @ 2012-01-16 10:18 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, LKML, KVM

On 01/16/2012 11:31 AM, Xiao Guangrong wrote:
> In current code, we use KVM_TRACE_MMIO_READ to trace mmio read event which
> only can be completed immediately, instead of it, we trace the time when
> read event occur, then cooperate with then later patch, we can know the time
> of mmio read emulation
>
> @@ -3744,6 +3740,8 @@ mmio:
>  	/*
>  	 * Is this MMIO handled locally?
>  	 */
> +	trace_kvm_mmio(write ? KVM_TRACE_MMIO_WRITE : KVM_TRACE_MMIO_READ,

It's better to push the conditional to the trace event itself, so it's
only evaluated if tracing is enabled.

> +			bytes, gpa, *(u64 *)val);

We get the wrong value for reads here, no?

Can't we leave the code as is, and infer the start of the event from the
last kvm_exit trace?

>  	handled = ops->read_write_mmio(vcpu, gpa, bytes, val);
>  	if (handled == bytes)
>  		return X86EMUL_CONTINUE;


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH] KVM: perf: a smart tool to analyse kvm events
  2012-01-16  9:30 [RFC][PATCH] KVM: perf: a smart tool to analyse kvm events Xiao Guangrong
                   ` (3 preceding siblings ...)
  2012-01-16 10:11 ` [RFC][PATCH] KVM: perf: a smart tool to analyse kvm events Avi Kivity
@ 2012-01-16 22:53 ` David Ahern
  2012-01-17  2:41   ` Xiao Guangrong
  4 siblings, 1 reply; 25+ messages in thread
From: David Ahern @ 2012-01-16 22:53 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM



On 01/16/2012 02:30 AM, Xiao Guangrong wrote:
> This tool is very like xenoprof(if i remember correctly), and traces kvm events
> smartly. currently, it supports vmexit/mmio/ioport events.
> 
> Usage:
> - to trace kvm events:
> # ./perf kvm-events record
> 
> - show the result
> # ./perf kvm-events report
> 
> Some output are as follow:
> # ./perf kvm-events report
>   Warning: Error: expected type 5 but read 4
>   Warning: Error: expected type 5 but read 0
>   Warning: unknown op '}'

Integrating the trace-cmd plugins into perf will remedy the above errors:
https://lkml.org/lkml/2011/8/16/352

Unfortunately, that effort is stalled at the moment.

> 
> 
> Analyze events for all VCPUs:
> 
>              VM-EXIT    Samples      Samples%         Time%        Avg time
> 
>          APIC_ACCESS     438107        44.89%         6.20%        17.91us
>   EXTERNAL_INTERRUPT     219226        22.46%         8.01%        46.20us
>       IO_INSTRUCTION     122651        12.57%         1.88%        19.44us
>        EPT_VIOLATION      83110         8.52%         1.36%        20.75us
>    PENDING_INTERRUPT      37055         3.80%         0.16%         5.38us
>                CPUID      32718         3.35%         0.08%         3.15us
>        EXCEPTION_NMI      23601         2.42%         0.17%         8.87us
>                  HLT      15424         1.58%        82.12%      6735.06us
>            CR_ACCESS       4089         0.42%         0.02%         6.08us
> 
> Total Samples:975981, Total events handled time:126502464.88us.

Have you thought about dumping a time history -- something similar to
what perf-script can do with dumping events but adding in kvm-specific
analysis like what you are doing in these examples?

David


> 
> The default event to be analysed is vmexit, we can use --event to specify it,
> for example, if we want to trace mmio event:
> # ./perf kvm-events report --event mmio
>   Warning: Error: expected type 5 but read 4
>   Warning: Error: expected type 5 but read 0
>   Warning: unknown op '}'
> 
> 
> Analyze events for all VCPUs:
> 
>          MMIO Access    Samples      Samples%         Time%        Avg time
> 
>         0xfee00380:W     196589        64.95%        70.01%         3.83us
>         0xfee00310:W      35356        11.68%         6.48%         1.97us
>         0xfee00300:W      35356        11.68%        16.37%         4.97us
>         0xfee00300:R      35356        11.68%         7.14%         2.17us
> 
> Total Samples:302657, Total events handled time:1074746.01us.
> 
> We can use --vcpu to specify which vcpu is traced:
> root@localhost perf]# ./perf kvm-events report --event mmio --vcpu 1
>   Warning: Error: expected type 5 but read 4
>   Warning: Error: expected type 5 but read 0
>   Warning: unknown op '}'
> 
> 
> Analyze events for VCPU 1:
> 
>          MMIO Access    Samples      Samples%         Time%        Avg time
> 
>         0xfee00380:W      58041        71.20%        74.90%         3.70us
>         0xfee00310:W       7826         9.60%         5.28%         1.93us
>         0xfee00300:W       7826         9.60%        13.82%         5.06us
>         0xfee00300:R       7826         9.60%         6.01%         2.20us
> 
> Total Samples:81519, Total events handled time:286577.81us.
> 
> And, '--key' is used to sort the result, the possible value sample (default,
> the result is sorted by samples number), time(the result is sorted by time%):
> # ./perf kvm-events report --key time
>   Warning: Error: expected type 5 but read 4
>   Warning: Error: expected type 5 but read 0
>   Warning: unknown op '}'
> 
> 
> Analyze events for all VCPUs:
> 
>              VM-EXIT    Samples      Samples%         Time%        Avg time
> 
>                  HLT      15424         1.58%        82.12%      6735.06us
>   EXTERNAL_INTERRUPT     219226        22.46%         8.01%        46.20us
>          APIC_ACCESS     438107        44.89%         6.20%        17.91us
>       IO_INSTRUCTION     122651        12.57%         1.88%        19.44us
>        EPT_VIOLATION      83110         8.52%         1.36%        20.75us
>        EXCEPTION_NMI      23601         2.42%         0.17%         8.87us
>    PENDING_INTERRUPT      37055         3.80%         0.16%         5.38us
>                CPUID      32718         3.35%         0.08%         3.15us
>            CR_ACCESS       4089         0.42%         0.02%         6.08us
> 
> Total Samples:975981, Total events handled time:126502464.88us.
> 
> I hope guys will like it and any comments are welcome! :)
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/3] KVM: improve trace events of vmexit/mmio/ioport
  2012-01-16  9:38   ` Avi Kivity
@ 2012-01-17  2:28     ` Xiao Guangrong
  2012-01-17 11:55       ` Marcelo Tosatti
  2012-01-24 12:44       ` Avi Kivity
  0 siblings, 2 replies; 25+ messages in thread
From: Xiao Guangrong @ 2012-01-17  2:28 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, LKML, KVM

On 01/16/2012 05:38 PM, Avi Kivity wrote:

> On 01/16/2012 11:32 AM, Xiao Guangrong wrote:
>> - trace vcpu_id for these events
> 
> We can infer the vcpu id from the kvm_entry tracepoints, no?
> 


Thanks for your review, Avi!

Hmm. i think it is hard to do since the vcpu thread can be scheduled
anytime, one example is as follow:

CPU 0

kvm_entry vcpu 0
......
kvm_entry vcpu 1
......
event1 occurs
......
event2 occurs

It is hard to know the event belong to which kvm_entry?

>> - add kvm_mmio_done to trace the time when mmio/ioport emulation is completed
> 
> ditto?
> 


I think is ok to get the event end time by using kvm_entry.

> 
> Relying on the existing tracepoints will make the tool work on older
> kernels.
> 


We can drop all new events, but unfortunately, the information of the origin
tracepoints is not enough, at least vcpu_id need be traced in theses events
to match its events. Yes?


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3] KVM: perf: kvm events analysis tool
  2012-01-16 10:04   ` Avi Kivity
@ 2012-01-17  2:30     ` Xiao Guangrong
  2012-01-24 12:49       ` Avi Kivity
  0 siblings, 1 reply; 25+ messages in thread
From: Xiao Guangrong @ 2012-01-17  2:30 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, LKML, KVM

On 01/16/2012 06:04 PM, Avi Kivity wrote:

> On 01/16/2012 11:32 AM, Xiao Guangrong wrote:
>> Add 'perf kvm-events' support to analyze kvm vmexit/mmio/ioport smartly
>>
>> Usage:
>> 	perf kvm-events record
> 
> Why not 'perf record -e kvm'?


It works, many perf tools have this style, like:
perf lock record,
perf sched record,
perf kmem record,

I think the reason is that only enable the tracepoints we need to avoid
unnecessary overload so that the result is more exacter.

> 
>> 	perf kvm-events report
> 
> 
> 
>> +static const char *get_exit_reason(long isa, u64 exit_code)
>> +{
>> +	int table_size = ARRAY_SIZE(svm_exit_reasons);
>> +	struct exit_reasons_table *table = svm_exit_reasons;
>> +
>> +
>> +	if (isa == 1) {
>> +		table = vmx_exit_reasons;
>> +		table_size = ARRAY_SIZE(vmx_exit_reasons);
>> +	}
>> +
>> +	while (table_size--) {
>> +		if (table->exit_code == exit_code)
>> +			return table->reason;
> 
> ... table[exit_code] ...
> 


Actually, this array is not indexed by exit_code, it means:
table[exit_code].exit_code != exit_code.

>> +		table++;
>> +	}
>> +
>> +	die("unkonw kvm exit code:%ld on %s\n", exit_code, isa == 1 ?
>> +						"VMX" : "SVM");
> 
> "unknown"
> 


......

>> +
>> +struct kvm_events_ops {
>> +	bool (*is_begain_event)(struct event *event, void *data);
> 
> begin
> 


Sorry for my careless.

>> +	bool (*is_end_event)(struct event *event);
>> +	struct event_key (*get_key)(struct event *event, void *data);
>> +	void (*decode_key)(struct event_key *key, char decode[20]);
>> +	const char *name;
>> +};
>> +
>> +
>> +static struct event_key exit_event_get_key(struct event *event, void *data)
>> +{
>> +	struct event_key key;
>> +
>> +	key.key = raw_field_value(event, "exit_reason", data);
>> +	key.info = raw_field_value(event, "isa", data);
> 
> "isa" is not available on all kernel versions; need to fall back to
> /proc/cpuid detection.
> 


Got it, will do it in the next version. Thanks Avi!


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH] KVM: perf: a smart tool to analyse kvm events
  2012-01-16 10:11 ` [RFC][PATCH] KVM: perf: a smart tool to analyse kvm events Avi Kivity
@ 2012-01-17  2:30   ` Xiao Guangrong
  0 siblings, 0 replies; 25+ messages in thread
From: Xiao Guangrong @ 2012-01-17  2:30 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, LKML, KVM

On 01/16/2012 06:11 PM, Avi Kivity wrote:


>> Total Samples:975981, Total events handled time:126502464.88us.
> 
> Nice!  If we can have a live version as well, this can replace kvm_stat.
> 
> The average numbers are really high.  Like a factor of 3x-4x off.  Would
> be good to print the standard deviation and see why.  Maybe it's due to
> the tracing overhead.
> 


It is a good suggestion, i will print stddev in the next version.

>> The default event to be analysed is vmexit, we can use --event to specify it,
>> for example, if we want to trace mmio event:
>> # ./perf kvm-events report --event mmio
>>   Warning: Error: expected type 5 but read 4
>>   Warning: Error: expected type 5 but read 0
>>   Warning: unknown op '}'
>>
>>
>> Analyze events for all VCPUs:
>>
>>          MMIO Access    Samples      Samples%         Time%        Avg time
>>
>>         0xfee00380:W     196589        64.95%        70.01%         3.83us
>>         0xfee00310:W      35356        11.68%         6.48%         1.97us
>>         0xfee00300:W      35356        11.68%        16.37%         4.97us
>>         0xfee00300:R      35356        11.68%         7.14%         2.17us
> 
> These are more reasonable (though still high - 5us for an ICR write?)
> 


Hmm, maybe i need look into it...

>> Total Samples:975981, Total events handled time:126502464.88us.
>>
>> I hope guys will like it and any comments are welcome! :)
> 
> I think it's great!  A live version would be a nice addition too.
> 
> Please copy the perf userspace maintainers to get more detailed review
> in the next version.
> 


Okay, Thanks!


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3] KVM: trace mmio read event properly
  2012-01-16 10:18   ` Avi Kivity
@ 2012-01-17  2:36     ` Xiao Guangrong
  0 siblings, 0 replies; 25+ messages in thread
From: Xiao Guangrong @ 2012-01-17  2:36 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, LKML, KVM

On 01/16/2012 06:18 PM, Avi Kivity wrote:

> On 01/16/2012 11:31 AM, Xiao Guangrong wrote:
>> In current code, we use KVM_TRACE_MMIO_READ to trace mmio read event which
>> only can be completed immediately, instead of it, we trace the time when
>> read event occur, then cooperate with then later patch, we can know the time
>> of mmio read emulation
>>
>> @@ -3744,6 +3740,8 @@ mmio:
>>  	/*
>>  	 * Is this MMIO handled locally?
>>  	 */
>> +	trace_kvm_mmio(write ? KVM_TRACE_MMIO_WRITE : KVM_TRACE_MMIO_READ,
> 
> It's better to push the conditional to the trace event itself, so it's
> only evaluated if tracing is enabled.
> 


Yes.

>> +			bytes, gpa, *(u64 *)val);
> 
> We get the wrong value for reads here, no?
> 


Yes, the value is meaningless in this point.

> Can't we leave the code as is, and infer the start of the event from the
> last kvm_exit trace?


Yes, we can do it, but the same problems is we need trace vcpu id to get its
vmexit, i will do it in the next version if you like this way :)


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3] KVM: perf: kvm events analysis tool
  2012-01-16 10:08   ` Stefan Hajnoczi
@ 2012-01-17  2:37     ` Xiao Guangrong
  2012-01-17 11:59     ` Marcelo Tosatti
  1 sibling, 0 replies; 25+ messages in thread
From: Xiao Guangrong @ 2012-01-17  2:37 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

On 01/16/2012 06:08 PM, Stefan Hajnoczi wrote:

> On Mon, Jan 16, 2012 at 9:32 AM, Xiao Guangrong
> <xiaoguangrong@linux.vnet.ibm.com> wrote:
>> +DESCRIPTION
>> +-----------
>> +You can analyze some crucial events and statistics with this
>> +'perf kvm-events' command.
> 
> This line is very general and does not explain which events/statistics
> can be collected or how you can use that information.  I suggest
> making this description more specific.  Explain that this subcommand
> observers kvm.ko tracepoints and annotates/decodes them with
> additional information (this is why I would use this command and not
> raw perf record -e kvm:\*).
> 


Okay.

>> + � � � { SVM_EXIT_MONITOR, � � � � � � � � � � "monitor" }, \
>> + � � � { SVM_EXIT_MWAIT, � � � � � � � � � � � "mwait" }, \
>> + � � � { SVM_EXIT_XSETBV, � � � � � � � � � � �"xsetbv" }, \
>> + � � � { SVM_EXIT_NPF, � � � � � � � � � � � � "npf" }
> 
> All this copy-paste could be avoided by sharing this stuff with the
> arch/x86/kvm/ code.
> 


I will try to combine them in the next version.

>> +static void exit_event_decode_key(struct event_key *key, char decode[20])
>> +{
>> + � � � const char *exit_reason = get_exit_reason(key->info, key->key);
>> +
>> + � � � memset(decode, 0, 20);
>> + � � � strncpy(decode, exit_reason, 20);
> 
> This is a bad pattern to follow when using strncpy(3) because if there
> was a strlen(exit_reason) == 20 string then decode[] would not be
> NUL-terminated.  Right now it's safe but it's better to just use
> strlcpy() and drop the memset(3).
> 


Good point.

>> +static void mmio_event_decode_key(struct event_key *key, char decode[20])
>> +{
>> + � � � memset(decode, 0, 20);
>> + � � � sprintf(decode, "%#lx:%s", key->key,
>> + � � � � � � � key->info == KVM_TRACE_MMIO_READ ? "R" : "W");
> 
> Please drop the memset and use snprintf(3) instead of sprintf(3).  It
> places the NUL-terminator and ensures you don't exceed the buffer
> size.
> 
> Same pattern below.
> 


Will do, thanks Stefan! :)




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH] KVM: perf: a smart tool to analyse kvm events
  2012-01-16 22:53 ` David Ahern
@ 2012-01-17  2:41   ` Xiao Guangrong
  2012-01-17  4:49     ` David Ahern
  0 siblings, 1 reply; 25+ messages in thread
From: Xiao Guangrong @ 2012-01-17  2:41 UTC (permalink / raw)
  To: David Ahern; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

On 01/17/2012 06:53 AM, David Ahern wrote:

> 
> 
> On 01/16/2012 02:30 AM, Xiao Guangrong wrote:
>> This tool is very like xenoprof(if i remember correctly), and traces kvm events
>> smartly. currently, it supports vmexit/mmio/ioport events.
>>
>> Usage:
>> - to trace kvm events:
>> # ./perf kvm-events record
>>
>> - show the result
>> # ./perf kvm-events report
>>
>> Some output are as follow:
>> # ./perf kvm-events report
>>   Warning: Error: expected type 5 but read 4
>>   Warning: Error: expected type 5 but read 0
>>   Warning: unknown op '}'
> 
> Integrating the trace-cmd plugins into perf will remedy the above errors:
> https://lkml.org/lkml/2011/8/16/352
> 


Yes, it is great!

> Unfortunately, that effort is stalled at the moment.
> 
>>
>>
>> Analyze events for all VCPUs:
>>
>>              VM-EXIT    Samples      Samples%         Time%        Avg time
>>
>>          APIC_ACCESS     438107        44.89%         6.20%        17.91us
>>   EXTERNAL_INTERRUPT     219226        22.46%         8.01%        46.20us
>>       IO_INSTRUCTION     122651        12.57%         1.88%        19.44us
>>        EPT_VIOLATION      83110         8.52%         1.36%        20.75us
>>    PENDING_INTERRUPT      37055         3.80%         0.16%         5.38us
>>                CPUID      32718         3.35%         0.08%         3.15us
>>        EXCEPTION_NMI      23601         2.42%         0.17%         8.87us
>>                  HLT      15424         1.58%        82.12%      6735.06us
>>            CR_ACCESS       4089         0.42%         0.02%         6.08us
>>
>> Total Samples:975981, Total events handled time:126502464.88us.
> 
> Have you thought about dumping a time history -- something similar to
> what perf-script can do with dumping events but adding in kvm-specific
> analysis like what you are doing in these examples?
> 


I will look into it and put it to my todo list if it is possible.
Thanks, David!


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC][PATCH] KVM: perf: a smart tool to analyse kvm events
  2012-01-17  2:41   ` Xiao Guangrong
@ 2012-01-17  4:49     ` David Ahern
  0 siblings, 0 replies; 25+ messages in thread
From: David Ahern @ 2012-01-17  4:49 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

[-- Attachment #1: Type: text/plain, Size: 1037 bytes --]


On 01/16/2012 07:41 PM, Xiao Guangrong wrote:
>> Have you thought about dumping a time history -- something similar to
>> what perf-script can do with dumping events but adding in kvm-specific
>> analysis like what you are doing in these examples?
>>
> 
> 
> I will look into it and put it to my todo list if it is possible.
> Thanks, David!
> 

I've played around with ways to do it as time (and motivation) allowed.
Attached is one example using perf with the trace-cmd plugin plus a
patch on perf-script to dump time between events:

perf record -e kvm:* -fo /tmp/perf.data -p 2540 -- sleep 1
perf script -i /tmp/perf.data

The output of perf-script is in the attached file. The 5th column is the
dt between successive events which is mainly a convenience.

A perf-kvm-events type command would allow more customization in the
output -- like correlating specific events and computing total time
between exit and entry accounting for HLT reasons -- as well as various
statistical dumps (average, stddev, max/min, histograms).

David


[-- Attachment #2: kvm-trace-example.txt --]
[-- Type: text/plain, Size: 2525 bytes --]


qemu-kvm  2542 [001] 20757.662426  0.000001 kvm_cr: cr_write 3 = 0x6f3000
qemu-kvm  2542 [001] 20757.662430  0.000004 kvm_entry: vcpu 0
qemu-kvm  2542 [001] 20757.662432  0.000002 kvm_exit: reason IO_INSTRUCTION rip 0x806d8dbc info b008000b 0

qemu-kvm  2542 [001] 20757.662434  0.000002 kvm_emulate_insn: 0:806d8dbc: ed
qemu-kvm  2542 [001] 20757.662435  0.000001 kvm_pio: pio_read at 0xb008 size 4 count 1
qemu-kvm  2542 [001] 20757.662436  0.000001 kvm_userspace_exit: reason KVM_EXIT_IO (2)
qemu-kvm  2542 [001] 20757.662442  0.000006 kvm_entry: vcpu 0
qemu-kvm  2542 [001] 20757.662444  0.000002 kvm_exit: reason HLT rip 0xf770fd3d info 0 0

qemu-kvm  2540 [000] 20757.666479  0.004287 kvm_set_irq: gsi 9 level 1 source 0
qemu-kvm  2540 [000] 20757.666481  0.000002 kvm_pic_set_irq: chip 1 pin 1 (edge|masked)
qemu-kvm  2540 [000] 20757.666482  0.000001 kvm_apic_accept_irq: apicid 0 vec 177 (LowPrio|level)
qemu-kvm  2540 [000] 20757.666485  0.000003 kvm_ioapic_set_irq: pin 9 dst 1 vec=177 (LowPrio|logical|level)

qemu-kvm  2542 [001] 20757.666505  0.004061 kvm_inj_virq: irq 177
qemu-kvm  2542 [001] 20757.666506  0.000001 kvm_entry: vcpu 0
qemu-kvm  2542 [001] 20757.666512  0.000006 kvm_exit: reason IO_INSTRUCTION rip 0x806d88ca info b0000009 0

qemu-kvm  2542 [001] 20757.666516  0.000004 kvm_emulate_insn: 0:806d88ca: 66 ed
qemu-kvm  2542 [001] 20757.666517  0.000001 kvm_pio: pio_read at 0xb000 size 2 count 1
qemu-kvm  2542 [001] 20757.666519  0.000002 kvm_userspace_exit: reason KVM_EXIT_IO (2)
qemu-kvm  2542 [001] 20757.666528  0.000009 kvm_entry: vcpu 0
qemu-kvm  2542 [001] 20757.666531  0.000003 kvm_exit: reason IO_INSTRUCTION rip 0x806d88be info afe00008 0

qemu-kvm  2542 [001] 20757.666534  0.000003 kvm_emulate_insn: 0:806d88be: ec
qemu-kvm  2542 [001] 20757.666535  0.000001 kvm_pio: pio_read at 0xafe0 size 1 count 1
qemu-kvm  2542 [001] 20757.666537  0.000002 kvm_userspace_exit: reason KVM_EXIT_IO (2)
qemu-kvm  2542 [001] 20757.666544  0.000007 kvm_entry: vcpu 0
qemu-kvm  2542 [001] 20757.666547  0.000003 kvm_exit: reason IO_INSTRUCTION rip 0x806d88be info afe10008 0

qemu-kvm  2542 [001] 20757.666550  0.000003 kvm_emulate_insn: 0:806d88be: ec
qemu-kvm  2542 [001] 20757.666551  0.000001 kvm_pio: pio_read at 0xafe1 size 1 count 1
qemu-kvm  2542 [001] 20757.666552  0.000001 kvm_userspace_exit: reason KVM_EXIT_IO (2)
qemu-kvm  2542 [001] 20757.666558  0.000006 kvm_entry: vcpu 0
qemu-kvm  2542 [001] 20757.666562  0.000004 kvm_exit: reason IO_INSTRUCTION rip 0x806d8934 info b0000001 0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/3] KVM: improve trace events of vmexit/mmio/ioport
  2012-01-17  2:28     ` Xiao Guangrong
@ 2012-01-17 11:55       ` Marcelo Tosatti
  2012-01-17 17:31         ` David Ahern
  2012-01-24 12:44       ` Avi Kivity
  1 sibling, 1 reply; 25+ messages in thread
From: Marcelo Tosatti @ 2012-01-17 11:55 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, LKML, KVM

On Tue, Jan 17, 2012 at 10:28:11AM +0800, Xiao Guangrong wrote:
> On 01/16/2012 05:38 PM, Avi Kivity wrote:
> 
> > On 01/16/2012 11:32 AM, Xiao Guangrong wrote:
> >> - trace vcpu_id for these events
> > 
> > We can infer the vcpu id from the kvm_entry tracepoints, no?
> > 
> 
> 
> Thanks for your review, Avi!
> 
> Hmm. i think it is hard to do since the vcpu thread can be scheduled
> anytime, one example is as follow:
> 
> CPU 0
> 
> kvm_entry vcpu 0
> ......
> kvm_entry vcpu 1
> ......
> event1 occurs
> ......
> event2 occurs
> 
> It is hard to know the event belong to which kvm_entry?
> 
> >> - add kvm_mmio_done to trace the time when mmio/ioport emulation is completed
> > 
> > ditto?
> > 
> 
> 
> I think is ok to get the event end time by using kvm_entry.
> 
> > 
> > Relying on the existing tracepoints will make the tool work on older
> > kernels.
> > 
> 
> 
> We can drop all new events, but unfortunately, the information of the origin
> tracepoints is not enough, at least vcpu_id need be traced in theses events
> to match its events. Yes?

Yes, and if you don't add completed events, you'll have to use kvm_entry 
whose collection adds a lot of overhead.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3] KVM: perf: kvm events analysis tool
  2012-01-16 10:08   ` Stefan Hajnoczi
  2012-01-17  2:37     ` Xiao Guangrong
@ 2012-01-17 11:59     ` Marcelo Tosatti
  2012-01-24 12:51       ` Avi Kivity
  1 sibling, 1 reply; 25+ messages in thread
From: Marcelo Tosatti @ 2012-01-17 11:59 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Xiao Guangrong, Avi Kivity, LKML, KVM

On Mon, Jan 16, 2012 at 10:08:40AM +0000, Stefan Hajnoczi wrote:
> On Mon, Jan 16, 2012 at 9:32 AM, Xiao Guangrong
> <xiaoguangrong@linux.vnet.ibm.com> wrote:
> > +DESCRIPTION
> > +-----------
> > +You can analyze some crucial events and statistics with this
> > +'perf kvm-events' command.
> 
> This line is very general and does not explain which events/statistics
> can be collected or how you can use that information.  I suggest
> making this description more specific.  Explain that this subcommand
> observers kvm.ko tracepoints and annotates/decodes them with
> additional information (this is why I would use this command and not
> raw perf record -e kvm:\*).
> 
> > +       { SVM_EXIT_MONITOR,                     "monitor" }, \
> > +       { SVM_EXIT_MWAIT,                       "mwait" }, \
> > +       { SVM_EXIT_XSETBV,                      "xsetbv" }, \
> > +       { SVM_EXIT_NPF,                         "npf" }
> 
> All this copy-paste could be avoided by sharing this stuff with the
> arch/x86/kvm/ code.

Yes, same for KVM_MAX_VCPUS.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/3] KVM: improve trace events of vmexit/mmio/ioport
  2012-01-17 11:55       ` Marcelo Tosatti
@ 2012-01-17 17:31         ` David Ahern
  2012-01-18  2:32           ` Xiao Guangrong
  0 siblings, 1 reply; 25+ messages in thread
From: David Ahern @ 2012-01-17 17:31 UTC (permalink / raw)
  To: Marcelo Tosatti, Xiao Guangrong; +Cc: Avi Kivity, LKML, KVM

On 01/17/2012 04:55 AM, Marcelo Tosatti wrote:
> On Tue, Jan 17, 2012 at 10:28:11AM +0800, Xiao Guangrong wrote:
>> On 01/16/2012 05:38 PM, Avi Kivity wrote:
>>
>>> On 01/16/2012 11:32 AM, Xiao Guangrong wrote:
>>>> - trace vcpu_id for these events
>>>
>>> We can infer the vcpu id from the kvm_entry tracepoints, no?
>>>
>>
>>
>> Thanks for your review, Avi!
>>
>> Hmm. i think it is hard to do since the vcpu thread can be scheduled
>> anytime, one example is as follow:
>>
>> CPU 0
>>
>> kvm_entry vcpu 0
>> ......
>> kvm_entry vcpu 1
>> ......
>> event1 occurs
>> ......
>> event2 occurs
>>
>> It is hard to know the event belong to which kvm_entry?
>>
>>>> - add kvm_mmio_done to trace the time when mmio/ioport emulation is completed
>>>
>>> ditto?
>>>
>>
>>
>> I think is ok to get the event end time by using kvm_entry.
>>
>>>
>>> Relying on the existing tracepoints will make the tool work on older
>>> kernels.
>>>
>>
>>
>> We can drop all new events, but unfortunately, the information of the origin
>> tracepoints is not enough, at least vcpu_id need be traced in theses events
>> to match its events. Yes?
> 
> Yes, and if you don't add completed events, you'll have to use kvm_entry 
> whose collection adds a lot of overhead.


vcpus run as threads and hence have their own tid which is available in
the perf samples.

There is an ongoing 'discussion' about modifying existing tracepoints
which your proposed patch violates. This is the start of it:

https://lkml.org/lkml/2012/1/16/41

David

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/3] KVM: improve trace events of vmexit/mmio/ioport
  2012-01-17 17:31         ` David Ahern
@ 2012-01-18  2:32           ` Xiao Guangrong
  2012-01-18  5:34             ` David Ahern
  0 siblings, 1 reply; 25+ messages in thread
From: Xiao Guangrong @ 2012-01-18  2:32 UTC (permalink / raw)
  To: David Ahern; +Cc: Marcelo Tosatti, Avi Kivity, LKML, KVM

On 01/18/2012 01:31 AM, David Ahern wrote:


>>>>
>>>> Relying on the existing tracepoints will make the tool work on older
>>>> kernels.
>>>>
>>>
>>>
>>> We can drop all new events, but unfortunately, the information of the origin
>>> tracepoints is not enough, at least vcpu_id need be traced in theses events
>>> to match its events. Yes?
>>
>> Yes, and if you don't add completed events, you'll have to use kvm_entry 
>> whose collection adds a lot of overhead.
> 
> 
> vcpus run as threads and hence have their own tid which is available in
> the perf samples.
> 


Nice, thanks for your information, David!

> There is an ongoing 'discussion' about modifying existing tracepoints
> which your proposed patch violates. This is the start of it:
> 


Hmm, i think we can still add completed tracepoints in the new version kernel,
if old version kernel is used, we can fall back to use kvm_entry instead?

And there is a exception for mmio read, in current code, the mmio read event is
actually used to trace the time when then read emulation is completed, i think
we can add a tracepoint like mmio_read_begin to trace the start time of mmio read.
So:
- for the new kernel, we use mmio_read_begion and kvm_mmio(READ...) to calculate
  start time and end time.
- for the old kernel, we use kvm_exit and kvm_mmio(READ...) to calculate start time
  and end time.

Your idea?


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/3] KVM: improve trace events of vmexit/mmio/ioport
  2012-01-18  2:32           ` Xiao Guangrong
@ 2012-01-18  5:34             ` David Ahern
  0 siblings, 0 replies; 25+ messages in thread
From: David Ahern @ 2012-01-18  5:34 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, Avi Kivity, LKML, KVM

On 01/17/2012 07:32 PM, Xiao Guangrong wrote:
>> There is an ongoing 'discussion' about modifying existing tracepoints
>> which your proposed patch violates. This is the start of it:
>>
> 
> 
> Hmm, i think we can still add completed tracepoints in the new version kernel,
> if old version kernel is used, we can fall back to use kvm_entry instead?

Yes, new ones are ok. I was referring to the tracepoints like kvm_pio
where you added the vcpu_id. And it is not really necessary: with
kvm_entry tracepoints it is easy to correlate vcpu to tid and even
without you still get thread base samples so events happening on a
thread are all the sample vcpu (even if you don't know whether that is
vcpu 0, 1, 2, etc).

> 
> And there is a exception for mmio read, in current code, the mmio read event is
> actually used to trace the time when then read emulation is completed, i think
> we can add a tracepoint like mmio_read_begin to trace the start time of mmio read.
> So:
> - for the new kernel, we use mmio_read_begion and kvm_mmio(READ...) to calculate
>   start time and end time.
> - for the old kernel, we use kvm_exit and kvm_mmio(READ...) to calculate start time
>   and end time.
> 
> Your idea?
> 

I did play around with it a bit more today.

My concern would be adding more events may make things more precise, but
it adds more overhead and I am not sure the precision is worth it. For
example on my laptop (Penryn Core 2; a lab server with a xeon E5560
processor is much faster):

0.000002 kvm_exit reason IO_INSTRUCTION rip 0x806d0e30 info 710048 0
0.000003     kvm_emulate_insn 0:806d0e30: e4 71
0.000001     kvm_pio pio_read at 0x71 size 1 count 1
0.000001     kvm_userspace_exit reason KVM_EXIT_IO (2)
0.000003     kvm_set_irq gsi 8 level 0 source 0
0.000001     kvm_pic_set_irq chip 1 pin 0 (edge|masked)
0.000001     kvm_ioapic_set_irq pin 8 dst 1 vec=209 (Fixed|logical|edge)
         total exit time: 0.000016

The first column is the delta-time between successive events for a vcpu.
Most of those events are in the rounded up microsecond range. Moving on
to the subsequent kvm_entry shows a total VMENTRY run time of 5 usecs:
ie., 16 usecs on a VMEXIT with 7 tracepoints and 5 usecs spent in a VMENTRY.

What I am getting at is that the cost of the tracepoints becomes a
significant overhead. If it costs 500nsec or 1usec to generate an event
and the time in a VMENTRY is only 5 usecs the tracepoint is a large part
of the time.

David

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/3] KVM: improve trace events of vmexit/mmio/ioport
  2012-01-17  2:28     ` Xiao Guangrong
  2012-01-17 11:55       ` Marcelo Tosatti
@ 2012-01-24 12:44       ` Avi Kivity
  1 sibling, 0 replies; 25+ messages in thread
From: Avi Kivity @ 2012-01-24 12:44 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, LKML, KVM

On 01/17/2012 04:28 AM, Xiao Guangrong wrote:
> On 01/16/2012 05:38 PM, Avi Kivity wrote:
>
> > On 01/16/2012 11:32 AM, Xiao Guangrong wrote:
> >> - trace vcpu_id for these events
> > 
> > We can infer the vcpu id from the kvm_entry tracepoints, no?
> > 
>
>
> Thanks for your review, Avi!
>
> Hmm. i think it is hard to do since the vcpu thread can be scheduled
> anytime, one example is as follow:
>
> CPU 0
>
> kvm_entry vcpu 0
> ......
> kvm_entry vcpu 1
> ......
> event1 occurs
> ......
> event2 occurs
>
> It is hard to know the event belong to which kvm_entry?
>

Why is it hard?  Correlate the task IDs.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3] KVM: perf: kvm events analysis tool
  2012-01-17  2:30     ` Xiao Guangrong
@ 2012-01-24 12:49       ` Avi Kivity
  0 siblings, 0 replies; 25+ messages in thread
From: Avi Kivity @ 2012-01-24 12:49 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, LKML, KVM

On 01/17/2012 04:30 AM, Xiao Guangrong wrote:
> On 01/16/2012 06:04 PM, Avi Kivity wrote:
>
> > On 01/16/2012 11:32 AM, Xiao Guangrong wrote:
> >> Add 'perf kvm-events' support to analyze kvm vmexit/mmio/ioport smartly
> >>
> >> Usage:
> >> 	perf kvm-events record
> > 
> > Why not 'perf record -e kvm'?
>
>
> It works, many perf tools have this style, like:
> perf lock record,
> perf sched record,
> perf kmem record,
>
> I think the reason is that only enable the tracepoints we need to avoid
> unnecessary overload so that the result is more exacter.

Maybe a virtual event list:

  perf record -e kvm-pio

This allows us to combine the event list with other events:

  perf record -e kvm-pio -e timer


> >> +
> >> +	while (table_size--) {
> >> +		if (table->exit_code == exit_code)
> >> +			return table->reason;
> > 
> > ... table[exit_code] ...
> > 
>
>
> Actually, this array is not indexed by exit_code, it means:
> table[exit_code].exit_code != exit_code.

So make it indexed by exit code.



-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3] KVM: perf: kvm events analysis tool
  2012-01-17 11:59     ` Marcelo Tosatti
@ 2012-01-24 12:51       ` Avi Kivity
  0 siblings, 0 replies; 25+ messages in thread
From: Avi Kivity @ 2012-01-24 12:51 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Stefan Hajnoczi, Xiao Guangrong, LKML, KVM

On 01/17/2012 01:59 PM, Marcelo Tosatti wrote:
> > 
> > All this copy-paste could be avoided by sharing this stuff with the
> > arch/x86/kvm/ code.
>
> Yes, same for KVM_MAX_VCPUS.
>
>

This is an internal define.  perf kvm should work with different kernel
versions, so it needs to query the value dynamically.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2012-01-24 12:51 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-16  9:30 [RFC][PATCH] KVM: perf: a smart tool to analyse kvm events Xiao Guangrong
2012-01-16  9:31 ` [PATCH 1/3] KVM: trace mmio read event properly Xiao Guangrong
2012-01-16 10:18   ` Avi Kivity
2012-01-17  2:36     ` Xiao Guangrong
2012-01-16  9:32 ` [PATCH 2/3] KVM: improve trace events of vmexit/mmio/ioport Xiao Guangrong
2012-01-16  9:38   ` Avi Kivity
2012-01-17  2:28     ` Xiao Guangrong
2012-01-17 11:55       ` Marcelo Tosatti
2012-01-17 17:31         ` David Ahern
2012-01-18  2:32           ` Xiao Guangrong
2012-01-18  5:34             ` David Ahern
2012-01-24 12:44       ` Avi Kivity
2012-01-16  9:32 ` [PATCH 3/3] KVM: perf: kvm events analysis tool Xiao Guangrong
2012-01-16 10:04   ` Avi Kivity
2012-01-17  2:30     ` Xiao Guangrong
2012-01-24 12:49       ` Avi Kivity
2012-01-16 10:08   ` Stefan Hajnoczi
2012-01-17  2:37     ` Xiao Guangrong
2012-01-17 11:59     ` Marcelo Tosatti
2012-01-24 12:51       ` Avi Kivity
2012-01-16 10:11 ` [RFC][PATCH] KVM: perf: a smart tool to analyse kvm events Avi Kivity
2012-01-17  2:30   ` Xiao Guangrong
2012-01-16 22:53 ` David Ahern
2012-01-17  2:41   ` Xiao Guangrong
2012-01-17  4:49     ` David Ahern

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).