* [PATCH 0/9] yet another batch of perf_counter patches
@ 2009-04-08 13:01 Peter Zijlstra
2009-04-08 13:01 ` [PATCH 1/9] perf_counter: fix NMI race in task clock Peter Zijlstra
` (8 more replies)
0 siblings, 9 replies; 24+ messages in thread
From: Peter Zijlstra @ 2009-04-08 13:01 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Paul Mackerras, Corey Ashford, linux-kernel, Peter Zijlstra
- fixes an NMI race in the new context clock.
- puts misc bits in the header
- updates kerneltop
- provides simple userspace profiling tools
- provides PERF_RECORD_ADDR to record the data address that triggered
the event (pagefaults only for now).
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 1/9] perf_counter: fix NMI race in task clock
2009-04-08 13:01 [PATCH 0/9] yet another batch of perf_counter patches Peter Zijlstra
@ 2009-04-08 13:01 ` Peter Zijlstra
2009-04-08 16:57 ` [tip:perfcounters/core] " Peter Zijlstra
2009-04-08 13:01 ` [PATCH 2/9] perf_counter: provide misc bits in the event header Peter Zijlstra
` (7 subsequent siblings)
8 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2009-04-08 13:01 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Paul Mackerras, Corey Ashford, linux-kernel, Peter Zijlstra
[-- Attachment #1: perf_counter-task-clock-read.patch --]
[-- Type: text/plain, Size: 1923 bytes --]
We should not be updating ctx->time from NMI context; work around that
by computing the current time from ctx->timestamp instead.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
kernel/perf_counter.c | 25 ++++++++++++++++---------
1 file changed, 16 insertions(+), 9 deletions(-)
Index: linux-2.6/kernel/perf_counter.c
===================================================================
--- linux-2.6.orig/kernel/perf_counter.c
+++ linux-2.6/kernel/perf_counter.c
@@ -319,8 +319,6 @@ static void __perf_counter_disable(void
spin_lock_irqsave(&ctx->lock, flags);
- update_context_time(ctx);
-
/*
* If the counter is on, turn it off.
* If it is in error state, leave it in error state.
@@ -2335,13 +2333,11 @@ static const struct hw_perf_counter_ops
* Software counter: task time clock
*/
-static void task_clock_perf_counter_update(struct perf_counter *counter)
+static void task_clock_perf_counter_update(struct perf_counter *counter, u64 now)
{
- u64 prev, now;
+ u64 prev;
s64 delta;
- now = counter->ctx->time;
-
prev = atomic64_xchg(&counter->hw.prev_count, now);
delta = now - prev;
atomic64_add(delta, &counter->count);
@@ -2369,13 +2365,24 @@ static int task_clock_perf_counter_enabl
static void task_clock_perf_counter_disable(struct perf_counter *counter)
{
hrtimer_cancel(&counter->hw.hrtimer);
- task_clock_perf_counter_update(counter);
+ task_clock_perf_counter_update(counter, counter->ctx->time);
+
}
static void task_clock_perf_counter_read(struct perf_counter *counter)
{
- update_context_time(counter->ctx);
- task_clock_perf_counter_update(counter);
+ u64 time;
+
+ if (!in_nmi()) {
+ update_context_time(counter->ctx);
+ time = counter->ctx->time;
+ } else {
+ u64 now = perf_clock();
+ u64 delta = now - counter->ctx->timestamp;
+ time = counter->ctx->time + delta;
+ }
+
+ task_clock_perf_counter_update(counter, time);
}
static const struct hw_perf_counter_ops perf_ops_task_clock = {
--
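For readers following along, the idea in the read path can be sketched in plain userspace C. This is only an illustration: struct ctx_clock and both helper names are hypothetical, and the update step assumes update_context_time() simply folds the elapsed clock delta into ctx->time, which this mail does not show.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two context fields the patch reads. */
struct ctx_clock {
	uint64_t time;      /* accumulated context time, in ns */
	uint64_t timestamp; /* clock value when `time` was last updated */
};

/* Normal path: fold the elapsed time into the context (writes to it). */
static uint64_t ctx_time_update(struct ctx_clock *ctx, uint64_t now)
{
	ctx->time += now - ctx->timestamp;
	ctx->timestamp = now;
	return ctx->time;
}

/* NMI path: derive the same value without writing to the context. */
static uint64_t ctx_time_peek(const struct ctx_clock *ctx, uint64_t now)
{
	return ctx->time + (now - ctx->timestamp);
}
```

Both helpers yield the same time for the same clock value; only the first one modifies the shared state, which is why the second is the one that is safe to call when the interrupted code might itself be in the middle of an update.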
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 2/9] perf_counter: provide misc bits in the event header
2009-04-08 13:01 [PATCH 0/9] yet another batch of perf_counter patches Peter Zijlstra
2009-04-08 13:01 ` [PATCH 1/9] perf_counter: fix NMI race in task clock Peter Zijlstra
@ 2009-04-08 13:01 ` Peter Zijlstra
2009-04-08 16:57 ` [tip:perfcounters/core] " Peter Zijlstra
2009-04-08 13:01 ` [PATCH 3/9] perf_counter: use misc field to widen type Peter Zijlstra
` (6 subsequent siblings)
8 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2009-04-08 13:01 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Paul Mackerras, Corey Ashford, linux-kernel, Peter Zijlstra
[-- Attachment #1: perf_counter-header-reserve.patch --]
[-- Type: text/plain, Size: 1503 bytes --]
Limit the size of each record to 64k (or should we count in multiples
of u64 and have a 512K limit?). This gives 16 bits of spare room in the
header, which we can use for misc bits, so that we do not have to grow
the record by a u64 every time we have a few bits to report.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/perf_counter.h | 6 +++++-
kernel/perf_counter.c | 3 +++
2 files changed, 8 insertions(+), 1 deletion(-)
Index: linux-2.6/include/linux/perf_counter.h
===================================================================
--- linux-2.6.orig/include/linux/perf_counter.h
+++ linux-2.6/include/linux/perf_counter.h
@@ -201,9 +201,13 @@ struct perf_counter_mmap_page {
__u32 data_head; /* head in the data section */
};
+#define PERF_EVENT_MISC_KERNEL (1 << 0)
+#define PERF_EVENT_MISC_USER (1 << 1)
+
struct perf_event_header {
__u32 type;
- __u32 size;
+ __u16 misc;
+ __u16 size;
};
enum perf_event_type {
Index: linux-2.6/kernel/perf_counter.c
===================================================================
--- linux-2.6.orig/kernel/perf_counter.c
+++ linux-2.6/kernel/perf_counter.c
@@ -1831,6 +1831,9 @@ static void perf_counter_output(struct p
header.type = PERF_EVENT_COUNTER_OVERFLOW;
header.size = sizeof(header);
+ header.misc = user_mode(regs) ?
+ PERF_EVENT_MISC_USER : PERF_EVENT_MISC_KERNEL;
+
if (record_type & PERF_RECORD_IP) {
ip = instruction_pointer(regs);
header.type |= __PERF_EVENT_IP;
--
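A small userspace sketch of the resulting layout and of the two size limits discussed in the changelog. The struct mirrors the header from the diff using <stdint.h> types; the struct name and the two helper functions are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the header layout after this patch. */
struct event_header {
	uint32_t type;
	uint16_t misc; /* the 16 bits freed up by shrinking `size` */
	uint16_t size; /* record size in bytes, header included */
};

/* Largest record a 16-bit size field can describe when counting bytes. */
static size_t max_record_bytes(void)
{
	return UINT16_MAX;
}

/* The alternative floated in the changelog: count in u64 units instead. */
static size_t max_record_bytes_u64_units(void)
{
	return (size_t)UINT16_MAX * sizeof(uint64_t);
}
```

The header stays 8 bytes wide either way, so existing records keep their alignment; the only thing that changes is how large a single record may become.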
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 3/9] perf_counter: use misc field to widen type
2009-04-08 13:01 [PATCH 0/9] yet another batch of perf_counter patches Peter Zijlstra
2009-04-08 13:01 ` [PATCH 1/9] perf_counter: fix NMI race in task clock Peter Zijlstra
2009-04-08 13:01 ` [PATCH 2/9] perf_counter: provide misc bits in the event header Peter Zijlstra
@ 2009-04-08 13:01 ` Peter Zijlstra
2009-04-08 16:57 ` [tip:perfcounters/core] " Peter Zijlstra
2009-04-08 13:01 ` [PATCH 4/9] perf_counter: kerneltop: keep up with ABI changes Peter Zijlstra
` (5 subsequent siblings)
8 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2009-04-08 13:01 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Paul Mackerras, Corey Ashford, linux-kernel, Peter Zijlstra
[-- Attachment #1: perf_counter-header-overflow.patch --]
[-- Type: text/plain, Size: 3937 bytes --]
Push the PERF_EVENT_COUNTER_OVERFLOW bit into the misc field so that
we can have the full 32 bits for PERF_RECORD_ bits.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/perf_counter.h | 28 ++++++++++------------------
kernel/perf_counter.c | 15 ++++++++-------
2 files changed, 18 insertions(+), 25 deletions(-)
Index: linux-2.6/include/linux/perf_counter.h
===================================================================
--- linux-2.6.orig/include/linux/perf_counter.h
+++ linux-2.6/include/linux/perf_counter.h
@@ -201,8 +201,9 @@ struct perf_counter_mmap_page {
__u32 data_head; /* head in the data section */
};
-#define PERF_EVENT_MISC_KERNEL (1 << 0)
-#define PERF_EVENT_MISC_USER (1 << 1)
+#define PERF_EVENT_MISC_KERNEL (1 << 0)
+#define PERF_EVENT_MISC_USER (1 << 1)
+#define PERF_EVENT_MISC_OVERFLOW (1 << 2)
struct perf_event_header {
__u32 type;
@@ -230,36 +231,27 @@ enum perf_event_type {
PERF_EVENT_MUNMAP = 2,
/*
- * Half the event type space is reserved for the counter overflow
- * bitfields, as found in hw_event.record_type.
- *
- * These events will have types of the form:
- * PERF_EVENT_COUNTER_OVERFLOW { | __PERF_EVENT_* } *
+ * When header.misc & PERF_EVENT_MISC_OVERFLOW the event_type field
+ * will be PERF_RECORD_*
*
* struct {
* struct perf_event_header header;
*
- * { u64 ip; } && __PERF_EVENT_IP
- * { u32 pid, tid; } && __PERF_EVENT_TID
+ * { u64 ip; } && PERF_RECORD_IP
+ * { u32 pid, tid; } && PERF_RECORD_TID
*
* { u64 nr;
- * { u64 event, val; } cnt[nr]; } && __PERF_EVENT_GROUP
+ * { u64 event, val; } cnt[nr]; } && PERF_RECORD_GROUP
*
* { u16 nr,
* hv,
* kernel,
* user;
- * u64 ips[nr]; } && __PERF_EVENT_CALLCHAIN
+ * u64 ips[nr]; } && PERF_RECORD_CALLCHAIN
*
- * { u64 time; } && __PERF_EVENT_TIME
+ * { u64 time; } && PERF_RECORD_TIME
* };
*/
- PERF_EVENT_COUNTER_OVERFLOW = 1UL << 31,
- __PERF_EVENT_IP = PERF_RECORD_IP,
- __PERF_EVENT_TID = PERF_RECORD_TID,
- __PERF_EVENT_GROUP = PERF_RECORD_GROUP,
- __PERF_EVENT_CALLCHAIN = PERF_RECORD_CALLCHAIN,
- __PERF_EVENT_TIME = PERF_RECORD_TIME,
};
#ifdef __KERNEL__
Index: linux-2.6/kernel/perf_counter.c
===================================================================
--- linux-2.6.orig/kernel/perf_counter.c
+++ linux-2.6/kernel/perf_counter.c
@@ -1828,15 +1828,16 @@ static void perf_counter_output(struct p
int callchain_size = 0;
u64 time;
- header.type = PERF_EVENT_COUNTER_OVERFLOW;
+ header.type = 0;
header.size = sizeof(header);
- header.misc = user_mode(regs) ?
+ header.misc = PERF_EVENT_MISC_OVERFLOW;
+ header.misc |= user_mode(regs) ?
PERF_EVENT_MISC_USER : PERF_EVENT_MISC_KERNEL;
if (record_type & PERF_RECORD_IP) {
ip = instruction_pointer(regs);
- header.type |= __PERF_EVENT_IP;
+ header.type |= PERF_RECORD_IP;
header.size += sizeof(ip);
}
@@ -1845,12 +1846,12 @@ static void perf_counter_output(struct p
tid_entry.pid = current->group_leader->pid;
tid_entry.tid = current->pid;
- header.type |= __PERF_EVENT_TID;
+ header.type |= PERF_RECORD_TID;
header.size += sizeof(tid_entry);
}
if (record_type & PERF_RECORD_GROUP) {
- header.type |= __PERF_EVENT_GROUP;
+ header.type |= PERF_RECORD_GROUP;
header.size += sizeof(u64) +
counter->nr_siblings * sizeof(group_entry);
}
@@ -1861,7 +1862,7 @@ static void perf_counter_output(struct p
if (callchain) {
callchain_size = (1 + callchain->nr) * sizeof(u64);
- header.type |= __PERF_EVENT_CALLCHAIN;
+ header.type |= PERF_RECORD_CALLCHAIN;
header.size += callchain_size;
}
}
@@ -1872,7 +1873,7 @@ static void perf_counter_output(struct p
*/
time = sched_clock();
- header.type |= __PERF_EVENT_TIME;
+ header.type |= PERF_RECORD_TIME;
header.size += sizeof(u64);
}
--
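To make the size accounting in perf_counter_output() easier to follow, here is a userspace sketch that computes the size of an overflow record from the bits set in header.type. The REC_* values are assumptions (the PERF_RECORD_* definitions are not part of this mail), and overflow_record_size() is a hypothetical helper, not something the patch adds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit values; this thread does not show the PERF_RECORD_* enum. */
#define REC_IP        (1u << 0)
#define REC_TID       (1u << 1)
#define REC_GROUP     (1u << 2)
#define REC_CALLCHAIN (1u << 3)
#define REC_TIME      (1u << 4)

/*
 * Size in bytes of an overflow record: the 8-byte header plus one
 * fragment per bit set in `type`, in the order documented in the
 * header comment. nr_siblings and callchain_nr are only consulted
 * when the GROUP and CALLCHAIN bits are set.
 */
static size_t overflow_record_size(uint32_t type, unsigned int nr_siblings,
				   unsigned int callchain_nr)
{
	size_t size = 8; /* header: u32 type, u16 misc, u16 size */

	if (type & REC_IP)
		size += sizeof(uint64_t);                /* u64 ip */
	if (type & REC_TID)
		size += 2 * sizeof(uint32_t);            /* u32 pid, tid */
	if (type & REC_GROUP)                            /* u64 nr + {event, val} pairs */
		size += sizeof(uint64_t) +
			(size_t)nr_siblings * 2 * sizeof(uint64_t);
	if (type & REC_CALLCHAIN)                        /* 4 x u16 counts + ips[] */
		size += (1 + (size_t)callchain_nr) * sizeof(uint64_t);
	if (type & REC_TIME)
		size += sizeof(uint64_t);                /* u64 time */

	return size;
}
```

A reader of the mmap buffer could use the same arithmetic as a sanity check against header.size before trusting the fragments it pulls out of a record.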
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 4/9] perf_counter: kerneltop: keep up with ABI changes
2009-04-08 13:01 [PATCH 0/9] yet another batch of perf_counter patches Peter Zijlstra
` (2 preceding siblings ...)
2009-04-08 13:01 ` [PATCH 3/9] perf_counter: use misc field to widen type Peter Zijlstra
@ 2009-04-08 13:01 ` Peter Zijlstra
2009-04-08 16:58 ` [tip:perfcounters/core] " Peter Zijlstra
2009-04-08 13:01 ` [PATCH 5/9] perf_counter: add some comments Peter Zijlstra
` (4 subsequent siblings)
8 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2009-04-08 13:01 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Paul Mackerras, Corey Ashford, linux-kernel, Peter Zijlstra
[-- Attachment #1: kerneltop-new-header.patch --]
[-- Type: text/plain, Size: 1563 bytes --]
Update kerneltop to use PERF_EVENT_MISC_OVERFLOW.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
Documentation/perf_counter/kerneltop.c | 32 ++++++++++++++++----------------
1 file changed, 16 insertions(+), 16 deletions(-)
Index: linux-2.6/Documentation/perf_counter/kerneltop.c
===================================================================
--- linux-2.6.orig/Documentation/perf_counter/kerneltop.c
+++ linux-2.6/Documentation/perf_counter/kerneltop.c
@@ -1276,22 +1276,22 @@ static void mmap_read(struct mmap_data *
old += size;
- switch (event->header.type) {
- case PERF_EVENT_COUNTER_OVERFLOW | __PERF_EVENT_IP:
- case PERF_EVENT_COUNTER_OVERFLOW | __PERF_EVENT_IP | __PERF_EVENT_TID:
- process_event(event->ip.ip, md->counter);
- break;
-
- case PERF_EVENT_MMAP:
- case PERF_EVENT_MUNMAP:
- printf("%s: %Lu %Lu %Lu %s\n",
- event->header.type == PERF_EVENT_MMAP
- ? "mmap" : "munmap",
- event->mmap.start,
- event->mmap.len,
- event->mmap.pgoff,
- event->mmap.filename);
- break;
+ if (event->header.misc & PERF_EVENT_MISC_OVERFLOW) {
+ if (event->header.type & PERF_RECORD_IP)
+ process_event(event->ip.ip, md->counter);
+ } else {
+ switch (event->header.type) {
+ case PERF_EVENT_MMAP:
+ case PERF_EVENT_MUNMAP:
+ printf("%s: %Lu %Lu %Lu %s\n",
+ event->header.type == PERF_EVENT_MMAP
+ ? "mmap" : "munmap",
+ event->mmap.start,
+ event->mmap.len,
+ event->mmap.pgoff,
+ event->mmap.filename);
+ break;
+ }
}
}
--
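The new dispatch order in mmap_read() matters because, once the overflow flag lives in misc, the type field is reused for record bits and can collide with the PERF_EVENT_* numbers. A minimal sketch of that decision, with hypothetical names: REC_IP and EV_MMAP use assumed values, and classify() is not a function from kerneltop.

```c
#include <assert.h>
#include <stdint.h>

#define MISC_OVERFLOW (1u << 2) /* PERF_EVENT_MISC_OVERFLOW, per patch 3 */
#define REC_IP        (1u << 0) /* assumed value of PERF_RECORD_IP */
#define EV_MMAP       1u        /* assumed value of PERF_EVENT_MMAP */
#define EV_MUNMAP     2u        /* PERF_EVENT_MUNMAP, as shown in the header */

enum record_kind {
	KIND_SAMPLE_IP,    /* overflow record carrying an instruction pointer */
	KIND_SAMPLE_OTHER, /* overflow record without an ip fragment */
	KIND_MMAP,
	KIND_MUNMAP,
	KIND_UNKNOWN,
};

/* Check misc first: only then is it known how to interpret `type`. */
static enum record_kind classify(uint32_t type, uint16_t misc)
{
	if (misc & MISC_OVERFLOW)
		return (type & REC_IP) ? KIND_SAMPLE_IP : KIND_SAMPLE_OTHER;

	switch (type) {
	case EV_MMAP:
		return KIND_MMAP;
	case EV_MUNMAP:
		return KIND_MUNMAP;
	default:
		return KIND_UNKNOWN;
	}
}
```

With these values, a type of 2 means "munmap" without the overflow flag but "record carries pid/tid" with it, which is why a plain switch on header.type no longer works.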
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 5/9] perf_counter: add some comments
2009-04-08 13:01 [PATCH 0/9] yet another batch of perf_counter patches Peter Zijlstra
` (3 preceding siblings ...)
2009-04-08 13:01 ` [PATCH 4/9] perf_counter: kerneltop: keep up with ABI changes Peter Zijlstra
@ 2009-04-08 13:01 ` Peter Zijlstra
2009-04-08 16:58 ` [tip:perfcounters/core] " Peter Zijlstra
2009-04-08 13:01 ` [PATCH 6/9] perf_counter: track task-comm data Peter Zijlstra
` (3 subsequent siblings)
8 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2009-04-08 13:01 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Paul Mackerras, Corey Ashford, linux-kernel, Peter Zijlstra
[-- Attachment #1: perf_counter-more-comments.patch --]
[-- Type: text/plain, Size: 885 bytes --]
Add a few comments because I was forgetting which field was for what
functionality.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/perf_counter.h | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
Index: linux-2.6/include/linux/perf_counter.h
===================================================================
--- linux-2.6.orig/include/linux/perf_counter.h
+++ linux-2.6/include/linux/perf_counter.h
@@ -347,10 +347,12 @@ struct file;
struct perf_mmap_data {
struct rcu_head rcu_head;
- int nr_pages;
- atomic_t wakeup;
- atomic_t head;
- atomic_t events;
+ int nr_pages; /* nr of data pages */
+
+ atomic_t wakeup; /* POLL_ for wakeups */
+ atomic_t head; /* write position */
+ atomic_t events; /* event limit */
+
struct perf_counter_mmap_page *user_page;
void *data_pages[0];
};
--
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 6/9] perf_counter: track task-comm data
2009-04-08 13:01 [PATCH 0/9] yet another batch of perf_counter patches Peter Zijlstra
` (4 preceding siblings ...)
2009-04-08 13:01 ` [PATCH 5/9] perf_counter: add some comments Peter Zijlstra
@ 2009-04-08 13:01 ` Peter Zijlstra
2009-04-08 16:58 ` [tip:perfcounters/core] " Peter Zijlstra
` (2 more replies)
2009-04-08 13:01 ` [PATCH 7/9] perf_counter: some simple userspace profiling Peter Zijlstra
` (2 subsequent siblings)
8 siblings, 3 replies; 24+ messages in thread
From: Peter Zijlstra @ 2009-04-08 13:01 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Paul Mackerras, Corey Ashford, linux-kernel, Peter Zijlstra
[-- Attachment #1: perf_counter-comm.patch --]
[-- Type: text/plain, Size: 4484 bytes --]
Similar to the mmap data stream, add one that tracks the task COMM field,
so that the userspace reporting knows what to call a task.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
fs/exec.c | 1
include/linux/perf_counter.h | 15 ++++++
kernel/perf_counter.c | 93 +++++++++++++++++++++++++++++++++++++++++++
3 files changed, 108 insertions(+), 1 deletion(-)
Index: linux-2.6/fs/exec.c
===================================================================
--- linux-2.6.orig/fs/exec.c
+++ linux-2.6/fs/exec.c
@@ -951,6 +951,7 @@ void set_task_comm(struct task_struct *t
task_lock(tsk);
strlcpy(tsk->comm, buf, sizeof(tsk->comm));
task_unlock(tsk);
+ perf_counter_comm(tsk);
}
int flush_old_exec(struct linux_binprm * bprm)
Index: linux-2.6/include/linux/perf_counter.h
===================================================================
--- linux-2.6.orig/include/linux/perf_counter.h
+++ linux-2.6/include/linux/perf_counter.h
@@ -142,8 +142,9 @@ struct perf_counter_hw_event {
exclude_idle : 1, /* don't count when idle */
mmap : 1, /* include mmap data */
munmap : 1, /* include munmap data */
+ comm : 1, /* include comm data */
- __reserved_1 : 53;
+ __reserved_1 : 52;
__u32 extra_config_len;
__u32 wakeup_events; /* wakeup every n events */
@@ -231,6 +232,16 @@ enum perf_event_type {
PERF_EVENT_MUNMAP = 2,
/*
+ * struct {
+ * struct perf_event_header header;
+ *
+ * u32 pid, tid;
+ * char comm[];
+ * };
+ */
+ PERF_EVENT_COMM = 3,
+
+ /*
* When header.misc & PERF_EVENT_MISC_OVERFLOW the event_type field
* will be PERF_RECORD_*
*
@@ -545,6 +556,8 @@ extern void perf_counter_mmap(unsigned l
extern void perf_counter_munmap(unsigned long addr, unsigned long len,
unsigned long pgoff, struct file *file);
+extern void perf_counter_comm(struct task_struct *tsk);
+
#define MAX_STACK_DEPTH 255
struct perf_callchain_entry {
Index: linux-2.6/kernel/perf_counter.c
===================================================================
--- linux-2.6.orig/kernel/perf_counter.c
+++ linux-2.6/kernel/perf_counter.c
@@ -1917,6 +1917,99 @@ static void perf_counter_output(struct p
}
/*
+ * comm tracking
+ */
+
+struct perf_comm_event {
+ struct task_struct *task;
+ char *comm;
+ int comm_size;
+
+ struct {
+ struct perf_event_header header;
+
+ u32 pid;
+ u32 tid;
+ } event;
+};
+
+static void perf_counter_comm_output(struct perf_counter *counter,
+ struct perf_comm_event *comm_event)
+{
+ struct perf_output_handle handle;
+ int size = comm_event->event.header.size;
+ int ret = perf_output_begin(&handle, counter, size, 0, 0);
+
+ if (ret)
+ return;
+
+ perf_output_put(&handle, comm_event->event);
+ perf_output_copy(&handle, comm_event->comm,
+ comm_event->comm_size);
+ perf_output_end(&handle);
+}
+
+static int perf_counter_comm_match(struct perf_counter *counter,
+ struct perf_comm_event *comm_event)
+{
+ if (counter->hw_event.comm &&
+ comm_event->event.header.type == PERF_EVENT_COMM)
+ return 1;
+
+ return 0;
+}
+
+static void perf_counter_comm_ctx(struct perf_counter_context *ctx,
+ struct perf_comm_event *comm_event)
+{
+ struct perf_counter *counter;
+
+ if (system_state != SYSTEM_RUNNING || list_empty(&ctx->event_list))
+ return;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(counter, &ctx->event_list, event_entry) {
+ if (perf_counter_comm_match(counter, comm_event))
+ perf_counter_comm_output(counter, comm_event);
+ }
+ rcu_read_unlock();
+}
+
+static void perf_counter_comm_event(struct perf_comm_event *comm_event)
+{
+ struct perf_cpu_context *cpuctx;
+ unsigned int size;
+ char *comm = comm_event->task->comm;
+
+ size = ALIGN(strlen(comm), sizeof(u64));
+
+ comm_event->comm = comm;
+ comm_event->comm_size = size;
+
+ comm_event->event.header.size = sizeof(comm_event->event) + size;
+
+ cpuctx = &get_cpu_var(perf_cpu_context);
+ perf_counter_comm_ctx(&cpuctx->ctx, comm_event);
+ put_cpu_var(perf_cpu_context);
+
+ perf_counter_comm_ctx(&current->perf_counter_ctx, comm_event);
+}
+
+void perf_counter_comm(struct task_struct *task)
+{
+ struct perf_comm_event comm_event = {
+ .task = task,
+ .event = {
+ .header = { .type = PERF_EVENT_COMM, },
+ .pid = task->group_leader->pid,
+ .tid = task->pid,
+ },
+ };
+
+ perf_counter_comm_event(&comm_event);
+}
+
+/*
* mmap tracking
*/
--
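The record size computed in perf_counter_comm_event() can be reproduced in userspace as follows. This is a sketch only: ALIGN_UP stands in for the kernel's ALIGN(), and COMM_FIXED_BYTES and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round x up to a multiple of a; a must be a power of two. */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/* Fixed part of a comm record: 8-byte header + u32 pid + u32 tid. */
#define COMM_FIXED_BYTES 16u

/* Bytes of comm data emitted after the fixed part. */
static size_t comm_payload_bytes(const char *comm)
{
	return ALIGN_UP(strlen(comm), sizeof(uint64_t));
}

/* Value that ends up in header.size for a given task name. */
static size_t comm_record_bytes(const char *comm)
{
	return COMM_FIXED_BYTES + comm_payload_bytes(comm);
}
```

One thing this makes visible: because the rounding starts from strlen() rather than strlen() + 1, a name whose length is an exact multiple of 8 is emitted with no trailing NUL, so a consumer would have to bound the string by the record size rather than rely on termination.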
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 7/9] perf_counter: some simple userspace profiling
2009-04-08 13:01 [PATCH 0/9] yet another batch of perf_counter patches Peter Zijlstra
` (5 preceding siblings ...)
2009-04-08 13:01 ` [PATCH 6/9] perf_counter: track task-comm data Peter Zijlstra
@ 2009-04-08 13:01 ` Peter Zijlstra
2009-04-08 16:58 ` [tip:perfcounters/core] " Peter Zijlstra
2009-04-08 17:09 ` Peter Zijlstra
2009-04-08 13:01 ` [PATCH 8/9] perf_counter: move PERF_RECORD_TIME Peter Zijlstra
2009-04-08 13:01 ` [PATCH 9/9] perf_counter: allow for data addresses to be recorded Peter Zijlstra
8 siblings, 2 replies; 24+ messages in thread
From: Peter Zijlstra @ 2009-04-08 13:01 UTC (permalink / raw)
To: Ingo Molnar
Cc: Paul Mackerras, Corey Ashford, linux-kernel, Peter Zijlstra,
Arnaldo Carvalho de Melo
[-- Attachment #1: perf-tools.patch --]
[-- Type: text/plain, Size: 25276 bytes --]
# perf-record make -j4 kernel/
# perf-report | tail -15
0.39 cc1 [kernel] lock_acquired
0.42 cc1 [kernel] lock_acquire
0.51 cc1 [ user ] /lib64/libc-2.8.90.so: _int_free
0.51 as [kernel] clear_page_c
0.53 cc1 [ user ] /lib64/libc-2.8.90.so: memcpy
0.56 cc1 [ user ] /lib64/libc-2.8.90.so: _IO_vfprintf
0.63 cc1 [kernel] lock_release
0.67 cc1 [ user ] /lib64/libc-2.8.90.so: strlen
0.68 cc1 [kernel] debug_smp_processor_id
1.38 cc1 [ user ] /lib64/libc-2.8.90.so: _int_malloc
1.55 cc1 [ user ] /lib64/libc-2.8.90.so: memset
1.77 cc1 [kernel] __lock_acquire
1.88 cc1 [kernel] clear_page_c
3.61 as [ user ] /usr/bin/as: <unknown>
59.16 cc1 [ user ] /usr/libexec/gcc/x86_64-redhat-linux/4.3.2/cc1: <unknown>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
---
Documentation/perf_counter/Makefile | 8
Documentation/perf_counter/perf-record.c | 530 ++++++++++++++++++++++++++++++
Documentation/perf_counter/perf-report.cc | 472 ++++++++++++++++++++++++++
3 files changed, 1009 insertions(+), 1 deletion(-)
Index: linux-2.6/Documentation/perf_counter/Makefile
===================================================================
--- linux-2.6.orig/Documentation/perf_counter/Makefile
+++ linux-2.6/Documentation/perf_counter/Makefile
@@ -1,10 +1,16 @@
-BINS = kerneltop perfstat
+BINS = kerneltop perfstat perf-record perf-report
all: $(BINS)
kerneltop: kerneltop.c ../../include/linux/perf_counter.h
cc -O6 -Wall -lrt -o $@ $<
+perf-record: perf-record.c ../../include/linux/perf_counter.h
+ cc -O6 -Wall -lrt -o $@ $<
+
+perf-report: perf-report.cc ../../include/linux/perf_counter.h
+ g++ -O6 -Wall -lrt -o $@ $<
+
perfstat: kerneltop
ln -sf kerneltop perfstat
Index: linux-2.6/Documentation/perf_counter/perf-record.c
===================================================================
--- /dev/null
+++ linux-2.6/Documentation/perf_counter/perf-record.c
@@ -0,0 +1,530 @@
+
+
+#define _GNU_SOURCE
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+#include <limits.h>
+#include <getopt.h>
+#include <assert.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <errno.h>
+#include <ctype.h>
+#include <time.h>
+#include <sched.h>
+#include <pthread.h>
+
+#include <sys/syscall.h>
+#include <sys/ioctl.h>
+#include <sys/poll.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+#include <sys/uio.h>
+#include <sys/mman.h>
+
+#include <linux/unistd.h>
+#include <linux/types.h>
+
+#include "../../include/linux/perf_counter.h"
+
+
+/*
+ * prctl(PR_TASK_PERF_COUNTERS_DISABLE) will (cheaply) disable all
+ * counters in the current task.
+ */
+#define PR_TASK_PERF_COUNTERS_DISABLE 31
+#define PR_TASK_PERF_COUNTERS_ENABLE 32
+
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+
+#define rdclock() \
+({ \
+ struct timespec ts; \
+ \
+ clock_gettime(CLOCK_MONOTONIC, &ts); \
+ ts.tv_sec * 1000000000ULL + ts.tv_nsec; \
+})
+
+/*
+ * Pick up some kernel type conventions:
+ */
+#define __user
+#define asmlinkage
+
+#ifdef __x86_64__
+#define __NR_perf_counter_open 295
+#define rmb() asm volatile("lfence" ::: "memory")
+#define cpu_relax() asm volatile("rep; nop" ::: "memory");
+#endif
+
+#ifdef __i386__
+#define __NR_perf_counter_open 333
+#define rmb() asm volatile("lfence" ::: "memory")
+#define cpu_relax() asm volatile("rep; nop" ::: "memory");
+#endif
+
+#ifdef __powerpc__
+#define __NR_perf_counter_open 319
+#define rmb() asm volatile ("sync" ::: "memory")
+#define cpu_relax() asm volatile ("" ::: "memory");
+#endif
+
+#define unlikely(x) __builtin_expect(!!(x), 0)
+#define min(x, y) ({ \
+ typeof(x) _min1 = (x); \
+ typeof(y) _min2 = (y); \
+ (void) (&_min1 == &_min2); \
+ _min1 < _min2 ? _min1 : _min2; })
+
+asmlinkage int sys_perf_counter_open(
+ struct perf_counter_hw_event *hw_event_uptr __user,
+ pid_t pid,
+ int cpu,
+ int group_fd,
+ unsigned long flags)
+{
+ return syscall(
+ __NR_perf_counter_open, hw_event_uptr, pid, cpu, group_fd, flags);
+}
+
+#define MAX_COUNTERS 64
+#define MAX_NR_CPUS 256
+
+#define EID(type, id) (((__u64)(type) << PERF_COUNTER_TYPE_SHIFT) | (id))
+
+static int nr_counters = 0;
+static __u64 event_id[MAX_COUNTERS] = { };
+static int default_interval = 100000;
+static int event_count[MAX_COUNTERS];
+static int fd[MAX_NR_CPUS][MAX_COUNTERS];
+static int nr_cpus = 0;
+static unsigned int page_size;
+static unsigned int mmap_pages = 16;
+static int output;
+static char *output_name = "output.perf";
+static int group = 0;
+static unsigned int realtime_prio = 0;
+
+const unsigned int default_count[] = {
+ 1000000,
+ 1000000,
+ 10000,
+ 10000,
+ 1000000,
+ 10000,
+};
+
+static char *hw_event_names[] = {
+ "CPU cycles",
+ "instructions",
+ "cache references",
+ "cache misses",
+ "branches",
+ "branch misses",
+ "bus cycles",
+};
+
+static char *sw_event_names[] = {
+ "cpu clock ticks",
+ "task clock ticks",
+ "pagefaults",
+ "context switches",
+ "CPU migrations",
+ "minor faults",
+ "major faults",
+};
+
+struct event_symbol {
+ __u64 event;
+ char *symbol;
+};
+
+static struct event_symbol event_symbols[] = {
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_CPU_CYCLES), "cpu-cycles", },
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_CPU_CYCLES), "cycles", },
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_INSTRUCTIONS), "instructions", },
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_CACHE_REFERENCES), "cache-references", },
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_CACHE_MISSES), "cache-misses", },
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_BRANCH_INSTRUCTIONS), "branch-instructions", },
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_BRANCH_INSTRUCTIONS), "branches", },
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_BRANCH_MISSES), "branch-misses", },
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_BUS_CYCLES), "bus-cycles", },
+
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CPU_CLOCK), "cpu-clock", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_TASK_CLOCK), "task-clock", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_PAGE_FAULTS), "page-faults", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_PAGE_FAULTS), "faults", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_PAGE_FAULTS_MIN), "minor-faults", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_PAGE_FAULTS_MAJ), "major-faults", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CONTEXT_SWITCHES), "context-switches", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CONTEXT_SWITCHES), "cs", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CPU_MIGRATIONS), "cpu-migrations", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CPU_MIGRATIONS), "migrations", },
+};
+
+/*
+ * Each event can have multiple symbolic names.
+ * Symbolic names are (almost) exactly matched.
+ */
+static __u64 match_event_symbols(char *str)
+{
+ __u64 config, id;
+ int type;
+ unsigned int i;
+
+ if (sscanf(str, "r%llx", &config) == 1)
+ return config | PERF_COUNTER_RAW_MASK;
+
+ if (sscanf(str, "%d:%llu", &type, &id) == 2)
+ return EID(type, id);
+
+ for (i = 0; i < ARRAY_SIZE(event_symbols); i++) {
+ if (!strncmp(str, event_symbols[i].symbol,
+ strlen(event_symbols[i].symbol)))
+ return event_symbols[i].event;
+ }
+
+ return ~0ULL;
+}
+
+static int parse_events(char *str)
+{
+ __u64 config;
+
+again:
+ if (nr_counters == MAX_COUNTERS)
+ return -1;
+
+ config = match_event_symbols(str);
+ if (config == ~0ULL)
+ return -1;
+
+ event_id[nr_counters] = config;
+ nr_counters++;
+
+ str = strstr(str, ",");
+ if (str) {
+ str++;
+ goto again;
+ }
+
+ return 0;
+}
+
+#define __PERF_COUNTER_FIELD(config, name) \
+ ((config & PERF_COUNTER_##name##_MASK) >> PERF_COUNTER_##name##_SHIFT)
+
+#define PERF_COUNTER_RAW(config) __PERF_COUNTER_FIELD(config, RAW)
+#define PERF_COUNTER_CONFIG(config) __PERF_COUNTER_FIELD(config, CONFIG)
+#define PERF_COUNTER_TYPE(config) __PERF_COUNTER_FIELD(config, TYPE)
+#define PERF_COUNTER_ID(config) __PERF_COUNTER_FIELD(config, EVENT)
+
+static void display_events_help(void)
+{
+ unsigned int i;
+ __u64 e;
+
+ printf(
+ " -e EVENT --event=EVENT # symbolic-name abbreviations");
+
+ for (i = 0; i < ARRAY_SIZE(event_symbols); i++) {
+ int type, id;
+
+ e = event_symbols[i].event;
+ type = PERF_COUNTER_TYPE(e);
+ id = PERF_COUNTER_ID(e);
+
+ printf("\n %d:%d: %-20s",
+ type, id, event_symbols[i].symbol);
+ }
+
+ printf("\n"
+ " rNNN: raw PMU events (eventsel+umask)\n\n");
+}
+
+static void display_help(void)
+{
+ printf(
+ "Usage: perf-record [<options>]\n"
+ "perf-record Options (up to %d event types can be specified at once):\n\n",
+ MAX_COUNTERS);
+
+ display_events_help();
+
+ printf(
+ " -c CNT --count=CNT # event period to sample\n"
+ " -m pages --mmap_pages=<pages> # number of mmap data pages\n"
+ " -o file --output=<file> # output file\n"
+ " -r prio --realtime=<prio> # use RT prio\n"
+ );
+
+ exit(0);
+}
+
+static void process_options(int argc, char *argv[])
+{
+ int error = 0, counter;
+
+ for (;;) {
+ int option_index = 0;
+ /** Options for getopt */
+ static struct option long_options[] = {
+ {"count", required_argument, NULL, 'c'},
+ {"event", required_argument, NULL, 'e'},
+ {"mmap_pages", required_argument, NULL, 'm'},
+ {"output", required_argument, NULL, 'o'},
+ {"realtime", required_argument, NULL, 'r'},
+ {NULL, 0, NULL, 0 }
+ };
+ int c = getopt_long(argc, argv, "+:c:e:m:o:r:",
+ long_options, &option_index);
+ if (c == -1)
+ break;
+
+ switch (c) {
+ case 'c': default_interval = atoi(optarg); break;
+ case 'e': error = parse_events(optarg); break;
+ case 'm': mmap_pages = atoi(optarg); break;
+ case 'o': output_name = strdup(optarg); break;
+ case 'r': realtime_prio = atoi(optarg); break;
+ default: error = 1; break;
+ }
+ }
+ if (error)
+ display_help();
+
+ if (!nr_counters) {
+ nr_counters = 1;
+ event_id[0] = 0;
+ }
+
+ for (counter = 0; counter < nr_counters; counter++) {
+ if (event_count[counter])
+ continue;
+
+ event_count[counter] = default_interval;
+ }
+}
+
+struct mmap_data {
+ int counter;
+ void *base;
+ unsigned int mask;
+ unsigned int prev;
+};
+
+static unsigned int mmap_read_head(struct mmap_data *md)
+{
+ struct perf_counter_mmap_page *pc = md->base;
+ int head;
+
+ head = pc->data_head;
+ rmb();
+
+ return head;
+}
+
+static long events;
+static struct timeval last_read, this_read;
+
+static void mmap_read(struct mmap_data *md)
+{
+ unsigned int head = mmap_read_head(md);
+ unsigned int old = md->prev;
+ unsigned char *data = md->base + page_size;
+ unsigned long size;
+ void *buf;
+ int diff;
+
+ gettimeofday(&this_read, NULL);
+
+ /*
+ * If we're further behind than half the buffer, there's a chance
+ * the writer will bite our tail and screw up the events under us.
+ *
+ * If we somehow ended up ahead of the head, we got messed up.
+ *
+ * In either case, truncate and restart at head.
+ */
+ diff = head - old;
+ if (diff > md->mask / 2 || diff < 0) {
+ struct timeval iv;
+ unsigned long msecs;
+
+ timersub(&this_read, &last_read, &iv);
+ msecs = iv.tv_sec*1000 + iv.tv_usec/1000;
+
+ fprintf(stderr, "WARNING: failed to keep up with mmap data."
+ " Last read %lu msecs ago.\n", msecs);
+
+ /*
+ * head points to a known good entry, start there.
+ */
+ old = head;
+ }
+
+ last_read = this_read;
+
+ if (old != head)
+ events++;
+
+ size = head - old;
+
+ if ((old & md->mask) + size != (head & md->mask)) {
+ buf = &data[old & md->mask];
+ size = md->mask + 1 - (old & md->mask);
+ old += size;
+ while (size) {
+ int ret = write(output, buf, size);
+ if (ret < 0) {
+ perror("failed to write");
+ exit(-1);
+ }
+ size -= ret;
+ buf += ret;
+ }
+ }
+
+ buf = &data[old & md->mask];
+ size = head - old;
+ old += size;
+ while (size) {
+ int ret = write(output, buf, size);
+ if (ret < 0) {
+ perror("failed to write");
+ exit(-1);
+ }
+ size -= ret;
+ buf += ret;
+ }
+
+ md->prev = old;
+}
+
+static volatile int done = 0;
+
+static void sigchld_handler(int sig)
+{
+ if (sig == SIGCHLD)
+ done = 1;
+}
+
+int main(int argc, char *argv[])
+{
+ struct pollfd event_array[MAX_NR_CPUS * MAX_COUNTERS];
+ struct mmap_data mmap_array[MAX_NR_CPUS][MAX_COUNTERS];
+ struct perf_counter_hw_event hw_event;
+ int i, counter, group_fd, nr_poll = 0;
+ pid_t pid;
+ int ret;
+
+ page_size = sysconf(_SC_PAGE_SIZE);
+
+ process_options(argc, argv);
+
+ nr_cpus = sysconf(_SC_NPROCESSORS_ONLN);
+ assert(nr_cpus <= MAX_NR_CPUS);
+ assert(nr_cpus >= 0);
+
+ output = open(output_name, O_CREAT|O_RDWR, S_IRWXU);
+ if (output < 0) {
+ perror("failed to create output file");
+ exit(-1);
+ }
+
+ argc -= optind;
+ argv += optind;
+
+ for (i = 0; i < nr_cpus; i++) {
+ group_fd = -1;
+ for (counter = 0; counter < nr_counters; counter++) {
+
+ memset(&hw_event, 0, sizeof(hw_event));
+ hw_event.config = event_id[counter];
+ hw_event.irq_period = event_count[counter];
+ hw_event.record_type = PERF_RECORD_IP | PERF_RECORD_TID;
+ hw_event.nmi = 1;
+ hw_event.mmap = 1;
+ hw_event.comm = 1;
+
+ fd[i][counter] = sys_perf_counter_open(&hw_event, -1, i, group_fd, 0);
+ if (fd[i][counter] < 0) {
+ int err = errno;
+ printf("kerneltop error: syscall returned with %d (%s)\n",
+ fd[i][counter], strerror(err));
+ if (err == EPERM)
+ printf("Are you root?\n");
+ exit(-1);
+ }
+ assert(fd[i][counter] >= 0);
+ fcntl(fd[i][counter], F_SETFL, O_NONBLOCK);
+
+ /*
+ * First counter acts as the group leader:
+ */
+ if (group && group_fd == -1)
+ group_fd = fd[i][counter];
+
+ event_array[nr_poll].fd = fd[i][counter];
+ event_array[nr_poll].events = POLLIN;
+ nr_poll++;
+
+ mmap_array[i][counter].counter = counter;
+ mmap_array[i][counter].prev = 0;
+ mmap_array[i][counter].mask = mmap_pages*page_size - 1;
+ mmap_array[i][counter].base = mmap(NULL, (mmap_pages+1)*page_size,
+ PROT_READ, MAP_SHARED, fd[i][counter], 0);
+ if (mmap_array[i][counter].base == MAP_FAILED) {
+ printf("kerneltop error: failed to mmap with %d (%s)\n",
+ errno, strerror(errno));
+ exit(-1);
+ }
+ }
+ }
+
+ signal(SIGCHLD, sigchld_handler);
+
+ pid = fork();
+ if (pid < 0)
+ perror("failed to fork");
+
+ if (!pid) {
+ if (execvp(argv[0], argv)) {
+ perror(argv[0]);
+ exit(-1);
+ }
+ }
+
+ if (realtime_prio) {
+ struct sched_param param;
+
+ param.sched_priority = realtime_prio;
+ if (sched_setscheduler(0, SCHED_FIFO, &param)) {
+ printf("Could not set realtime priority.\n");
+ exit(-1);
+ }
+ }
+
+ /*
+ * TODO: store the current /proc/$/maps information somewhere
+ */
+
+ while (!done) {
+ int hits = events;
+
+ for (i = 0; i < nr_cpus; i++) {
+ for (counter = 0; counter < nr_counters; counter++)
+ mmap_read(&mmap_array[i][counter]);
+ }
+
+ if (hits == events)
+ ret = poll(event_array, nr_poll, 100);
+ }
+
+ return 0;
+}
Index: linux-2.6/Documentation/perf_counter/perf-report.cc
===================================================================
--- /dev/null
+++ linux-2.6/Documentation/perf_counter/perf-report.cc
@@ -0,0 +1,472 @@
+#define _GNU_SOURCE
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+#include <limits.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <errno.h>
+#include <ctype.h>
+#include <time.h>
+#include <getopt.h>
+
+#include <sys/ioctl.h>
+#include <sys/poll.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+#include <sys/mman.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include <linux/unistd.h>
+#include <linux/types.h>
+
+#include "../../include/linux/perf_counter.h"
+
+#include <set>
+#include <map>
+#include <string>
+
+
+static char const *input_name = "output.perf";
+static int input;
+
+static unsigned long page_size;
+static unsigned long mmap_window = 32;
+
+struct ip_event {
+ struct perf_event_header header;
+ __u64 ip;
+ __u32 pid, tid;
+};
+struct mmap_event {
+ struct perf_event_header header;
+ __u32 pid, tid;
+ __u64 start;
+ __u64 len;
+ __u64 pgoff;
+ char filename[PATH_MAX];
+};
+struct comm_event {
+ struct perf_event_header header;
+ __u32 pid,tid;
+ char comm[16];
+};
+
+typedef union event_union {
+ struct perf_event_header header;
+ struct ip_event ip;
+ struct mmap_event mmap;
+ struct comm_event comm;
+} event_t;
+
+struct section {
+ uint64_t start;
+ uint64_t end;
+
+ uint64_t offset;
+
+ std::string name;
+
+ section() { };
+
+ section(uint64_t stab) : end(stab) { };
+
+ section(uint64_t start, uint64_t size, uint64_t offset, std::string name) :
+ start(start), end(start + size), offset(offset), name(name)
+ { };
+
+ bool operator < (const struct section &s) const {
+ return end < s.end;
+ };
+};
+
+typedef std::set<struct section> sections_t;
+
+struct symbol {
+ uint64_t start;
+ uint64_t end;
+
+ std::string name;
+
+ symbol() { };
+
+ symbol(uint64_t ip) : start(ip) { }
+
+ symbol(uint64_t start, uint64_t len, std::string name) :
+ start(start), end(start + len), name(name)
+ { };
+
+ bool operator < (const struct symbol &s) const {
+ return start < s.start;
+ };
+};
+
+typedef std::set<struct symbol> symbols_t;
+
+struct dso {
+ sections_t sections;
+ symbols_t syms;
+};
+
+static std::map<std::string, struct dso> dsos;
+
+static void load_dso_sections(std::string dso_name)
+{
+ struct dso &dso = dsos[dso_name];
+
+ std::string cmd = "readelf -DSW " + dso_name;
+
+ FILE *file = popen(cmd.c_str(), "r");
+ if (!file) {
+ perror("failed to open pipe");
+ exit(-1);
+ }
+
+ char *line = NULL;
+ size_t n = 0;
+
+ while (!feof(file)) {
+ uint64_t addr, off, size;
+ char name[32];
+
+ if (getline(&line, &n, file) < 0)
+ break;
+ if (!line)
+ break;
+
+ if (sscanf(line, " [%*2d] %16s %*14s %Lx %Lx %Lx",
+ name, &addr, &off, &size) == 4) {
+
+ dso.sections.insert(section(addr, size, addr - off, name));
+ }
+#if 0
+ /*
+ * for reading readelf symbols (-s), however these don't seem
+ * to include nearly everything, so use nm for that.
+ */
+ if (sscanf(line, " %*4d %*3d: %Lx %5Lu %*7s %*6s %*7s %3d %s",
+ &start, &size, &section, sym) == 4) {
+
+ start -= dso.section_offsets[section];
+
+ dso.syms.insert(symbol(start, size, std::string(sym)));
+ }
+#endif
+ }
+ pclose(file);
+}
+
+static void load_dso_symbols(std::string dso_name, std::string args)
+{
+ struct dso &dso = dsos[dso_name];
+
+ std::string cmd = "nm -nSC " + args + " " + dso_name;
+
+ FILE *file = popen(cmd.c_str(), "r");
+ if (!file) {
+ perror("failed to open pipe");
+ exit(-1);
+ }
+
+ char *line = NULL;
+ size_t n = 0;
+
+ while (!feof(file)) {
+ uint64_t start, size;
+ char c;
+ char sym[1024];
+
+ if (getline(&line, &n, file) < 0)
+ break;
+ if (!line)
+ break;
+
+
+ if (sscanf(line, "%Lx %Lx %c %s", &start, &size, &c, sym) == 4) {
+ sections_t::const_iterator si =
+ dso.sections.upper_bound(section(start));
+ if (si == dso.sections.end()) {
+ printf("symbol in unknown section: %s\n", sym);
+ continue;
+ }
+
+ start -= si->offset;
+
+ dso.syms.insert(symbol(start, size, sym));
+ }
+ }
+ pclose(file);
+}
+
+static void load_dso(std::string dso_name)
+{
+ load_dso_sections(dso_name);
+ load_dso_symbols(dso_name, "-D"); /* dynamic symbols */
+ load_dso_symbols(dso_name, ""); /* regular ones */
+}
+
+void load_kallsyms(void)
+{
+ struct dso &dso = dsos["[kernel]"];
+
+ FILE *file = fopen("/proc/kallsyms", "r");
+ if (!file) {
+ perror("failed to open kallsyms");
+ exit(-1);
+ }
+
+ char *line = NULL;
+ size_t n = 0;
+
+ while (!feof(file)) {
+ uint64_t start;
+ char c;
+ char sym[1024];
+
+ if (getline(&line, &n, file) < 0)
+ break;
+ if (!line)
+ break;
+
+ if (sscanf(line, "%Lx %c %s", &start, &c, sym) == 3)
+ dso.syms.insert(symbol(start, 0x1000000, std::string(sym)));
+ }
+ fclose(file);
+}
+
+struct map {
+ uint64_t start;
+ uint64_t end;
+ uint64_t pgoff;
+
+ std::string dso;
+
+ map() { };
+
+ map(uint64_t ip) : end(ip) { }
+
+ map(mmap_event *mmap) {
+ start = mmap->start;
+ end = mmap->start + mmap->len;
+ pgoff = mmap->pgoff;
+
+ dso = std::string(mmap->filename);
+
+ if (dsos.find(dso) == dsos.end())
+ load_dso(dso);
+ };
+
+ bool operator < (const struct map &m) const {
+ return end < m.end;
+ };
+};
+
+typedef std::set<struct map> maps_t;
+
+static std::map<int, maps_t> maps;
+
+static std::map<int, std::string> comms;
+
+static std::map<std::string, int> hist;
+static std::multimap<int, std::string> rev_hist;
+
+static std::string resolve_comm(int pid)
+{
+ std::string comm = "<unknown>";
+ std::map<int, std::string>::const_iterator ci = comms.find(pid);
+ if (ci != comms.end())
+ comm = ci->second;
+
+ return comm;
+}
+
+static std::string resolve_user_symbol(int pid, uint64_t ip)
+{
+ std::string sym = "<unknown>";
+
+ maps_t &m = maps[pid];
+ maps_t::const_iterator mi = m.upper_bound(map(ip));
+ if (mi == m.end())
+ return sym;
+
+ ip -= mi->start + mi->pgoff;
+
+ symbols_t &s = dsos[mi->dso].syms;
+ symbols_t::const_iterator si = s.upper_bound(symbol(ip));
+
+ sym = mi->dso + ": <unknown>";
+
+ if (si == s.begin())
+ return sym;
+ si--;
+
+ if (si->start <= ip && ip < si->end)
+ sym = mi->dso + ": " + si->name;
+#if 0
+ else if (si->start <= ip)
+ sym = mi->dso + ": ?" + si->name;
+#endif
+
+ return sym;
+}
+
+static std::string resolve_kernel_symbol(uint64_t ip)
+{
+ std::string sym = "<unknown>";
+
+ symbols_t &s = dsos["[kernel]"].syms;
+ symbols_t::const_iterator si = s.upper_bound(symbol(ip));
+
+ if (si == s.begin())
+ return sym;
+ si--;
+
+ if (si->start <= ip && ip < si->end)
+ sym = si->name;
+
+ return sym;
+}
+
+static void display_help(void)
+{
+ printf(
+ "Usage: perf-report [<options>]\n"
+ " -i file --input=<file> # input file\n"
+ );
+
+ exit(0);
+}
+
+static void process_options(int argc, char *argv[])
+{
+ int error = 0;
+
+ for (;;) {
+ int option_index = 0;
+ /** Options for getopt */
+ static struct option long_options[] = {
+ {"input", required_argument, NULL, 'i'},
+ {NULL, 0, NULL, 0 }
+ };
+ int c = getopt_long(argc, argv, "+:i:",
+ long_options, &option_index);
+ if (c == -1)
+ break;
+
+ switch (c) {
+ case 'i': input_name = strdup(optarg); break;
+ default: error = 1; break;
+ }
+ }
+
+ if (error)
+ display_help();
+}
+
+int main(int argc, char *argv[])
+{
+ unsigned long offset = 0;
+ unsigned long head = 0;
+ struct stat stat;
+ char *buf;
+ event_t *event;
+ int ret;
+ unsigned long total = 0;
+
+ page_size = getpagesize();
+
+ process_options(argc, argv);
+
+ input = open(input_name, O_RDONLY);
+ if (input < 0) {
+ perror("failed to open file");
+ exit(-1);
+ }
+
+ ret = fstat(input, &stat);
+ if (ret < 0) {
+ perror("failed to stat file");
+ exit(-1);
+ }
+
+ load_kallsyms();
+
+remap:
+ buf = (char *)mmap(NULL, page_size * mmap_window, PROT_READ,
+ MAP_SHARED, input, offset);
+ if (buf == MAP_FAILED) {
+ perror("failed to mmap file");
+ exit(-1);
+ }
+
+more:
+ event = (event_t *)(buf + head);
+
+ if (head + event->header.size >= page_size * mmap_window) {
+ unsigned long shift = page_size * (head / page_size);
+
+ munmap(buf, page_size * mmap_window);
+ offset += shift;
+ head -= shift;
+ goto remap;
+ }
+ head += event->header.size;
+
+ if (event->header.misc & PERF_EVENT_MISC_OVERFLOW) {
+ std::string comm, sym, level;
+ char output[1024];
+
+ if (event->header.misc & PERF_EVENT_MISC_KERNEL) {
+ level = "[kernel]";
+ sym = resolve_kernel_symbol(event->ip.ip);
+ } else if (event->header.misc & PERF_EVENT_MISC_USER) {
+ level = "[ user ]";
+ sym = resolve_user_symbol(event->ip.pid, event->ip.ip);
+ } else {
+ level = "[ hv ]";
+ }
+ comm = resolve_comm(event->ip.pid);
+
+ snprintf(output, sizeof(output), "%16s %s %s",
+ comm.c_str(), level.c_str(), sym.c_str());
+ hist[output]++;
+
+ total++;
+
+ } else switch (event->header.type) {
+ case PERF_EVENT_MMAP:
+ maps[event->mmap.pid].insert(map(&event->mmap));
+ break;
+
+ case PERF_EVENT_COMM:
+ comms[event->comm.pid] = std::string(event->comm.comm);
+ break;
+ }
+
+ if (offset + head < stat.st_size)
+ goto more;
+
+ close(input);
+
+ std::map<std::string, int>::iterator hi = hist.begin();
+
+ while (hi != hist.end()) {
+ rev_hist.insert(std::pair<int, std::string>(hi->second, hi->first));
+ hist.erase(hi++);
+ }
+
+ std::multimap<int, std::string>::const_iterator ri = rev_hist.begin();
+
+ while (ri != rev_hist.end()) {
+ printf(" %5.2f %s\n", (100.0 * ri->first)/total, ri->second.c_str());
+ ri++;
+ }
+
+ return 0;
+}
+
--
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 8/9] perf_counter: move PERF_RECORD_TIME
2009-04-08 13:01 [PATCH 0/9] yet another batch of perf_counter patches Peter Zijlstra
` (6 preceding siblings ...)
2009-04-08 13:01 ` [PATCH 7/9] perf_counter: some simple userspace profiling Peter Zijlstra
@ 2009-04-08 13:01 ` Peter Zijlstra
2009-04-08 16:58 ` [tip:perfcounters/core] " Peter Zijlstra
2009-04-08 17:09 ` Peter Zijlstra
2009-04-08 13:01 ` [PATCH 9/9] perf_counter: allow for data addresses to be recorded Peter Zijlstra
8 siblings, 2 replies; 24+ messages in thread
From: Peter Zijlstra @ 2009-04-08 13:01 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Paul Mackerras, Corey Ashford, linux-kernel, Peter Zijlstra
[-- Attachment #1: perf_counter-move-time.patch --]
[-- Type: text/plain, Size: 2886 bytes --]
Move PERF_RECORD_TIME so that all the fixed length items come before
the variable length ones.
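One practical consequence of this ordering: a reader can locate any fixed-length field from the record_type bits alone, without first walking the group or callchain data. A minimal userspace sketch of that idea follows. The enum values are copied from the hunk below; HEADER_SIZE and time_field_offset() are hypothetical names for illustration, and the 8-byte header assumes the u32 type / u16 misc / u16 size layout from patch 2/9.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values as they stand after this patch. */
enum {
	PERF_RECORD_IP        = 1U << 0,
	PERF_RECORD_TID       = 1U << 1,
	PERF_RECORD_TIME      = 1U << 2,
	PERF_RECORD_GROUP     = 1U << 3,
	PERF_RECORD_CALLCHAIN = 1U << 4,
};

/* Assumed header size: u32 type + u16 misc + u16 size. */
#define HEADER_SIZE 8

/*
 * Hypothetical helper, not part of the patch: byte offset of the time
 * field inside an overflow record, or -1 if the record carries no time.
 * Only the fixed-length fields ahead of it (ip, pid/tid) matter; the
 * variable-length group and callchain data now come afterwards.
 */
static long time_field_offset(uint32_t record_type)
{
	long off = HEADER_SIZE;

	if (!(record_type & PERF_RECORD_TIME))
		return -1;
	if (record_type & PERF_RECORD_IP)
		off += sizeof(uint64_t);	/* u64 ip */
	if (record_type & PERF_RECORD_TID)
		off += 2 * sizeof(uint32_t);	/* u32 pid, tid */

	return off;
}
```

Before this change the same lookup would have needed the group count and the callchain length, since time was written last.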
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/perf_counter.h | 9 ++++-----
kernel/perf_counter.c | 26 +++++++++++++-------------
2 files changed, 17 insertions(+), 18 deletions(-)
Index: linux-2.6/include/linux/perf_counter.h
===================================================================
--- linux-2.6.orig/include/linux/perf_counter.h
+++ linux-2.6/include/linux/perf_counter.h
@@ -100,9 +100,9 @@ enum sw_event_ids {
enum perf_counter_record_format {
PERF_RECORD_IP = 1U << 0,
PERF_RECORD_TID = 1U << 1,
- PERF_RECORD_GROUP = 1U << 2,
- PERF_RECORD_CALLCHAIN = 1U << 3,
- PERF_RECORD_TIME = 1U << 4,
+ PERF_RECORD_TIME = 1U << 2,
+ PERF_RECORD_GROUP = 1U << 3,
+ PERF_RECORD_CALLCHAIN = 1U << 4,
};
/*
@@ -250,6 +250,7 @@ enum perf_event_type {
*
* { u64 ip; } && PERF_RECORD_IP
* { u32 pid, tid; } && PERF_RECORD_TID
+ * { u64 time; } && PERF_RECORD_TIME
*
* { u64 nr;
* { u64 event, val; } cnt[nr]; } && PERF_RECORD_GROUP
@@ -259,8 +260,6 @@ enum perf_event_type {
* kernel,
* user;
* u64 ips[nr]; } && PERF_RECORD_CALLCHAIN
- *
- * { u64 time; } && PERF_RECORD_TIME
* };
*/
};
Index: linux-2.6/kernel/perf_counter.c
===================================================================
--- linux-2.6.orig/kernel/perf_counter.c
+++ linux-2.6/kernel/perf_counter.c
@@ -1850,6 +1850,16 @@ static void perf_counter_output(struct p
header.size += sizeof(tid_entry);
}
+ if (record_type & PERF_RECORD_TIME) {
+ /*
+ * Maybe do better on x86 and provide cpu_clock_nmi()
+ */
+ time = sched_clock();
+
+ header.type |= PERF_RECORD_TIME;
+ header.size += sizeof(u64);
+ }
+
if (record_type & PERF_RECORD_GROUP) {
header.type |= PERF_RECORD_GROUP;
header.size += sizeof(u64) +
@@ -1867,16 +1877,6 @@ static void perf_counter_output(struct p
}
}
- if (record_type & PERF_RECORD_TIME) {
- /*
- * Maybe do better on x86 and provide cpu_clock_nmi()
- */
- time = sched_clock();
-
- header.type |= PERF_RECORD_TIME;
- header.size += sizeof(u64);
- }
-
ret = perf_output_begin(&handle, counter, header.size, nmi, 1);
if (ret)
return;
@@ -1889,6 +1889,9 @@ static void perf_counter_output(struct p
if (record_type & PERF_RECORD_TID)
perf_output_put(&handle, tid_entry);
+ if (record_type & PERF_RECORD_TIME)
+ perf_output_put(&handle, time);
+
if (record_type & PERF_RECORD_GROUP) {
struct perf_counter *leader, *sub;
u64 nr = counter->nr_siblings;
@@ -1910,9 +1913,6 @@ static void perf_counter_output(struct p
if (callchain)
perf_output_copy(&handle, callchain, callchain_size);
- if (record_type & PERF_RECORD_TIME)
- perf_output_put(&handle, time);
-
perf_output_end(&handle);
}
--
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 9/9] perf_counter: allow for data addresses to be recorded
2009-04-08 13:01 [PATCH 0/9] yet another batch of perf_counter patches Peter Zijlstra
` (7 preceding siblings ...)
2009-04-08 13:01 ` [PATCH 8/9] perf_counter: move PERF_RECORD_TIME Peter Zijlstra
@ 2009-04-08 13:01 ` Peter Zijlstra
2009-04-08 16:59 ` [tip:perfcounters/core] " Peter Zijlstra
2009-04-08 17:10 ` Peter Zijlstra
8 siblings, 2 replies; 24+ messages in thread
From: Peter Zijlstra @ 2009-04-08 13:01 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Paul Mackerras, Corey Ashford, linux-kernel, Peter Zijlstra
[-- Attachment #1: perf_counter-data.patch --]
[-- Type: text/plain, Size: 10989 bytes --]
Paul suggested we allow for data addresses to be recorded along with
the traditional IPs as power can provide these.
For now, only the software pagefault events provide data addresses,
but in the future power might as well for some events.
x86 doesn't seem capable of providing this atm.
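As a rough illustration of how userspace might consume the new field, here is a sketch in the style of the ip_event overlay in perf-report.cc. It assumes a counter opened with record_type = PERF_RECORD_IP | PERF_RECORD_TID | PERF_RECORD_ADDR (no time, group or callchain), so the record body is exactly ip, pid/tid, addr. The struct and function names are made up for the example and do not appear in the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors struct perf_event_header after patch 2/9. */
struct event_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

/*
 * Hypothetical overlay for a page-fault sample recorded with
 * PERF_RECORD_IP | PERF_RECORD_TID | PERF_RECORD_ADDR.
 */
struct fault_sample {
	struct event_header header;
	uint64_t ip;		/* faulting instruction */
	uint32_t pid, tid;
	uint64_t addr;		/* data address that triggered the fault */
};

/*
 * Copy one sample out of a raw buffer. Returns 0 on success, -1 if the
 * buffer or the size claimed in the header is too small to hold it.
 */
static int parse_fault_sample(const void *buf, size_t len,
			      struct fault_sample *out)
{
	if (len < sizeof(*out))
		return -1;

	/* memcpy avoids alignment assumptions about the source buffer. */
	memcpy(out, buf, sizeof(*out));

	if (out->header.size < sizeof(*out))
		return -1;

	return 0;
}
```

If PERF_RECORD_TIME were also requested, a u64 time would sit between tid and addr, following the field order documented in perf_counter.h.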
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/powerpc/kernel/perf_counter.c | 2 -
arch/powerpc/mm/fault.c | 8 ++++--
arch/x86/kernel/cpu/perf_counter.c | 2 -
arch/x86/mm/fault.c | 8 ++++--
include/linux/perf_counter.h | 14 ++++++-----
kernel/perf_counter.c | 46 +++++++++++++++++++++++--------------
6 files changed, 49 insertions(+), 31 deletions(-)
Index: linux-2.6/arch/powerpc/kernel/perf_counter.c
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/perf_counter.c
+++ linux-2.6/arch/powerpc/kernel/perf_counter.c
@@ -732,7 +732,7 @@ static void record_and_restart(struct pe
* Finally record data if requested.
*/
if (record)
- perf_counter_overflow(counter, 1, regs);
+ perf_counter_overflow(counter, 1, regs, 0);
}
/*
Index: linux-2.6/arch/powerpc/mm/fault.c
===================================================================
--- linux-2.6.orig/arch/powerpc/mm/fault.c
+++ linux-2.6/arch/powerpc/mm/fault.c
@@ -171,7 +171,7 @@ int __kprobes do_page_fault(struct pt_re
die("Weird page fault", regs, SIGSEGV);
}
- perf_swcounter_event(PERF_COUNT_PAGE_FAULTS, 1, 0, regs);
+ perf_swcounter_event(PERF_COUNT_PAGE_FAULTS, 1, 0, regs, address);
/* When running in the kernel we expect faults to occur only to
* addresses in user space. All other faults represent errors in the
@@ -312,7 +312,8 @@ good_area:
}
if (ret & VM_FAULT_MAJOR) {
current->maj_flt++;
- perf_swcounter_event(PERF_COUNT_PAGE_FAULTS_MAJ, 1, 0, regs);
+ perf_swcounter_event(PERF_COUNT_PAGE_FAULTS_MAJ, 1, 0,
+ regs, address);
#ifdef CONFIG_PPC_SMLPAR
if (firmware_has_feature(FW_FEATURE_CMO)) {
preempt_disable();
@@ -322,7 +323,8 @@ good_area:
#endif
} else {
current->min_flt++;
- perf_swcounter_event(PERF_COUNT_PAGE_FAULTS_MIN, 1, 0, regs);
+ perf_swcounter_event(PERF_COUNT_PAGE_FAULTS_MIN, 1, 0,
+ regs, address);
}
up_read(&mm->mmap_sem);
return 0;
Index: linux-2.6/arch/x86/mm/fault.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/fault.c
+++ linux-2.6/arch/x86/mm/fault.c
@@ -1033,7 +1033,7 @@ do_page_fault(struct pt_regs *regs, unsi
if (unlikely(error_code & PF_RSVD))
pgtable_bad(regs, error_code, address);
- perf_swcounter_event(PERF_COUNT_PAGE_FAULTS, 1, 0, regs);
+ perf_swcounter_event(PERF_COUNT_PAGE_FAULTS, 1, 0, regs, address);
/*
* If we're in an interrupt, have no user context or are running
@@ -1130,10 +1130,12 @@ good_area:
if (fault & VM_FAULT_MAJOR) {
tsk->maj_flt++;
- perf_swcounter_event(PERF_COUNT_PAGE_FAULTS_MAJ, 1, 0, regs);
+ perf_swcounter_event(PERF_COUNT_PAGE_FAULTS_MAJ, 1, 0,
+ regs, address);
} else {
tsk->min_flt++;
- perf_swcounter_event(PERF_COUNT_PAGE_FAULTS_MIN, 1, 0, regs);
+ perf_swcounter_event(PERF_COUNT_PAGE_FAULTS_MIN, 1, 0,
+ regs, address);
}
check_v8086_mode(regs, address, tsk);
Index: linux-2.6/include/linux/perf_counter.h
===================================================================
--- linux-2.6.orig/include/linux/perf_counter.h
+++ linux-2.6/include/linux/perf_counter.h
@@ -101,8 +101,9 @@ enum perf_counter_record_format {
PERF_RECORD_IP = 1U << 0,
PERF_RECORD_TID = 1U << 1,
PERF_RECORD_TIME = 1U << 2,
- PERF_RECORD_GROUP = 1U << 3,
- PERF_RECORD_CALLCHAIN = 1U << 4,
+ PERF_RECORD_ADDR = 1U << 3,
+ PERF_RECORD_GROUP = 1U << 4,
+ PERF_RECORD_CALLCHAIN = 1U << 5,
};
/*
@@ -251,6 +252,7 @@ enum perf_event_type {
* { u64 ip; } && PERF_RECORD_IP
* { u32 pid, tid; } && PERF_RECORD_TID
* { u64 time; } && PERF_RECORD_TIME
+ * { u64 addr; } && PERF_RECORD_ADDR
*
* { u64 nr;
* { u64 event, val; } cnt[nr]; } && PERF_RECORD_GROUP
@@ -537,7 +539,7 @@ extern int hw_perf_group_sched_in(struct
extern void perf_counter_update_userpage(struct perf_counter *counter);
extern int perf_counter_overflow(struct perf_counter *counter,
- int nmi, struct pt_regs *regs);
+ int nmi, struct pt_regs *regs, u64 addr);
/*
* Return 1 for a software counter, 0 for a hardware counter
*/
@@ -547,7 +549,7 @@ static inline int is_software_counter(st
perf_event_type(&counter->hw_event) != PERF_TYPE_HARDWARE;
}
-extern void perf_swcounter_event(u32, u64, int, struct pt_regs *);
+extern void perf_swcounter_event(u32, u64, int, struct pt_regs *, u64);
extern void perf_counter_mmap(unsigned long addr, unsigned long len,
unsigned long pgoff, struct file *file);
@@ -584,8 +586,8 @@ static inline int perf_counter_task_disa
static inline int perf_counter_task_enable(void) { return -EINVAL; }
static inline void
-perf_swcounter_event(u32 event, u64 nr, int nmi, struct pt_regs *regs) { }
-
+perf_swcounter_event(u32 event, u64 nr, int nmi,
+ struct pt_regs *regs, u64 addr) { }
static inline void
perf_counter_mmap(unsigned long addr, unsigned long len,
Index: linux-2.6/kernel/perf_counter.c
===================================================================
--- linux-2.6.orig/kernel/perf_counter.c
+++ linux-2.6/kernel/perf_counter.c
@@ -800,7 +800,7 @@ void perf_counter_task_sched_out(struct
update_context_time(ctx);
regs = task_pt_regs(task);
- perf_swcounter_event(PERF_COUNT_CONTEXT_SWITCHES, 1, 1, regs);
+ perf_swcounter_event(PERF_COUNT_CONTEXT_SWITCHES, 1, 1, regs, 0);
__perf_counter_sched_out(ctx, cpuctx);
cpuctx->task_ctx = NULL;
@@ -1810,7 +1810,7 @@ static void perf_output_end(struct perf_
}
static void perf_counter_output(struct perf_counter *counter,
- int nmi, struct pt_regs *regs)
+ int nmi, struct pt_regs *regs, u64 addr)
{
int ret;
u64 record_type = counter->hw_event.record_type;
@@ -1860,6 +1860,11 @@ static void perf_counter_output(struct p
header.size += sizeof(u64);
}
+ if (record_type & PERF_RECORD_ADDR) {
+ header.type |= PERF_RECORD_ADDR;
+ header.size += sizeof(u64);
+ }
+
if (record_type & PERF_RECORD_GROUP) {
header.type |= PERF_RECORD_GROUP;
header.size += sizeof(u64) +
@@ -1892,6 +1897,9 @@ static void perf_counter_output(struct p
if (record_type & PERF_RECORD_TIME)
perf_output_put(&handle, time);
+ if (record_type & PERF_RECORD_ADDR)
+ perf_output_put(&handle, addr);
+
if (record_type & PERF_RECORD_GROUP) {
struct perf_counter *leader, *sub;
u64 nr = counter->nr_siblings;
@@ -2158,7 +2166,7 @@ void perf_counter_munmap(unsigned long a
*/
int perf_counter_overflow(struct perf_counter *counter,
- int nmi, struct pt_regs *regs)
+ int nmi, struct pt_regs *regs, u64 addr)
{
int events = atomic_read(&counter->event_limit);
int ret = 0;
@@ -2175,7 +2183,7 @@ int perf_counter_overflow(struct perf_co
perf_counter_disable(counter);
}
- perf_counter_output(counter, nmi, regs);
+ perf_counter_output(counter, nmi, regs, addr);
return ret;
}
@@ -2240,7 +2248,7 @@ static enum hrtimer_restart perf_swcount
regs = task_pt_regs(current);
if (regs) {
- if (perf_counter_overflow(counter, 0, regs))
+ if (perf_counter_overflow(counter, 0, regs, 0))
ret = HRTIMER_NORESTART;
}
@@ -2250,11 +2258,11 @@ static enum hrtimer_restart perf_swcount
}
static void perf_swcounter_overflow(struct perf_counter *counter,
- int nmi, struct pt_regs *regs)
+ int nmi, struct pt_regs *regs, u64 addr)
{
perf_swcounter_update(counter);
perf_swcounter_set_period(counter);
- if (perf_counter_overflow(counter, nmi, regs))
+ if (perf_counter_overflow(counter, nmi, regs, addr))
/* soft-disable the counter */
;
@@ -2286,16 +2294,17 @@ static int perf_swcounter_match(struct p
}
static void perf_swcounter_add(struct perf_counter *counter, u64 nr,
- int nmi, struct pt_regs *regs)
+ int nmi, struct pt_regs *regs, u64 addr)
{
int neg = atomic64_add_negative(nr, &counter->hw.count);
if (counter->hw.irq_period && !neg)
- perf_swcounter_overflow(counter, nmi, regs);
+ perf_swcounter_overflow(counter, nmi, regs, addr);
}
static void perf_swcounter_ctx_event(struct perf_counter_context *ctx,
enum perf_event_types type, u32 event,
- u64 nr, int nmi, struct pt_regs *regs)
+ u64 nr, int nmi, struct pt_regs *regs,
+ u64 addr)
{
struct perf_counter *counter;
@@ -2305,7 +2314,7 @@ static void perf_swcounter_ctx_event(str
rcu_read_lock();
list_for_each_entry_rcu(counter, &ctx->event_list, event_entry) {
if (perf_swcounter_match(counter, type, event, regs))
- perf_swcounter_add(counter, nr, nmi, regs);
+ perf_swcounter_add(counter, nr, nmi, regs, addr);
}
rcu_read_unlock();
}
@@ -2325,7 +2334,8 @@ static int *perf_swcounter_recursion_con
}
static void __perf_swcounter_event(enum perf_event_types type, u32 event,
- u64 nr, int nmi, struct pt_regs *regs)
+ u64 nr, int nmi, struct pt_regs *regs,
+ u64 addr)
{
struct perf_cpu_context *cpuctx = &get_cpu_var(perf_cpu_context);
int *recursion = perf_swcounter_recursion_context(cpuctx);
@@ -2336,10 +2346,11 @@ static void __perf_swcounter_event(enum
(*recursion)++;
barrier();
- perf_swcounter_ctx_event(&cpuctx->ctx, type, event, nr, nmi, regs);
+ perf_swcounter_ctx_event(&cpuctx->ctx, type, event,
+ nr, nmi, regs, addr);
if (cpuctx->task_ctx) {
perf_swcounter_ctx_event(cpuctx->task_ctx, type, event,
- nr, nmi, regs);
+ nr, nmi, regs, addr);
}
barrier();
@@ -2349,9 +2360,10 @@ out:
put_cpu_var(perf_cpu_context);
}
-void perf_swcounter_event(u32 event, u64 nr, int nmi, struct pt_regs *regs)
+void
+perf_swcounter_event(u32 event, u64 nr, int nmi, struct pt_regs *regs, u64 addr)
{
- __perf_swcounter_event(PERF_TYPE_SOFTWARE, event, nr, nmi, regs);
+ __perf_swcounter_event(PERF_TYPE_SOFTWARE, event, nr, nmi, regs, addr);
}
static void perf_swcounter_read(struct perf_counter *counter)
@@ -2548,7 +2560,7 @@ void perf_tpcounter_event(int event_id)
if (!regs)
regs = task_pt_regs(current);
- __perf_swcounter_event(PERF_TYPE_TRACEPOINT, event_id, 1, 1, regs);
+ __perf_swcounter_event(PERF_TYPE_TRACEPOINT, event_id, 1, 1, regs, 0);
}
extern int ftrace_profile_enable(int);
Index: linux-2.6/arch/x86/kernel/cpu/perf_counter.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_counter.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_counter.c
@@ -800,7 +800,7 @@ again:
continue;
perf_save_and_restart(counter);
- if (perf_counter_overflow(counter, nmi, regs))
+ if (perf_counter_overflow(counter, nmi, regs, 0))
__pmc_generic_disable(counter, &counter->hw, bit);
}
--
^ permalink raw reply [flat|nested] 24+ messages in thread
* [tip:perfcounters/core] perf_counter: fix NMI race in task clock
2009-04-08 13:01 ` [PATCH 1/9] perf_counter: fix NMI race in task clock Peter Zijlstra
@ 2009-04-08 16:57 ` Peter Zijlstra
0 siblings, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2009-04-08 16:57 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, paulus, hpa, mingo, a.p.zijlstra, tglx, cjashfor, mingo
Commit-ID: e30e08f65c7ef6c230424264f09c3d53f117f58b
Gitweb: http://git.kernel.org/tip/e30e08f65c7ef6c230424264f09c3d53f117f58b
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Wed, 8 Apr 2009 15:01:25 +0200
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Wed, 8 Apr 2009 18:53:27 +0200
perf_counter: fix NMI race in task clock
We should not be updating ctx->time from NMI context, work around that.
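The workaround in the diff below can be modelled in plain C: outside NMI the context is updated and read, while inside NMI the current time is derived from the last update without writing to the context. This is only a userspace sketch. The struct and function names are stand-ins, and the body of update_context_time_model() is an assumption about what the kernel's update_context_time() does.

```c
#include <stdint.h>

/* Simplified stand-in for the two context fields the patch touches. */
struct ctx_model {
	uint64_t time;		/* accumulated context time */
	uint64_t timestamp;	/* clock value at the last update of time */
};

/* Normal path: fold the elapsed time into the context (writes to ctx). */
static void update_context_time_model(struct ctx_model *ctx, uint64_t now)
{
	ctx->time += now - ctx->timestamp;
	ctx->timestamp = now;
}

/*
 * Read the context time. From NMI context the fields may be in the
 * middle of an update by the interrupted code, so they are only read
 * here and the delta since the last timestamp is added on the side.
 */
static uint64_t read_context_time(struct ctx_model *ctx, uint64_t now,
				  int in_nmi)
{
	if (!in_nmi) {
		update_context_time_model(ctx, now);
		return ctx->time;
	}

	return ctx->time + (now - ctx->timestamp);
}
```

Both paths return the same value for the same clock reading; the difference is that the NMI path leaves time and timestamp untouched.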
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <20090408130408.681326666@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/perf_counter.c | 25 ++++++++++++++++---------
1 files changed, 16 insertions(+), 9 deletions(-)
diff --git a/kernel/perf_counter.c b/kernel/perf_counter.c
index 863703b..84a3908 100644
--- a/kernel/perf_counter.c
+++ b/kernel/perf_counter.c
@@ -319,8 +319,6 @@ static void __perf_counter_disable(void *info)
spin_lock_irqsave(&ctx->lock, flags);
- update_context_time(ctx);
-
/*
* If the counter is on, turn it off.
* If it is in error state, leave it in error state.
@@ -2335,13 +2333,11 @@ static const struct hw_perf_counter_ops perf_ops_cpu_clock = {
* Software counter: task time clock
*/
-static void task_clock_perf_counter_update(struct perf_counter *counter)
+static void task_clock_perf_counter_update(struct perf_counter *counter, u64 now)
{
- u64 prev, now;
+ u64 prev;
s64 delta;
- now = counter->ctx->time;
-
prev = atomic64_xchg(&counter->hw.prev_count, now);
delta = now - prev;
atomic64_add(delta, &counter->count);
@@ -2369,13 +2365,24 @@ static int task_clock_perf_counter_enable(struct perf_counter *counter)
static void task_clock_perf_counter_disable(struct perf_counter *counter)
{
hrtimer_cancel(&counter->hw.hrtimer);
- task_clock_perf_counter_update(counter);
+ task_clock_perf_counter_update(counter, counter->ctx->time);
+
}
static void task_clock_perf_counter_read(struct perf_counter *counter)
{
- update_context_time(counter->ctx);
- task_clock_perf_counter_update(counter);
+ u64 time;
+
+ if (!in_nmi()) {
+ update_context_time(counter->ctx);
+ time = counter->ctx->time;
+ } else {
+ u64 now = perf_clock();
+ u64 delta = now - counter->ctx->timestamp;
+ time = counter->ctx->time + delta;
+ }
+
+ task_clock_perf_counter_update(counter, time);
}
static const struct hw_perf_counter_ops perf_ops_task_clock = {
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [tip:perfcounters/core] perf_counter: provide misc bits in the event header
2009-04-08 13:01 ` [PATCH 2/9] perf_counter: provide misc bits in the event header Peter Zijlstra
@ 2009-04-08 16:57 ` Peter Zijlstra
0 siblings, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2009-04-08 16:57 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, paulus, hpa, mingo, a.p.zijlstra, tglx, cjashfor, mingo
Commit-ID: 6fab01927e8bdbbc77bafba2abb4810c5591ad52
Gitweb: http://git.kernel.org/tip/6fab01927e8bdbbc77bafba2abb4810c5591ad52
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Wed, 8 Apr 2009 15:01:26 +0200
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Wed, 8 Apr 2009 18:53:27 +0200
perf_counter: provide misc bits in the event header
Limit the size of each record to 64k (or should we count in multiples
of u64 and have a 512K limit?). This gives 16 bits of spare room in the
header, which we can use for misc bits, so as to not have to grow the
record with u64 every time we have a few bits to report.
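The arithmetic behind the two limits can be checked with a small userspace sketch. The struct mirrors the new header from the diff below using stdint types; record_size_fits() and max_size_if_counted_in_u64() are hypothetical helpers added only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Header layout after this patch: the old u32 size is split in two. */
struct event_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

/* The split keeps the header at 8 bytes, so records do not grow. */
_Static_assert(sizeof(struct event_header) == 8, "header must stay 8 bytes");

/* Hypothetical helper: can a record of this total length be described? */
static int record_size_fits(uint64_t bytes)
{
	return bytes >= sizeof(struct event_header) && bytes <= UINT16_MAX;
}

/*
 * The alternative raised in the parenthetical: if size counted u64
 * units instead of bytes, the same 16 bits would reach about 512K.
 */
static uint64_t max_size_if_counted_in_u64(void)
{
	return (uint64_t)UINT16_MAX * sizeof(uint64_t);
}
```

With a byte count the largest record is 65535 bytes; counting in u64 units would raise that to 524280 bytes at the cost of requiring 8-byte-multiple record sizes.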
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <20090408130408.769271806@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
include/linux/perf_counter.h | 6 +++++-
kernel/perf_counter.c | 3 +++
2 files changed, 8 insertions(+), 1 deletions(-)
diff --git a/include/linux/perf_counter.h b/include/linux/perf_counter.h
index 7f5d353..5bd8817 100644
--- a/include/linux/perf_counter.h
+++ b/include/linux/perf_counter.h
@@ -201,9 +201,13 @@ struct perf_counter_mmap_page {
__u32 data_head; /* head in the data section */
};
+#define PERF_EVENT_MISC_KERNEL (1 << 0)
+#define PERF_EVENT_MISC_USER (1 << 1)
+
struct perf_event_header {
__u32 type;
- __u32 size;
+ __u16 misc;
+ __u16 size;
};
enum perf_event_type {
diff --git a/kernel/perf_counter.c b/kernel/perf_counter.c
index 84a3908..4af98f9 100644
--- a/kernel/perf_counter.c
+++ b/kernel/perf_counter.c
@@ -1831,6 +1831,9 @@ static void perf_counter_output(struct perf_counter *counter,
header.type = PERF_EVENT_COUNTER_OVERFLOW;
header.size = sizeof(header);
+ header.misc = user_mode(regs) ?
+ PERF_EVENT_MISC_USER : PERF_EVENT_MISC_KERNEL;
+
if (record_type & PERF_RECORD_IP) {
ip = instruction_pointer(regs);
header.type |= __PERF_EVENT_IP;
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [tip:perfcounters/core] perf_counter: use misc field to widen type
2009-04-08 13:01 ` [PATCH 3/9] perf_counter: use misc field to widen type Peter Zijlstra
@ 2009-04-08 16:57 ` Peter Zijlstra
0 siblings, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2009-04-08 16:57 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, paulus, hpa, mingo, a.p.zijlstra, tglx, cjashfor, mingo
Commit-ID: 6b6e5486b3a168f0328c82a8d4376caf901472b1
Gitweb: http://git.kernel.org/tip/6b6e5486b3a168f0328c82a8d4376caf901472b1
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Wed, 8 Apr 2009 15:01:27 +0200
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Wed, 8 Apr 2009 18:53:28 +0200
perf_counter: use misc field to widen type
Push the PERF_EVENT_COUNTER_OVERFLOW bit into the misc field so that
we can have the full 32bit for PERF_RECORD_ bits.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <20090408130408.891867663@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
include/linux/perf_counter.h | 28 ++++++++++------------------
kernel/perf_counter.c | 15 ++++++++-------
2 files changed, 18 insertions(+), 25 deletions(-)
diff --git a/include/linux/perf_counter.h b/include/linux/perf_counter.h
index 5bd8817..4809ae1 100644
--- a/include/linux/perf_counter.h
+++ b/include/linux/perf_counter.h
@@ -201,8 +201,9 @@ struct perf_counter_mmap_page {
__u32 data_head; /* head in the data section */
};
-#define PERF_EVENT_MISC_KERNEL (1 << 0)
-#define PERF_EVENT_MISC_USER (1 << 1)
+#define PERF_EVENT_MISC_KERNEL (1 << 0)
+#define PERF_EVENT_MISC_USER (1 << 1)
+#define PERF_EVENT_MISC_OVERFLOW (1 << 2)
struct perf_event_header {
__u32 type;
@@ -230,36 +231,27 @@ enum perf_event_type {
PERF_EVENT_MUNMAP = 2,
/*
- * Half the event type space is reserved for the counter overflow
- * bitfields, as found in hw_event.record_type.
- *
- * These events will have types of the form:
- * PERF_EVENT_COUNTER_OVERFLOW { | __PERF_EVENT_* } *
+ * When header.misc & PERF_EVENT_MISC_OVERFLOW the event_type field
+ * will be PERF_RECORD_*
*
* struct {
* struct perf_event_header header;
*
- * { u64 ip; } && __PERF_EVENT_IP
- * { u32 pid, tid; } && __PERF_EVENT_TID
+ * { u64 ip; } && PERF_RECORD_IP
+ * { u32 pid, tid; } && PERF_RECORD_TID
*
* { u64 nr;
- * { u64 event, val; } cnt[nr]; } && __PERF_EVENT_GROUP
+ * { u64 event, val; } cnt[nr]; } && PERF_RECORD_GROUP
*
* { u16 nr,
* hv,
* kernel,
* user;
- * u64 ips[nr]; } && __PERF_EVENT_CALLCHAIN
+ * u64 ips[nr]; } && PERF_RECORD_CALLCHAIN
*
- * { u64 time; } && __PERF_EVENT_TIME
+ * { u64 time; } && PERF_RECORD_TIME
* };
*/
- PERF_EVENT_COUNTER_OVERFLOW = 1UL << 31,
- __PERF_EVENT_IP = PERF_RECORD_IP,
- __PERF_EVENT_TID = PERF_RECORD_TID,
- __PERF_EVENT_GROUP = PERF_RECORD_GROUP,
- __PERF_EVENT_CALLCHAIN = PERF_RECORD_CALLCHAIN,
- __PERF_EVENT_TIME = PERF_RECORD_TIME,
};
#ifdef __KERNEL__
diff --git a/kernel/perf_counter.c b/kernel/perf_counter.c
index 4af98f9..bf12df6 100644
--- a/kernel/perf_counter.c
+++ b/kernel/perf_counter.c
@@ -1828,15 +1828,16 @@ static void perf_counter_output(struct perf_counter *counter,
int callchain_size = 0;
u64 time;
- header.type = PERF_EVENT_COUNTER_OVERFLOW;
+ header.type = 0;
header.size = sizeof(header);
- header.misc = user_mode(regs) ?
+ header.misc = PERF_EVENT_MISC_OVERFLOW;
+ header.misc |= user_mode(regs) ?
PERF_EVENT_MISC_USER : PERF_EVENT_MISC_KERNEL;
if (record_type & PERF_RECORD_IP) {
ip = instruction_pointer(regs);
- header.type |= __PERF_EVENT_IP;
+ header.type |= PERF_RECORD_IP;
header.size += sizeof(ip);
}
@@ -1845,12 +1846,12 @@ static void perf_counter_output(struct perf_counter *counter,
tid_entry.pid = current->group_leader->pid;
tid_entry.tid = current->pid;
- header.type |= __PERF_EVENT_TID;
+ header.type |= PERF_RECORD_TID;
header.size += sizeof(tid_entry);
}
if (record_type & PERF_RECORD_GROUP) {
- header.type |= __PERF_EVENT_GROUP;
+ header.type |= PERF_RECORD_GROUP;
header.size += sizeof(u64) +
counter->nr_siblings * sizeof(group_entry);
}
@@ -1861,7 +1862,7 @@ static void perf_counter_output(struct perf_counter *counter,
if (callchain) {
callchain_size = (1 + callchain->nr) * sizeof(u64);
- header.type |= __PERF_EVENT_CALLCHAIN;
+ header.type |= PERF_RECORD_CALLCHAIN;
header.size += callchain_size;
}
}
@@ -1872,7 +1873,7 @@ static void perf_counter_output(struct perf_counter *counter,
*/
time = sched_clock();
- header.type |= __PERF_EVENT_TIME;
+ header.type |= PERF_RECORD_TIME;
header.size += sizeof(u64);
}
^ permalink raw reply related [flat|nested] 24+ messages in thread
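A consumer of the new layout would first test header.misc for PERF_EVENT_MISC_OVERFLOW and only then treat header.type as a PERF_RECORD_* bitmask, reading the optional fields in the order given in the header comment (ip, then pid/tid). The sketch below shows one way to do that. It is not code from the series: decode_overflow() and struct sample are hypothetical, and the numeric PERF_RECORD_IP and PERF_RECORD_TID values are assumptions, since the patch does not show their definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct perf_event_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

static_assert(sizeof(struct perf_event_header) == 8, "parser relies on 8-byte header");

#define PERF_EVENT_MISC_OVERFLOW	(1 << 2)	/* from the patch */

/* ASSUMED values; the real definitions live elsewhere in perf_counter.h. */
#define PERF_RECORD_IP			(1u << 0)
#define PERF_RECORD_TID			(1u << 1)

/* Hypothetical container for the fields this sketch understands. */
struct sample {
	uint64_t ip;
	uint32_t pid, tid;
	int has_ip, has_tid;
};

/*
 * Returns 1 if buf holds an overflow record that was decoded into *out,
 * 0 if the record is not an overflow record, -1 if it is truncated.
 */
static int decode_overflow(const unsigned char *buf, size_t len, struct sample *out)
{
	struct perf_event_header h;
	size_t off = sizeof(h);

	if (len < sizeof(h))
		return -1;
	memcpy(&h, buf, sizeof(h));

	if (!(h.misc & PERF_EVENT_MISC_OVERFLOW))
		return 0;
	if (h.size < sizeof(h) || h.size > len)
		return -1;

	memset(out, 0, sizeof(*out));

	if (h.type & PERF_RECORD_IP) {
		if (off + sizeof(out->ip) > h.size)
			return -1;
		memcpy(&out->ip, buf + off, sizeof(out->ip));
		off += sizeof(out->ip);
		out->has_ip = 1;
	}

	if (h.type & PERF_RECORD_TID) {
		if (off + 2 * sizeof(uint32_t) > h.size)
			return -1;
		memcpy(&out->pid, buf + off, sizeof(uint32_t));
		memcpy(&out->tid, buf + off + sizeof(uint32_t), sizeof(uint32_t));
		off += 2 * sizeof(uint32_t);
		out->has_tid = 1;
	}

	return 1;
}
```

Testing individual bits rather than switching on exact type values means a reader keeps working when further PERF_RECORD_* bits are added, as long as it only needs the leading fields.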
* [tip:perfcounters/core] perf_counter: kerneltop: keep up with ABI changes
2009-04-08 13:01 ` [PATCH 4/9] perf_counter: kerneltop: keep up with ABI changes Peter Zijlstra
@ 2009-04-08 16:58 ` Peter Zijlstra
0 siblings, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2009-04-08 16:58 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, paulus, hpa, mingo, a.p.zijlstra, tglx, cjashfor, mingo
Commit-ID: 808382b33bb4c60df6379ec2db39f332cc56b82a
Gitweb: http://git.kernel.org/tip/808382b33bb4c60df6379ec2db39f332cc56b82a
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Wed, 8 Apr 2009 15:01:28 +0200
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Wed, 8 Apr 2009 18:53:29 +0200
perf_counter: kerneltop: keep up with ABI changes
Update kerneltop to use PERF_EVENT_MISC_OVERFLOW
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <20090408130408.947197470@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
Documentation/perf_counter/kerneltop.c | 32 ++++++++++++++++----------------
1 files changed, 16 insertions(+), 16 deletions(-)
diff --git a/Documentation/perf_counter/kerneltop.c b/Documentation/perf_counter/kerneltop.c
index 15f3a5f..042c1b8 100644
--- a/Documentation/perf_counter/kerneltop.c
+++ b/Documentation/perf_counter/kerneltop.c
@@ -1277,22 +1277,22 @@ static void mmap_read(struct mmap_data *md)
old += size;
- switch (event->header.type) {
- case PERF_EVENT_COUNTER_OVERFLOW | __PERF_EVENT_IP:
- case PERF_EVENT_COUNTER_OVERFLOW | __PERF_EVENT_IP | __PERF_EVENT_TID:
- process_event(event->ip.ip, md->counter);
- break;
-
- case PERF_EVENT_MMAP:
- case PERF_EVENT_MUNMAP:
- printf("%s: %Lu %Lu %Lu %s\n",
- event->header.type == PERF_EVENT_MMAP
- ? "mmap" : "munmap",
- event->mmap.start,
- event->mmap.len,
- event->mmap.pgoff,
- event->mmap.filename);
- break;
+ if (event->header.misc & PERF_EVENT_MISC_OVERFLOW) {
+ if (event->header.type & PERF_RECORD_IP)
+ process_event(event->ip.ip, md->counter);
+ } else {
+ switch (event->header.type) {
+ case PERF_EVENT_MMAP:
+ case PERF_EVENT_MUNMAP:
+ printf("%s: %Lu %Lu %Lu %s\n",
+ event->header.type == PERF_EVENT_MMAP
+ ? "mmap" : "munmap",
+ event->mmap.start,
+ event->mmap.len,
+ event->mmap.pgoff,
+ event->mmap.filename);
+ break;
+ }
}
}
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [tip:perfcounters/core] perf_counter: add some comments
2009-04-08 13:01 ` [PATCH 5/9] perf_counter: add some comments Peter Zijlstra
@ 2009-04-08 16:58 ` Peter Zijlstra
0 siblings, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2009-04-08 16:58 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, paulus, hpa, mingo, a.p.zijlstra, tglx, cjashfor, mingo
Commit-ID: 8740f9418c78dcad694b46ab25d1645d5aef1f5e
Gitweb: http://git.kernel.org/tip/8740f9418c78dcad694b46ab25d1645d5aef1f5e
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Wed, 8 Apr 2009 15:01:29 +0200
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Wed, 8 Apr 2009 18:53:29 +0200
perf_counter: add some comments
Add a few comments because I kept forgetting which field was for which
functionality.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <20090408130409.036984214@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
include/linux/perf_counter.h | 10 ++++++----
1 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/include/linux/perf_counter.h b/include/linux/perf_counter.h
index 4809ae1..8bf764f 100644
--- a/include/linux/perf_counter.h
+++ b/include/linux/perf_counter.h
@@ -344,10 +344,12 @@ struct file;
struct perf_mmap_data {
struct rcu_head rcu_head;
- int nr_pages;
- atomic_t wakeup;
- atomic_t head;
- atomic_t events;
+ int nr_pages; /* nr of data pages */
+
+ atomic_t wakeup; /* POLL_ for wakeups */
+ atomic_t head; /* write position */
+ atomic_t events; /* event limit */
+
struct perf_counter_mmap_page *user_page;
void *data_pages[0];
};
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [tip:perfcounters/core] perf_counter: track task-comm data
2009-04-08 13:01 ` [PATCH 6/9] perf_counter: track task-comm data Peter Zijlstra
@ 2009-04-08 16:58 ` Peter Zijlstra
2009-04-08 17:03 ` [PATCH 6.5/9] perf_counter: fix " Peter Zijlstra
2009-04-08 17:09 ` [tip:perfcounters/core] perf_counter: " Peter Zijlstra
2 siblings, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2009-04-08 16:58 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, paulus, hpa, mingo, a.p.zijlstra, tglx, cjashfor, mingo
Commit-ID: e278f3cdbd9cfb9c9e7e0d0b3a2844ceed84197c
Gitweb: http://git.kernel.org/tip/e278f3cdbd9cfb9c9e7e0d0b3a2844ceed84197c
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Wed, 8 Apr 2009 15:01:30 +0200
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Wed, 8 Apr 2009 18:53:30 +0200
perf_counter: track task-comm data
Similar to the mmap data stream, add one that tracks the task COMM field,
so that the userspace reporting knows what to call a task.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <20090408130409.127422406@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
fs/exec.c | 1 +
include/linux/perf_counter.h | 15 ++++++-
kernel/perf_counter.c | 93 ++++++++++++++++++++++++++++++++++++++++++
3 files changed, 108 insertions(+), 1 deletions(-)
diff --git a/fs/exec.c b/fs/exec.c
index e015c0b..bf47ed0 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -951,6 +951,7 @@ void set_task_comm(struct task_struct *tsk, char *buf)
task_lock(tsk);
strlcpy(tsk->comm, buf, sizeof(tsk->comm));
task_unlock(tsk);
+ perf_counter_comm(tsk);
}
int flush_old_exec(struct linux_binprm * bprm)
diff --git a/include/linux/perf_counter.h b/include/linux/perf_counter.h
index 8bf764f..fed9216 100644
--- a/include/linux/perf_counter.h
+++ b/include/linux/perf_counter.h
@@ -142,8 +142,9 @@ struct perf_counter_hw_event {
exclude_idle : 1, /* don't count when idle */
mmap : 1, /* include mmap data */
munmap : 1, /* include munmap data */
+ comm : 1, /* include comm data */
- __reserved_1 : 53;
+ __reserved_1 : 52;
__u32 extra_config_len;
__u32 wakeup_events; /* wakeup every n events */
@@ -231,6 +232,16 @@ enum perf_event_type {
PERF_EVENT_MUNMAP = 2,
/*
+ * struct {
+ * struct perf_event_header header;
+ *
+ * u32 pid, tid;
+ * char comm[];
+ * };
+ */
+ PERF_EVENT_COMM = 3,
+
+ /*
* When header.misc & PERF_EVENT_MISC_OVERFLOW the event_type field
* will be PERF_RECORD_*
*
@@ -545,6 +556,8 @@ extern void perf_counter_mmap(unsigned long addr, unsigned long len,
extern void perf_counter_munmap(unsigned long addr, unsigned long len,
unsigned long pgoff, struct file *file);
+extern void perf_counter_comm(struct task_struct *tsk);
+
#define MAX_STACK_DEPTH 255
struct perf_callchain_entry {
diff --git a/kernel/perf_counter.c b/kernel/perf_counter.c
index bf12df6..2d4aebb 100644
--- a/kernel/perf_counter.c
+++ b/kernel/perf_counter.c
@@ -1917,6 +1917,99 @@ static void perf_counter_output(struct perf_counter *counter,
}
/*
+ * comm tracking
+ */
+
+struct perf_comm_event {
+ struct task_struct *task;
+ char *comm;
+ int comm_size;
+
+ struct {
+ struct perf_event_header header;
+
+ u32 pid;
+ u32 tid;
+ } event;
+};
+
+static void perf_counter_comm_output(struct perf_counter *counter,
+ struct perf_comm_event *comm_event)
+{
+ struct perf_output_handle handle;
+ int size = comm_event->event.header.size;
+ int ret = perf_output_begin(&handle, counter, size, 0, 0);
+
+ if (ret)
+ return;
+
+ perf_output_put(&handle, comm_event->event);
+ perf_output_copy(&handle, comm_event->comm,
+ comm_event->comm_size);
+ perf_output_end(&handle);
+}
+
+static int perf_counter_comm_match(struct perf_counter *counter,
+ struct perf_comm_event *comm_event)
+{
+ if (counter->hw_event.comm &&
+ comm_event->event.header.type == PERF_EVENT_COMM)
+ return 1;
+
+ return 0;
+}
+
+static void perf_counter_comm_ctx(struct perf_counter_context *ctx,
+ struct perf_comm_event *comm_event)
+{
+ struct perf_counter *counter;
+
+ if (system_state != SYSTEM_RUNNING || list_empty(&ctx->event_list))
+ return;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(counter, &ctx->event_list, event_entry) {
+ if (perf_counter_comm_match(counter, comm_event))
+ perf_counter_comm_output(counter, comm_event);
+ }
+ rcu_read_unlock();
+}
+
+static void perf_counter_comm_event(struct perf_comm_event *comm_event)
+{
+ struct perf_cpu_context *cpuctx;
+ unsigned int size;
+ char *comm = comm_event->task->comm;
+
+ size = ALIGN(strlen(comm), sizeof(u64));
+
+ comm_event->comm = comm;
+ comm_event->comm_size = size;
+
+ comm_event->event.header.size = sizeof(comm_event->event) + size;
+
+ cpuctx = &get_cpu_var(perf_cpu_context);
+ perf_counter_comm_ctx(&cpuctx->ctx, comm_event);
+ put_cpu_var(perf_cpu_context);
+
+ perf_counter_comm_ctx(&current->perf_counter_ctx, comm_event);
+}
+
+void perf_counter_comm(struct task_struct *task)
+{
+ struct perf_comm_event comm_event = {
+ .task = task,
+ .event = {
+ .header = { .type = PERF_EVENT_COMM, },
+ .pid = task->group_leader->pid,
+ .tid = task->pid,
+ },
+ };
+
+ perf_counter_comm_event(&comm_event);
+}
+
+/*
* mmap tracking
*/
^ permalink raw reply related [flat|nested] 24+ messages in thread
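The size arithmetic in perf_counter_comm_event() can be reproduced in userspace to see how large a PERF_EVENT_COMM record ends up. The sketch below mirrors the ALIGN(strlen(comm), sizeof(u64)) computation from the patch as posted. ALIGN_UP, struct comm_event_fixed, comm_payload_size() and comm_record_size() are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round x up to the next multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a)	(((x) + ((size_t)(a) - 1)) & ~((size_t)(a) - 1))

struct perf_event_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

/* Fixed part of the comm record: header followed by pid and tid. */
struct comm_event_fixed {
	struct perf_event_header header;
	uint32_t pid;
	uint32_t tid;
};

static_assert(sizeof(struct comm_event_fixed) == 16, "fixed part is 16 bytes");

/* Bytes copied from task->comm, as computed in the patch. */
static size_t comm_payload_size(const char *comm)
{
	return ALIGN_UP(strlen(comm), sizeof(uint64_t));
}

/* Value that would be stored in header.size for this comm string. */
static size_t comm_record_size(const char *comm)
{
	return sizeof(struct comm_event_fixed) + comm_payload_size(comm);
}
```

Because the computation uses strlen() without counting the terminator, a name whose length is an exact multiple of 8 gets no padding at all, so with this version of the patch a reader cannot assume a NUL byte inside the payload and should bound its string handling by header.size.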
* [tip:perfcounters/core] perf_counter: some simple userspace profiling
2009-04-08 13:01 ` [PATCH 7/9] perf_counter: some simple userspace profiling Peter Zijlstra
@ 2009-04-08 16:58 ` Peter Zijlstra
2009-04-08 17:09 ` Peter Zijlstra
1 sibling, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2009-04-08 16:58 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, paulus, hpa, mingo, a.p.zijlstra, acme, tglx,
cjashfor, mingo
Commit-ID: 513162537b73d972206a3974594522c86b8a9238
Gitweb: http://git.kernel.org/tip/513162537b73d972206a3974594522c86b8a9238
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Wed, 8 Apr 2009 15:01:31 +0200
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Wed, 8 Apr 2009 18:53:30 +0200
perf_counter: some simple userspace profiling
# perf-record make -j4 kernel/
# perf-report | tail -15
0.39 cc1 [kernel] lock_acquired
0.42 cc1 [kernel] lock_acquire
0.51 cc1 [ user ] /lib64/libc-2.8.90.so: _int_free
0.51 as [kernel] clear_page_c
0.53 cc1 [ user ] /lib64/libc-2.8.90.so: memcpy
0.56 cc1 [ user ] /lib64/libc-2.8.90.so: _IO_vfprintf
0.63 cc1 [kernel] lock_release
0.67 cc1 [ user ] /lib64/libc-2.8.90.so: strlen
0.68 cc1 [kernel] debug_smp_processor_id
1.38 cc1 [ user ] /lib64/libc-2.8.90.so: _int_malloc
1.55 cc1 [ user ] /lib64/libc-2.8.90.so: memset
1.77 cc1 [kernel] __lock_acquire
1.88 cc1 [kernel] clear_page_c
3.61 as [ user ] /usr/bin/as: <unknown>
59.16 cc1 [ user ] /usr/libexec/gcc/x86_64-redhat-linux/4.3.2/cc1: <unknown>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
LKML-Reference: <20090408130409.220518450@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
Documentation/perf_counter/Makefile | 8 +-
Documentation/perf_counter/perf-record.c | 530 +++++++++++++++++++++++++++++
Documentation/perf_counter/perf-report.cc | 472 +++++++++++++++++++++++++
3 files changed, 1009 insertions(+), 1 deletions(-)
diff --git a/Documentation/perf_counter/Makefile b/Documentation/perf_counter/Makefile
index 194b662..1dd37ee 100644
--- a/Documentation/perf_counter/Makefile
+++ b/Documentation/perf_counter/Makefile
@@ -1,10 +1,16 @@
-BINS = kerneltop perfstat
+BINS = kerneltop perfstat perf-record perf-report
all: $(BINS)
kerneltop: kerneltop.c ../../include/linux/perf_counter.h
cc -O6 -Wall -lrt -o $@ $<
+perf-record: perf-record.c ../../include/linux/perf_counter.h
+ cc -O6 -Wall -lrt -o $@ $<
+
+perf-report: perf-report.cc ../../include/linux/perf_counter.h
+ g++ -O6 -Wall -lrt -o $@ $<
+
perfstat: kerneltop
ln -sf kerneltop perfstat
diff --git a/Documentation/perf_counter/perf-record.c b/Documentation/perf_counter/perf-record.c
new file mode 100644
index 0000000..614de7c
--- /dev/null
+++ b/Documentation/perf_counter/perf-record.c
@@ -0,0 +1,530 @@
+
+
+#define _GNU_SOURCE
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+#include <limits.h>
+#include <getopt.h>
+#include <assert.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <errno.h>
+#include <ctype.h>
+#include <time.h>
+#include <sched.h>
+#include <pthread.h>
+
+#include <sys/syscall.h>
+#include <sys/ioctl.h>
+#include <sys/poll.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+#include <sys/uio.h>
+#include <sys/mman.h>
+
+#include <linux/unistd.h>
+#include <linux/types.h>
+
+#include "../../include/linux/perf_counter.h"
+
+
+/*
+ * prctl(PR_TASK_PERF_COUNTERS_DISABLE) will (cheaply) disable all
+ * counters in the current task.
+ */
+#define PR_TASK_PERF_COUNTERS_DISABLE 31
+#define PR_TASK_PERF_COUNTERS_ENABLE 32
+
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+
+#define rdclock() \
+({ \
+ struct timespec ts; \
+ \
+ clock_gettime(CLOCK_MONOTONIC, &ts); \
+ ts.tv_sec * 1000000000ULL + ts.tv_nsec; \
+})
+
+/*
+ * Pick up some kernel type conventions:
+ */
+#define __user
+#define asmlinkage
+
+#ifdef __x86_64__
+#define __NR_perf_counter_open 295
+#define rmb() asm volatile("lfence" ::: "memory")
+#define cpu_relax() asm volatile("rep; nop" ::: "memory");
+#endif
+
+#ifdef __i386__
+#define __NR_perf_counter_open 333
+#define rmb() asm volatile("lfence" ::: "memory")
+#define cpu_relax() asm volatile("rep; nop" ::: "memory");
+#endif
+
+#ifdef __powerpc__
+#define __NR_perf_counter_open 319
+#define rmb() asm volatile ("sync" ::: "memory")
+#define cpu_relax() asm volatile ("" ::: "memory");
+#endif
+
+#define unlikely(x) __builtin_expect(!!(x), 0)
+#define min(x, y) ({ \
+ typeof(x) _min1 = (x); \
+ typeof(y) _min2 = (y); \
+ (void) (&_min1 == &_min2); \
+ _min1 < _min2 ? _min1 : _min2; })
+
+asmlinkage int sys_perf_counter_open(
+ struct perf_counter_hw_event *hw_event_uptr __user,
+ pid_t pid,
+ int cpu,
+ int group_fd,
+ unsigned long flags)
+{
+ return syscall(
+ __NR_perf_counter_open, hw_event_uptr, pid, cpu, group_fd, flags);
+}
+
+#define MAX_COUNTERS 64
+#define MAX_NR_CPUS 256
+
+#define EID(type, id) (((__u64)(type) << PERF_COUNTER_TYPE_SHIFT) | (id))
+
+static int nr_counters = 0;
+static __u64 event_id[MAX_COUNTERS] = { };
+static int default_interval = 100000;
+static int event_count[MAX_COUNTERS];
+static int fd[MAX_NR_CPUS][MAX_COUNTERS];
+static int nr_cpus = 0;
+static unsigned int page_size;
+static unsigned int mmap_pages = 16;
+static int output;
+static char *output_name = "output.perf";
+static int group = 0;
+static unsigned int realtime_prio = 0;
+
+const unsigned int default_count[] = {
+ 1000000,
+ 1000000,
+ 10000,
+ 10000,
+ 1000000,
+ 10000,
+};
+
+static char *hw_event_names[] = {
+ "CPU cycles",
+ "instructions",
+ "cache references",
+ "cache misses",
+ "branches",
+ "branch misses",
+ "bus cycles",
+};
+
+static char *sw_event_names[] = {
+ "cpu clock ticks",
+ "task clock ticks",
+ "pagefaults",
+ "context switches",
+ "CPU migrations",
+ "minor faults",
+ "major faults",
+};
+
+struct event_symbol {
+ __u64 event;
+ char *symbol;
+};
+
+static struct event_symbol event_symbols[] = {
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_CPU_CYCLES), "cpu-cycles", },
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_CPU_CYCLES), "cycles", },
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_INSTRUCTIONS), "instructions", },
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_CACHE_REFERENCES), "cache-references", },
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_CACHE_MISSES), "cache-misses", },
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_BRANCH_INSTRUCTIONS), "branch-instructions", },
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_BRANCH_INSTRUCTIONS), "branches", },
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_BRANCH_MISSES), "branch-misses", },
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_BUS_CYCLES), "bus-cycles", },
+
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CPU_CLOCK), "cpu-clock", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_TASK_CLOCK), "task-clock", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_PAGE_FAULTS), "page-faults", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_PAGE_FAULTS), "faults", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_PAGE_FAULTS_MIN), "minor-faults", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_PAGE_FAULTS_MAJ), "major-faults", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CONTEXT_SWITCHES), "context-switches", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CONTEXT_SWITCHES), "cs", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CPU_MIGRATIONS), "cpu-migrations", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CPU_MIGRATIONS), "migrations", },
+};
+
+/*
+ * Each event can have multiple symbolic names.
+ * Symbolic names are (almost) exactly matched.
+ */
+static __u64 match_event_symbols(char *str)
+{
+ __u64 config, id;
+ int type;
+ unsigned int i;
+
+ if (sscanf(str, "r%llx", &config) == 1)
+ return config | PERF_COUNTER_RAW_MASK;
+
+ if (sscanf(str, "%d:%llu", &type, &id) == 2)
+ return EID(type, id);
+
+ for (i = 0; i < ARRAY_SIZE(event_symbols); i++) {
+ if (!strncmp(str, event_symbols[i].symbol,
+ strlen(event_symbols[i].symbol)))
+ return event_symbols[i].event;
+ }
+
+ return ~0ULL;
+}
+
+static int parse_events(char *str)
+{
+ __u64 config;
+
+again:
+ if (nr_counters == MAX_COUNTERS)
+ return -1;
+
+ config = match_event_symbols(str);
+ if (config == ~0ULL)
+ return -1;
+
+ event_id[nr_counters] = config;
+ nr_counters++;
+
+ str = strstr(str, ",");
+ if (str) {
+ str++;
+ goto again;
+ }
+
+ return 0;
+}
+
+#define __PERF_COUNTER_FIELD(config, name) \
+ ((config & PERF_COUNTER_##name##_MASK) >> PERF_COUNTER_##name##_SHIFT)
+
+#define PERF_COUNTER_RAW(config) __PERF_COUNTER_FIELD(config, RAW)
+#define PERF_COUNTER_CONFIG(config) __PERF_COUNTER_FIELD(config, CONFIG)
+#define PERF_COUNTER_TYPE(config) __PERF_COUNTER_FIELD(config, TYPE)
+#define PERF_COUNTER_ID(config) __PERF_COUNTER_FIELD(config, EVENT)
+
+static void display_events_help(void)
+{
+ unsigned int i;
+ __u64 e;
+
+ printf(
+ " -e EVENT --event=EVENT # symbolic-name abbreviations");
+
+ for (i = 0; i < ARRAY_SIZE(event_symbols); i++) {
+ int type, id;
+
+ e = event_symbols[i].event;
+ type = PERF_COUNTER_TYPE(e);
+ id = PERF_COUNTER_ID(e);
+
+ printf("\n %d:%d: %-20s",
+ type, id, event_symbols[i].symbol);
+ }
+
+ printf("\n"
+ " rNNN: raw PMU events (eventsel+umask)\n\n");
+}
+
+static void display_help(void)
+{
+ printf(
+ "Usage: perf-record [<options>]\n"
+ "perf-record Options (up to %d event types can be specified at once):\n\n",
+ MAX_COUNTERS);
+
+ display_events_help();
+
+ printf(
+ " -c CNT --count=CNT # event period to sample\n"
+ " -m pages --mmap_pages=<pages> # number of mmap data pages\n"
+ " -o file --output=<file> # output file\n"
+ " -r prio --realtime=<prio> # use RT prio\n"
+ );
+
+ exit(0);
+}
+
+static void process_options(int argc, char *argv[])
+{
+ int error = 0, counter;
+
+ for (;;) {
+ int option_index = 0;
+ /** Options for getopt */
+ static struct option long_options[] = {
+ {"count", required_argument, NULL, 'c'},
+ {"event", required_argument, NULL, 'e'},
+ {"mmap_pages", required_argument, NULL, 'm'},
+ {"output", required_argument, NULL, 'o'},
+ {"realtime", required_argument, NULL, 'r'},
+ {NULL, 0, NULL, 0 }
+ };
+ int c = getopt_long(argc, argv, "+:c:e:m:o:r:",
+ long_options, &option_index);
+ if (c == -1)
+ break;
+
+ switch (c) {
+ case 'c': default_interval = atoi(optarg); break;
+ case 'e': error = parse_events(optarg); break;
+ case 'm': mmap_pages = atoi(optarg); break;
+ case 'o': output_name = strdup(optarg); break;
+ case 'r': realtime_prio = atoi(optarg); break;
+ default: error = 1; break;
+ }
+ }
+ if (error)
+ display_help();
+
+ if (!nr_counters) {
+ nr_counters = 1;
+ event_id[0] = 0;
+ }
+
+ for (counter = 0; counter < nr_counters; counter++) {
+ if (event_count[counter])
+ continue;
+
+ event_count[counter] = default_interval;
+ }
+}
+
+struct mmap_data {
+ int counter;
+ void *base;
+ unsigned int mask;
+ unsigned int prev;
+};
+
+static unsigned int mmap_read_head(struct mmap_data *md)
+{
+ struct perf_counter_mmap_page *pc = md->base;
+ int head;
+
+ head = pc->data_head;
+ rmb();
+
+ return head;
+}
+
+static long events;
+static struct timeval last_read, this_read;
+
+static void mmap_read(struct mmap_data *md)
+{
+ unsigned int head = mmap_read_head(md);
+ unsigned int old = md->prev;
+ unsigned char *data = md->base + page_size;
+ unsigned long size;
+ void *buf;
+ int diff;
+
+ gettimeofday(&this_read, NULL);
+
+ /*
+ * If we're further behind than half the buffer, there's a chance
+ * the writer will bite our tail and screw up the events under us.
+ *
+ * If we somehow ended up ahead of the head, we got messed up.
+ *
+ * In either case, truncate and restart at head.
+ */
+ diff = head - old;
+ if (diff > md->mask / 2 || diff < 0) {
+ struct timeval iv;
+ unsigned long msecs;
+
+ timersub(&this_read, &last_read, &iv);
+ msecs = iv.tv_sec*1000 + iv.tv_usec/1000;
+
+ fprintf(stderr, "WARNING: failed to keep up with mmap data."
+ " Last read %lu msecs ago.\n", msecs);
+
+ /*
+ * head points to a known good entry, start there.
+ */
+ old = head;
+ }
+
+ last_read = this_read;
+
+ if (old != head)
+ events++;
+
+ size = head - old;
+
+ if ((old & md->mask) + size != (head & md->mask)) {
+ buf = &data[old & md->mask];
+ size = md->mask + 1 - (old & md->mask);
+ old += size;
+ while (size) {
+ int ret = write(output, buf, size);
+ if (ret < 0) {
+ perror("failed to write");
+ exit(-1);
+ }
+ size -= ret;
+ buf += ret;
+ }
+ }
+
+ buf = &data[old & md->mask];
+ size = head - old;
+ old += size;
+ while (size) {
+ int ret = write(output, buf, size);
+ if (ret < 0) {
+ perror("failed to write");
+ exit(-1);
+ }
+ size -= ret;
+ buf += ret;
+ }
+
+ md->prev = old;
+}
+
+static volatile int done = 0;
+
+static void sigchld_handler(int sig)
+{
+ if (sig == SIGCHLD)
+ done = 1;
+}
+
+int main(int argc, char *argv[])
+{
+ struct pollfd event_array[MAX_NR_CPUS * MAX_COUNTERS];
+ struct mmap_data mmap_array[MAX_NR_CPUS][MAX_COUNTERS];
+ struct perf_counter_hw_event hw_event;
+ int i, counter, group_fd, nr_poll = 0;
+ pid_t pid;
+ int ret;
+
+ page_size = sysconf(_SC_PAGE_SIZE);
+
+ process_options(argc, argv);
+
+ nr_cpus = sysconf(_SC_NPROCESSORS_ONLN);
+ assert(nr_cpus <= MAX_NR_CPUS);
+ assert(nr_cpus >= 0);
+
+ output = open(output_name, O_CREAT|O_RDWR, S_IRWXU);
+ if (output < 0) {
+ perror("failed to create output file");
+ exit(-1);
+ }
+
+ argc -= optind;
+ argv += optind;
+
+ for (i = 0; i < nr_cpus; i++) {
+ group_fd = -1;
+ for (counter = 0; counter < nr_counters; counter++) {
+
+ memset(&hw_event, 0, sizeof(hw_event));
+ hw_event.config = event_id[counter];
+ hw_event.irq_period = event_count[counter];
+ hw_event.record_type = PERF_RECORD_IP | PERF_RECORD_TID;
+ hw_event.nmi = 1;
+ hw_event.mmap = 1;
+ hw_event.comm = 1;
+
+ fd[i][counter] = sys_perf_counter_open(&hw_event, -1, i, group_fd, 0);
+ if (fd[i][counter] < 0) {
+ int err = errno;
+ printf("kerneltop error: syscall returned with %d (%s)\n",
+ fd[i][counter], strerror(err));
+ if (err == EPERM)
+ printf("Are you root?\n");
+ exit(-1);
+ }
+ assert(fd[i][counter] >= 0);
+ fcntl(fd[i][counter], F_SETFL, O_NONBLOCK);
+
+ /*
+ * First counter acts as the group leader:
+ */
+ if (group && group_fd == -1)
+ group_fd = fd[i][counter];
+
+ event_array[nr_poll].fd = fd[i][counter];
+ event_array[nr_poll].events = POLLIN;
+ nr_poll++;
+
+ mmap_array[i][counter].counter = counter;
+ mmap_array[i][counter].prev = 0;
+ mmap_array[i][counter].mask = mmap_pages*page_size - 1;
+ mmap_array[i][counter].base = mmap(NULL, (mmap_pages+1)*page_size,
+ PROT_READ, MAP_SHARED, fd[i][counter], 0);
+ if (mmap_array[i][counter].base == MAP_FAILED) {
+ printf("kerneltop error: failed to mmap with %d (%s)\n",
+ errno, strerror(errno));
+ exit(-1);
+ }
+ }
+ }
+
+ signal(SIGCHLD, sigchld_handler);
+
+ pid = fork();
+ if (pid < 0)
+ perror("failed to fork");
+
+ if (!pid) {
+ if (execvp(argv[0], argv)) {
+ perror(argv[0]);
+ exit(-1);
+ }
+ }
+
+ if (realtime_prio) {
+ struct sched_param param;
+
+ param.sched_priority = realtime_prio;
+ if (sched_setscheduler(0, SCHED_FIFO, &param)) {
+ printf("Could not set realtime priority.\n");
+ exit(-1);
+ }
+ }
+
+ /*
+ * TODO: store the current /proc/$/maps information somewhere
+ */
+
+ while (!done) {
+ int hits = events;
+
+ for (i = 0; i < nr_cpus; i++) {
+ for (counter = 0; counter < nr_counters; counter++)
+ mmap_read(&mmap_array[i][counter]);
+ }
+
+ if (hits == events)
+ ret = poll(event_array, nr_poll, 100);
+ }
+
+ return 0;
+}
diff --git a/Documentation/perf_counter/perf-report.cc b/Documentation/perf_counter/perf-report.cc
new file mode 100644
index 0000000..09da0ba
--- /dev/null
+++ b/Documentation/perf_counter/perf-report.cc
@@ -0,0 +1,472 @@
+#define _GNU_SOURCE
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+#include <limits.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <errno.h>
+#include <ctype.h>
+#include <time.h>
+#include <getopt.h>
+
+#include <sys/ioctl.h>
+#include <sys/poll.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+#include <sys/mman.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include <linux/unistd.h>
+#include <linux/types.h>
+
+#include "../../include/linux/perf_counter.h"
+
+#include <set>
+#include <map>
+#include <string>
+
+
+static char const *input_name = "output.perf";
+static int input;
+
+static unsigned long page_size;
+static unsigned long mmap_window = 32;
+
+struct ip_event {
+ struct perf_event_header header;
+ __u64 ip;
+ __u32 pid, tid;
+};
+struct mmap_event {
+ struct perf_event_header header;
+ __u32 pid, tid;
+ __u64 start;
+ __u64 len;
+ __u64 pgoff;
+ char filename[PATH_MAX];
+};
+struct comm_event {
+ struct perf_event_header header;
+ __u32 pid,tid;
+ char comm[16];
+};
+
+typedef union event_union {
+ struct perf_event_header header;
+ struct ip_event ip;
+ struct mmap_event mmap;
+ struct comm_event comm;
+} event_t;
+
+struct section {
+ uint64_t start;
+ uint64_t end;
+
+ uint64_t offset;
+
+ std::string name;
+
+ section() { };
+
+ section(uint64_t stab) : end(stab) { };
+
+ section(uint64_t start, uint64_t size, uint64_t offset, std::string name) :
+ start(start), end(start + size), offset(offset), name(name)
+ { };
+
+ bool operator < (const struct section &s) const {
+ return end < s.end;
+ };
+};
+
+typedef std::set<struct section> sections_t;
+
+struct symbol {
+ uint64_t start;
+ uint64_t end;
+
+ std::string name;
+
+ symbol() { };
+
+ symbol(uint64_t ip) : start(ip) { }
+
+ symbol(uint64_t start, uint64_t len, std::string name) :
+ start(start), end(start + len), name(name)
+ { };
+
+ bool operator < (const struct symbol &s) const {
+ return start < s.start;
+ };
+};
+
+typedef std::set<struct symbol> symbols_t;
+
+struct dso {
+ sections_t sections;
+ symbols_t syms;
+};
+
+static std::map<std::string, struct dso> dsos;
+
+static void load_dso_sections(std::string dso_name)
+{
+ struct dso &dso = dsos[dso_name];
+
+ std::string cmd = "readelf -DSW " + dso_name;
+
+ FILE *file = popen(cmd.c_str(), "r");
+ if (!file) {
+ perror("failed to open pipe");
+ exit(-1);
+ }
+
+ char *line = NULL;
+ size_t n = 0;
+
+ while (!feof(file)) {
+ uint64_t addr, off, size;
+ char name[32];
+
+ if (getline(&line, &n, file) < 0)
+ break;
+ if (!line)
+ break;
+
+ if (sscanf(line, " [%*2d] %16s %*14s %Lx %Lx %Lx",
+ name, &addr, &off, &size) == 4) {
+
+ dso.sections.insert(section(addr, size, addr - off, name));
+ }
+#if 0
+ /*
+ * for reading readelf symbols (-s), however these don't seem
+ * to include nearly everything, so use nm for that.
+ */
+ if (sscanf(line, " %*4d %*3d: %Lx %5Lu %*7s %*6s %*7s %3d %s",
+ &start, &size, &section, sym) == 4) {
+
+ start -= dso.section_offsets[section];
+
+ dso.syms.insert(symbol(start, size, std::string(sym)));
+ }
+#endif
+ }
+ pclose(file);
+}
+
+static void load_dso_symbols(std::string dso_name, std::string args)
+{
+ struct dso &dso = dsos[dso_name];
+
+ std::string cmd = "nm -nSC " + args + " " + dso_name;
+
+ FILE *file = popen(cmd.c_str(), "r");
+ if (!file) {
+ perror("failed to open pipe");
+ exit(-1);
+ }
+
+ char *line = NULL;
+ size_t n = 0;
+
+ while (!feof(file)) {
+ uint64_t start, size;
+ char c;
+ char sym[1024];
+
+ if (getline(&line, &n, file) < 0)
+ break;
+ if (!line)
+ break;
+
+
+ if (sscanf(line, "%Lx %Lx %c %s", &start, &size, &c, sym) == 4) {
+ sections_t::const_iterator si =
+ dso.sections.upper_bound(section(start));
+ if (si == dso.sections.end()) {
+ printf("symbol in unknown section: %s\n", sym);
+ continue;
+ }
+
+ start -= si->offset;
+
+ dso.syms.insert(symbol(start, size, sym));
+ }
+ }
+ pclose(file);
+}
+
+static void load_dso(std::string dso_name)
+{
+ load_dso_sections(dso_name);
+ load_dso_symbols(dso_name, "-D"); /* dynamic symbols */
+ load_dso_symbols(dso_name, ""); /* regular ones */
+}
+
+void load_kallsyms(void)
+{
+ struct dso &dso = dsos["[kernel]"];
+
+ FILE *file = fopen("/proc/kallsyms", "r");
+ if (!file) {
+ perror("failed to open kallsyms");
+ exit(-1);
+ }
+
+ char *line = NULL;
+ size_t n = 0;
+
+ while (!feof(file)) {
+ uint64_t start;
+ char c;
+ char sym[1024];
+
+ if (getline(&line, &n, file) < 0)
+ break;
+ if (!line)
+ break;
+
+ if (sscanf(line, "%Lx %c %s", &start, &c, sym) == 3)
+ dso.syms.insert(symbol(start, 0x1000000, std::string(sym)));
+ }
+ fclose(file);
+}
+
+struct map {
+ uint64_t start;
+ uint64_t end;
+ uint64_t pgoff;
+
+ std::string dso;
+
+ map() { };
+
+ map(uint64_t ip) : end(ip) { }
+
+ map(mmap_event *mmap) {
+ start = mmap->start;
+ end = mmap->start + mmap->len;
+ pgoff = mmap->pgoff;
+
+ dso = std::string(mmap->filename);
+
+ if (dsos.find(dso) == dsos.end())
+ load_dso(dso);
+ };
+
+ bool operator < (const struct map &m) const {
+ return end < m.end;
+ };
+};
+
+typedef std::set<struct map> maps_t;
+
+static std::map<int, maps_t> maps;
+
+static std::map<int, std::string> comms;
+
+static std::map<std::string, int> hist;
+static std::multimap<int, std::string> rev_hist;
+
+static std::string resolve_comm(int pid)
+{
+ std::string comm = "<unknown>";
+ std::map<int, std::string>::const_iterator ci = comms.find(pid);
+ if (ci != comms.end())
+ comm = ci->second;
+
+ return comm;
+}
+
+static std::string resolve_user_symbol(int pid, uint64_t ip)
+{
+ std::string sym = "<unknown>";
+
+ maps_t &m = maps[pid];
+ maps_t::const_iterator mi = m.upper_bound(map(ip));
+ if (mi == m.end())
+ return sym;
+
+ ip -= mi->start + mi->pgoff;
+
+ symbols_t &s = dsos[mi->dso].syms;
+ symbols_t::const_iterator si = s.upper_bound(symbol(ip));
+
+ sym = mi->dso + ": <unknown>";
+
+ if (si == s.begin())
+ return sym;
+ si--;
+
+ if (si->start <= ip && ip < si->end)
+ sym = mi->dso + ": " + si->name;
+#if 0
+ else if (si->start <= ip)
+ sym = mi->dso + ": ?" + si->name;
+#endif
+
+ return sym;
+}
+
+static std::string resolve_kernel_symbol(uint64_t ip)
+{
+ std::string sym = "<unknown>";
+
+ symbols_t &s = dsos["[kernel]"].syms;
+ symbols_t::const_iterator si = s.upper_bound(symbol(ip));
+
+ if (si == s.begin())
+ return sym;
+ si--;
+
+ if (si->start <= ip && ip < si->end)
+ sym = si->name;
+
+ return sym;
+}
+
+static void display_help(void)
+{
+ printf(
+ "Usage: perf-report [<options>]\n"
+ " -i file --input=<file> # input file\n"
+ );
+
+ exit(0);
+}
+
+static void process_options(int argc, char *argv[])
+{
+ int error = 0;
+
+ for (;;) {
+ int option_index = 0;
+ /** Options for getopt */
+ static struct option long_options[] = {
+ {"input", required_argument, NULL, 'i'},
+ {NULL, 0, NULL, 0 }
+ };
+ int c = getopt_long(argc, argv, "+:i:",
+ long_options, &option_index);
+ if (c == -1)
+ break;
+
+ switch (c) {
+ case 'i': input_name = strdup(optarg); break;
+ default: error = 1; break;
+ }
+ }
+
+ if (error)
+ display_help();
+}
+
+int main(int argc, char *argv[])
+{
+ unsigned long offset = 0;
+ unsigned long head = 0;
+ struct stat stat;
+ char *buf;
+ event_t *event;
+ int ret;
+ unsigned long total = 0;
+
+ page_size = getpagesize();
+
+ process_options(argc, argv);
+
+ input = open(input_name, O_RDONLY);
+ if (input < 0) {
+ perror("failed to open file");
+ exit(-1);
+ }
+
+ ret = fstat(input, &stat);
+ if (ret < 0) {
+ perror("failed to stat file");
+ exit(-1);
+ }
+
+ load_kallsyms();
+
+remap:
+ buf = (char *)mmap(NULL, page_size * mmap_window, PROT_READ,
+ MAP_SHARED, input, offset);
+ if (buf == MAP_FAILED) {
+ perror("failed to mmap file");
+ exit(-1);
+ }
+
+more:
+ event = (event_t *)(buf + head);
+
+ if (head + event->header.size >= page_size * mmap_window) {
+ unsigned long shift = page_size * (head / page_size);
+
+ munmap(buf, page_size * mmap_window);
+ offset += shift;
+ head -= shift;
+ goto remap;
+ }
+ head += event->header.size;
+
+ if (event->header.misc & PERF_EVENT_MISC_OVERFLOW) {
+ std::string comm, sym, level;
+ char output[1024];
+
+ if (event->header.misc & PERF_EVENT_MISC_KERNEL) {
+ level = "[kernel]";
+ sym = resolve_kernel_symbol(event->ip.ip);
+ } else if (event->header.misc & PERF_EVENT_MISC_USER) {
+ level = "[ user ]";
+ sym = resolve_user_symbol(event->ip.pid, event->ip.ip);
+ } else {
+ level = "[ hv ]";
+ }
+ comm = resolve_comm(event->ip.pid);
+
+ snprintf(output, sizeof(output), "%16s %s %s",
+ comm.c_str(), level.c_str(), sym.c_str());
+ hist[output]++;
+
+ total++;
+
+ } else switch (event->header.type) {
+ case PERF_EVENT_MMAP:
+ maps[event->mmap.pid].insert(map(&event->mmap));
+ break;
+
+ case PERF_EVENT_COMM:
+ comms[event->comm.pid] = std::string(event->comm.comm);
+ break;
+ }
+
+ if (offset + head < stat.st_size)
+ goto more;
+
+ close(input);
+
+ std::map<std::string, int>::iterator hi = hist.begin();
+
+ while (hi != hist.end()) {
+ rev_hist.insert(std::pair<int, std::string>(hi->second, hi->first));
+ hist.erase(hi++);
+ }
+
+ std::multimap<int, std::string>::const_iterator ri = rev_hist.begin();
+
+ while (ri != rev_hist.end()) {
+ printf(" %5.2f %s\n", (100.0 * ri->first)/total, ri->second.c_str());
+ ri++;
+ }
+
+ return 0;
+}
+
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [tip:perfcounters/core] perf_counter: move PERF_RECORD_TIME
2009-04-08 13:01 ` [PATCH 8/9] perf_counter: move PERF_RECORD_TIME Peter Zijlstra
@ 2009-04-08 16:58 ` Peter Zijlstra
2009-04-08 17:09 ` Peter Zijlstra
1 sibling, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2009-04-08 16:58 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, paulus, hpa, mingo, a.p.zijlstra, tglx, cjashfor, mingo
Commit-ID: 58068ecf1953c1ffb315107255309ff3ee456ae5
Gitweb: http://git.kernel.org/tip/58068ecf1953c1ffb315107255309ff3ee456ae5
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Wed, 8 Apr 2009 15:01:32 +0200
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Wed, 8 Apr 2009 18:53:31 +0200
perf_counter: move PERF_RECORD_TIME
Move PERF_RECORD_TIME so that all the fixed length items come before
the variable length ones.
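To illustrate why this ordering helps a consumer, here is a minimal
userspace sketch, not part of the patch. It assumes the reader already
has a pointer just past the event header and the PERF_RECORD_* mask
from header.type; the names fixed_fields and parse_fixed_fields are
hypothetical. With all fixed-length items first, the start of the
variable-length items follows from the mask alone.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Record-format bits as they stand after this patch. */
enum {
	PERF_RECORD_IP        = 1U << 0,
	PERF_RECORD_TID       = 1U << 1,
	PERF_RECORD_TIME      = 1U << 2,
	PERF_RECORD_GROUP     = 1U << 3,
	PERF_RECORD_CALLCHAIN = 1U << 4,
};

/* Hypothetical container for the fixed-length part of a sample. */
struct fixed_fields {
	uint64_t ip;
	uint32_t pid, tid;
	uint64_t time;
};

/*
 * Decode the fixed-length items that follow the event header.
 * 'payload' points just past the header, 'type' is the PERF_RECORD_* mask.
 * Returns the number of bytes consumed, which is the offset at which the
 * variable-length items (group, callchain) would begin.
 */
static size_t parse_fixed_fields(const unsigned char *payload, uint32_t type,
				 struct fixed_fields *out)
{
	size_t off = 0;

	assert(payload && out);
	memset(out, 0, sizeof(*out));

	if (type & PERF_RECORD_IP) {
		memcpy(&out->ip, payload + off, sizeof(out->ip));
		off += sizeof(out->ip);
	}
	if (type & PERF_RECORD_TID) {
		memcpy(&out->pid, payload + off, sizeof(out->pid));
		off += sizeof(out->pid);
		memcpy(&out->tid, payload + off, sizeof(out->tid));
		off += sizeof(out->tid);
	}
	if (type & PERF_RECORD_TIME) {
		memcpy(&out->time, payload + off, sizeof(out->time));
		off += sizeof(out->time);
	}

	return off;
}
```

The return value is what a reader would add to the payload pointer
before walking the group or callchain data.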
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <20090408130409.307926436@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
include/linux/perf_counter.h | 9 ++++-----
kernel/perf_counter.c | 26 +++++++++++++-------------
2 files changed, 17 insertions(+), 18 deletions(-)
diff --git a/include/linux/perf_counter.h b/include/linux/perf_counter.h
index fed9216..9e969c4 100644
--- a/include/linux/perf_counter.h
+++ b/include/linux/perf_counter.h
@@ -100,9 +100,9 @@ enum sw_event_ids {
enum perf_counter_record_format {
PERF_RECORD_IP = 1U << 0,
PERF_RECORD_TID = 1U << 1,
- PERF_RECORD_GROUP = 1U << 2,
- PERF_RECORD_CALLCHAIN = 1U << 3,
- PERF_RECORD_TIME = 1U << 4,
+ PERF_RECORD_TIME = 1U << 2,
+ PERF_RECORD_GROUP = 1U << 3,
+ PERF_RECORD_CALLCHAIN = 1U << 4,
};
/*
@@ -250,6 +250,7 @@ enum perf_event_type {
*
* { u64 ip; } && PERF_RECORD_IP
* { u32 pid, tid; } && PERF_RECORD_TID
+ * { u64 time; } && PERF_RECORD_TIME
*
* { u64 nr;
* { u64 event, val; } cnt[nr]; } && PERF_RECORD_GROUP
@@ -259,8 +260,6 @@ enum perf_event_type {
* kernel,
* user;
* u64 ips[nr]; } && PERF_RECORD_CALLCHAIN
- *
- * { u64 time; } && PERF_RECORD_TIME
* };
*/
};
diff --git a/kernel/perf_counter.c b/kernel/perf_counter.c
index 2d4aebb..4dc8600 100644
--- a/kernel/perf_counter.c
+++ b/kernel/perf_counter.c
@@ -1850,6 +1850,16 @@ static void perf_counter_output(struct perf_counter *counter,
header.size += sizeof(tid_entry);
}
+ if (record_type & PERF_RECORD_TIME) {
+ /*
+ * Maybe do better on x86 and provide cpu_clock_nmi()
+ */
+ time = sched_clock();
+
+ header.type |= PERF_RECORD_TIME;
+ header.size += sizeof(u64);
+ }
+
if (record_type & PERF_RECORD_GROUP) {
header.type |= PERF_RECORD_GROUP;
header.size += sizeof(u64) +
@@ -1867,16 +1877,6 @@ static void perf_counter_output(struct perf_counter *counter,
}
}
- if (record_type & PERF_RECORD_TIME) {
- /*
- * Maybe do better on x86 and provide cpu_clock_nmi()
- */
- time = sched_clock();
-
- header.type |= PERF_RECORD_TIME;
- header.size += sizeof(u64);
- }
-
ret = perf_output_begin(&handle, counter, header.size, nmi, 1);
if (ret)
return;
@@ -1889,6 +1889,9 @@ static void perf_counter_output(struct perf_counter *counter,
if (record_type & PERF_RECORD_TID)
perf_output_put(&handle, tid_entry);
+ if (record_type & PERF_RECORD_TIME)
+ perf_output_put(&handle, time);
+
if (record_type & PERF_RECORD_GROUP) {
struct perf_counter *leader, *sub;
u64 nr = counter->nr_siblings;
@@ -1910,9 +1913,6 @@ static void perf_counter_output(struct perf_counter *counter,
if (callchain)
perf_output_copy(&handle, callchain, callchain_size);
- if (record_type & PERF_RECORD_TIME)
- perf_output_put(&handle, time);
-
perf_output_end(&handle);
}
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [tip:perfcounters/core] perf_counter: allow for data addresses to be recorded
2009-04-08 13:01 ` [PATCH 9/9] perf_counter: allow for data addresses to be recorded Peter Zijlstra
@ 2009-04-08 16:59 ` Peter Zijlstra
2009-04-08 17:10 ` Peter Zijlstra
1 sibling, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2009-04-08 16:59 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, paulus, hpa, mingo, a.p.zijlstra, tglx, cjashfor, mingo
Commit-ID: 50b74e78bb5bdc0cd9722850d38411e3b84fefa7
Gitweb: http://git.kernel.org/tip/50b74e78bb5bdc0cd9722850d38411e3b84fefa7
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Wed, 8 Apr 2009 15:01:33 +0200
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Wed, 8 Apr 2009 18:53:31 +0200
perf_counter: allow for data addresses to be recorded
Paul suggested we allow for data addresses to be recorded along with
the traditional IPs as power can provide these.
For now, only the software pagefault events provide data addresses,
but in the future power might as well for some events.
x86 doesn't seem capable of providing this atm.
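A rough userspace sketch of picking the new field out of an overflow
record, assuming the reader has a pointer just past the event header
and the PERF_RECORD_* mask; read_addr is a hypothetical helper, not
something this patch adds. The field sits after ip, pid/tid and time,
matching the layout comment in perf_counter.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Record-format bits as they stand after this patch. */
enum {
	PERF_RECORD_IP        = 1U << 0,
	PERF_RECORD_TID       = 1U << 1,
	PERF_RECORD_TIME      = 1U << 2,
	PERF_RECORD_ADDR      = 1U << 3,
	PERF_RECORD_GROUP     = 1U << 4,
	PERF_RECORD_CALLCHAIN = 1U << 5,
};

/*
 * Fetch the data address from an overflow record.
 * Returns 0 and stores the address in *addr, or -1 when the record
 * was produced without PERF_RECORD_ADDR.
 */
static int read_addr(const unsigned char *payload, uint32_t type,
		     uint64_t *addr)
{
	size_t off = 0;

	assert(payload && addr);

	if (!(type & PERF_RECORD_ADDR))
		return -1;

	/* Skip the fixed-length items that precede the address. */
	if (type & PERF_RECORD_IP)
		off += sizeof(uint64_t);
	if (type & PERF_RECORD_TID)
		off += 2 * sizeof(uint32_t);
	if (type & PERF_RECORD_TIME)
		off += sizeof(uint64_t);

	memcpy(addr, payload + off, sizeof(*addr));
	return 0;
}
```

Note that PERF_RECORD_GROUP and PERF_RECORD_CALLCHAIN change value
here, so a tool built against the older header would need a rebuild.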
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <20090408130409.394816925@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/powerpc/kernel/perf_counter.c | 2 +-
arch/powerpc/mm/fault.c | 8 ++++--
arch/x86/kernel/cpu/perf_counter.c | 2 +-
arch/x86/mm/fault.c | 8 ++++--
include/linux/perf_counter.h | 14 ++++++----
kernel/perf_counter.c | 46 ++++++++++++++++++++++-------------
6 files changed, 49 insertions(+), 31 deletions(-)
diff --git a/arch/powerpc/kernel/perf_counter.c b/arch/powerpc/kernel/perf_counter.c
index 0697ade..c9d019f 100644
--- a/arch/powerpc/kernel/perf_counter.c
+++ b/arch/powerpc/kernel/perf_counter.c
@@ -749,7 +749,7 @@ static void record_and_restart(struct perf_counter *counter, long val,
* Finally record data if requested.
*/
if (record)
- perf_counter_overflow(counter, 1, regs);
+ perf_counter_overflow(counter, 1, regs, 0);
}
/*
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 17bbf6f..ac0e112 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -171,7 +171,7 @@ int __kprobes do_page_fault(struct pt_regs *regs, unsigned long address,
die("Weird page fault", regs, SIGSEGV);
}
- perf_swcounter_event(PERF_COUNT_PAGE_FAULTS, 1, 0, regs);
+ perf_swcounter_event(PERF_COUNT_PAGE_FAULTS, 1, 0, regs, address);
/* When running in the kernel we expect faults to occur only to
* addresses in user space. All other faults represent errors in the
@@ -312,7 +312,8 @@ good_area:
}
if (ret & VM_FAULT_MAJOR) {
current->maj_flt++;
- perf_swcounter_event(PERF_COUNT_PAGE_FAULTS_MAJ, 1, 0, regs);
+ perf_swcounter_event(PERF_COUNT_PAGE_FAULTS_MAJ, 1, 0,
+ regs, address);
#ifdef CONFIG_PPC_SMLPAR
if (firmware_has_feature(FW_FEATURE_CMO)) {
preempt_disable();
@@ -322,7 +323,8 @@ good_area:
#endif
} else {
current->min_flt++;
- perf_swcounter_event(PERF_COUNT_PAGE_FAULTS_MIN, 1, 0, regs);
+ perf_swcounter_event(PERF_COUNT_PAGE_FAULTS_MIN, 1, 0,
+ regs, address);
}
up_read(&mm->mmap_sem);
return 0;
diff --git a/arch/x86/kernel/cpu/perf_counter.c b/arch/x86/kernel/cpu/perf_counter.c
index 1116a41..0fcbaab 100644
--- a/arch/x86/kernel/cpu/perf_counter.c
+++ b/arch/x86/kernel/cpu/perf_counter.c
@@ -800,7 +800,7 @@ again:
continue;
perf_save_and_restart(counter);
- if (perf_counter_overflow(counter, nmi, regs))
+ if (perf_counter_overflow(counter, nmi, regs, 0))
__pmc_generic_disable(counter, &counter->hw, bit);
}
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index f2d3324..6f9df2b 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1045,7 +1045,7 @@ do_page_fault(struct pt_regs *regs, unsigned long error_code)
if (unlikely(error_code & PF_RSVD))
pgtable_bad(regs, error_code, address);
- perf_swcounter_event(PERF_COUNT_PAGE_FAULTS, 1, 0, regs);
+ perf_swcounter_event(PERF_COUNT_PAGE_FAULTS, 1, 0, regs, address);
/*
* If we're in an interrupt, have no user context or are running
@@ -1142,10 +1142,12 @@ good_area:
if (fault & VM_FAULT_MAJOR) {
tsk->maj_flt++;
- perf_swcounter_event(PERF_COUNT_PAGE_FAULTS_MAJ, 1, 0, regs);
+ perf_swcounter_event(PERF_COUNT_PAGE_FAULTS_MAJ, 1, 0,
+ regs, address);
} else {
tsk->min_flt++;
- perf_swcounter_event(PERF_COUNT_PAGE_FAULTS_MIN, 1, 0, regs);
+ perf_swcounter_event(PERF_COUNT_PAGE_FAULTS_MIN, 1, 0,
+ regs, address);
}
check_v8086_mode(regs, address, tsk);
diff --git a/include/linux/perf_counter.h b/include/linux/perf_counter.h
index 9e969c4..471648c 100644
--- a/include/linux/perf_counter.h
+++ b/include/linux/perf_counter.h
@@ -101,8 +101,9 @@ enum perf_counter_record_format {
PERF_RECORD_IP = 1U << 0,
PERF_RECORD_TID = 1U << 1,
PERF_RECORD_TIME = 1U << 2,
- PERF_RECORD_GROUP = 1U << 3,
- PERF_RECORD_CALLCHAIN = 1U << 4,
+ PERF_RECORD_ADDR = 1U << 3,
+ PERF_RECORD_GROUP = 1U << 4,
+ PERF_RECORD_CALLCHAIN = 1U << 5,
};
/*
@@ -251,6 +252,7 @@ enum perf_event_type {
* { u64 ip; } && PERF_RECORD_IP
* { u32 pid, tid; } && PERF_RECORD_TID
* { u64 time; } && PERF_RECORD_TIME
+ * { u64 addr; } && PERF_RECORD_ADDR
*
* { u64 nr;
* { u64 event, val; } cnt[nr]; } && PERF_RECORD_GROUP
@@ -537,7 +539,7 @@ extern int hw_perf_group_sched_in(struct perf_counter *group_leader,
extern void perf_counter_update_userpage(struct perf_counter *counter);
extern int perf_counter_overflow(struct perf_counter *counter,
- int nmi, struct pt_regs *regs);
+ int nmi, struct pt_regs *regs, u64 addr);
/*
* Return 1 for a software counter, 0 for a hardware counter
*/
@@ -547,7 +549,7 @@ static inline int is_software_counter(struct perf_counter *counter)
perf_event_type(&counter->hw_event) != PERF_TYPE_HARDWARE;
}
-extern void perf_swcounter_event(u32, u64, int, struct pt_regs *);
+extern void perf_swcounter_event(u32, u64, int, struct pt_regs *, u64);
extern void perf_counter_mmap(unsigned long addr, unsigned long len,
unsigned long pgoff, struct file *file);
@@ -584,8 +586,8 @@ static inline int perf_counter_task_disable(void) { return -EINVAL; }
static inline int perf_counter_task_enable(void) { return -EINVAL; }
static inline void
-perf_swcounter_event(u32 event, u64 nr, int nmi, struct pt_regs *regs) { }
-
+perf_swcounter_event(u32 event, u64 nr, int nmi,
+ struct pt_regs *regs, u64 addr) { }
static inline void
perf_counter_mmap(unsigned long addr, unsigned long len,
diff --git a/kernel/perf_counter.c b/kernel/perf_counter.c
index 4dc8600..321c57e 100644
--- a/kernel/perf_counter.c
+++ b/kernel/perf_counter.c
@@ -800,7 +800,7 @@ void perf_counter_task_sched_out(struct task_struct *task, int cpu)
update_context_time(ctx);
regs = task_pt_regs(task);
- perf_swcounter_event(PERF_COUNT_CONTEXT_SWITCHES, 1, 1, regs);
+ perf_swcounter_event(PERF_COUNT_CONTEXT_SWITCHES, 1, 1, regs, 0);
__perf_counter_sched_out(ctx, cpuctx);
cpuctx->task_ctx = NULL;
@@ -1810,7 +1810,7 @@ static void perf_output_end(struct perf_output_handle *handle)
}
static void perf_counter_output(struct perf_counter *counter,
- int nmi, struct pt_regs *regs)
+ int nmi, struct pt_regs *regs, u64 addr)
{
int ret;
u64 record_type = counter->hw_event.record_type;
@@ -1860,6 +1860,11 @@ static void perf_counter_output(struct perf_counter *counter,
header.size += sizeof(u64);
}
+ if (record_type & PERF_RECORD_ADDR) {
+ header.type |= PERF_RECORD_ADDR;
+ header.size += sizeof(u64);
+ }
+
if (record_type & PERF_RECORD_GROUP) {
header.type |= PERF_RECORD_GROUP;
header.size += sizeof(u64) +
@@ -1892,6 +1897,9 @@ static void perf_counter_output(struct perf_counter *counter,
if (record_type & PERF_RECORD_TIME)
perf_output_put(&handle, time);
+ if (record_type & PERF_RECORD_ADDR)
+ perf_output_put(&handle, addr);
+
if (record_type & PERF_RECORD_GROUP) {
struct perf_counter *leader, *sub;
u64 nr = counter->nr_siblings;
@@ -2158,7 +2166,7 @@ void perf_counter_munmap(unsigned long addr, unsigned long len,
*/
int perf_counter_overflow(struct perf_counter *counter,
- int nmi, struct pt_regs *regs)
+ int nmi, struct pt_regs *regs, u64 addr)
{
int events = atomic_read(&counter->event_limit);
int ret = 0;
@@ -2175,7 +2183,7 @@ int perf_counter_overflow(struct perf_counter *counter,
perf_counter_disable(counter);
}
- perf_counter_output(counter, nmi, regs);
+ perf_counter_output(counter, nmi, regs, addr);
return ret;
}
@@ -2240,7 +2248,7 @@ static enum hrtimer_restart perf_swcounter_hrtimer(struct hrtimer *hrtimer)
regs = task_pt_regs(current);
if (regs) {
- if (perf_counter_overflow(counter, 0, regs))
+ if (perf_counter_overflow(counter, 0, regs, 0))
ret = HRTIMER_NORESTART;
}
@@ -2250,11 +2258,11 @@ static enum hrtimer_restart perf_swcounter_hrtimer(struct hrtimer *hrtimer)
}
static void perf_swcounter_overflow(struct perf_counter *counter,
- int nmi, struct pt_regs *regs)
+ int nmi, struct pt_regs *regs, u64 addr)
{
perf_swcounter_update(counter);
perf_swcounter_set_period(counter);
- if (perf_counter_overflow(counter, nmi, regs))
+ if (perf_counter_overflow(counter, nmi, regs, addr))
/* soft-disable the counter */
;
@@ -2286,16 +2294,17 @@ static int perf_swcounter_match(struct perf_counter *counter,
}
static void perf_swcounter_add(struct perf_counter *counter, u64 nr,
- int nmi, struct pt_regs *regs)
+ int nmi, struct pt_regs *regs, u64 addr)
{
int neg = atomic64_add_negative(nr, &counter->hw.count);
if (counter->hw.irq_period && !neg)
- perf_swcounter_overflow(counter, nmi, regs);
+ perf_swcounter_overflow(counter, nmi, regs, addr);
}
static void perf_swcounter_ctx_event(struct perf_counter_context *ctx,
enum perf_event_types type, u32 event,
- u64 nr, int nmi, struct pt_regs *regs)
+ u64 nr, int nmi, struct pt_regs *regs,
+ u64 addr)
{
struct perf_counter *counter;
@@ -2305,7 +2314,7 @@ static void perf_swcounter_ctx_event(struct perf_counter_context *ctx,
rcu_read_lock();
list_for_each_entry_rcu(counter, &ctx->event_list, event_entry) {
if (perf_swcounter_match(counter, type, event, regs))
- perf_swcounter_add(counter, nr, nmi, regs);
+ perf_swcounter_add(counter, nr, nmi, regs, addr);
}
rcu_read_unlock();
}
@@ -2325,7 +2334,8 @@ static int *perf_swcounter_recursion_context(struct perf_cpu_context *cpuctx)
}
static void __perf_swcounter_event(enum perf_event_types type, u32 event,
- u64 nr, int nmi, struct pt_regs *regs)
+ u64 nr, int nmi, struct pt_regs *regs,
+ u64 addr)
{
struct perf_cpu_context *cpuctx = &get_cpu_var(perf_cpu_context);
int *recursion = perf_swcounter_recursion_context(cpuctx);
@@ -2336,10 +2346,11 @@ static void __perf_swcounter_event(enum perf_event_types type, u32 event,
(*recursion)++;
barrier();
- perf_swcounter_ctx_event(&cpuctx->ctx, type, event, nr, nmi, regs);
+ perf_swcounter_ctx_event(&cpuctx->ctx, type, event,
+ nr, nmi, regs, addr);
if (cpuctx->task_ctx) {
perf_swcounter_ctx_event(cpuctx->task_ctx, type, event,
- nr, nmi, regs);
+ nr, nmi, regs, addr);
}
barrier();
@@ -2349,9 +2360,10 @@ out:
put_cpu_var(perf_cpu_context);
}
-void perf_swcounter_event(u32 event, u64 nr, int nmi, struct pt_regs *regs)
+void
+perf_swcounter_event(u32 event, u64 nr, int nmi, struct pt_regs *regs, u64 addr)
{
- __perf_swcounter_event(PERF_TYPE_SOFTWARE, event, nr, nmi, regs);
+ __perf_swcounter_event(PERF_TYPE_SOFTWARE, event, nr, nmi, regs, addr);
}
static void perf_swcounter_read(struct perf_counter *counter)
@@ -2548,7 +2560,7 @@ void perf_tpcounter_event(int event_id)
if (!regs)
regs = task_pt_regs(current);
- __perf_swcounter_event(PERF_TYPE_TRACEPOINT, event_id, 1, 1, regs);
+ __perf_swcounter_event(PERF_TYPE_TRACEPOINT, event_id, 1, 1, regs, 0);
}
extern int ftrace_profile_enable(int);
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 6.5/9] perf_counter: fix track task-comm data
2009-04-08 13:01 ` [PATCH 6/9] perf_counter: track task-comm data Peter Zijlstra
2009-04-08 16:58 ` [tip:perfcounters/core] " Peter Zijlstra
@ 2009-04-08 17:03 ` Peter Zijlstra
2009-04-08 17:09 ` [tip:perfcounters/core] perf_counter: " Peter Zijlstra
2 siblings, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2009-04-08 17:03 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Paul Mackerras, Corey Ashford, linux-kernel
On Wed, 2009-04-08 at 15:01 +0200, Peter Zijlstra wrote:
> plain text document attachment (perf_counter-comm.patch)
> Similar to the mmap data stream, add one that tracks the task COMM field,
> so that the userspace reporting knows what to call a task.
Seems I forgot the !PERF_COUNTER case.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/perf_counter.h | 1 +
1 file changed, 1 insertion(+)
Index: linux-2.6/include/linux/perf_counter.h
===================================================================
--- linux-2.6.orig/include/linux/perf_counter.h
+++ linux-2.6/include/linux/perf_counter.h
@@ -597,6 +597,7 @@ static inline void
perf_counter_munmap(unsigned long addr, unsigned long len,
unsigned long pgoff, struct file *file) { }
+static inline void perf_counter_comm(struct task_struct *tsk) { }
#endif
#endif /* __KERNEL__ */
^ permalink raw reply [flat|nested] 24+ messages in thread
* [tip:perfcounters/core] perf_counter: track task-comm data
2009-04-08 13:01 ` [PATCH 6/9] perf_counter: track task-comm data Peter Zijlstra
2009-04-08 16:58 ` [tip:perfcounters/core] " Peter Zijlstra
2009-04-08 17:03 ` [PATCH 6.5/9] perf_counter: fix " Peter Zijlstra
@ 2009-04-08 17:09 ` Peter Zijlstra
2 siblings, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2009-04-08 17:09 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, paulus, hpa, mingo, a.p.zijlstra, tglx, cjashfor, mingo
Commit-ID: 8d1b2d9361b494bfc761700c348c65ebbe3deb5b
Gitweb: http://git.kernel.org/tip/8d1b2d9361b494bfc761700c348c65ebbe3deb5b
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Wed, 8 Apr 2009 15:01:30 +0200
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Wed, 8 Apr 2009 19:05:47 +0200
perf_counter: track task-comm data
Similar to the mmap data stream, add one that tracks the task COMM field,
so that the userspace reporting knows what to call a task.
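As a sketch of what the consuming side might look like (not taken from
the patch): the kernel emits ALIGN(strlen(comm), sizeof(u64)) bytes of
name, so a name whose length is a multiple of eight arrives without a
terminating NUL. A reader can bound the copy by header.size. The
event_header layout below is an assumption standing in for struct
perf_event_header, and copy_comm is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct perf_event_header; this layout is assumed. */
struct event_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;		/* total record size, header included */
};

struct comm_record {
	struct event_header header;
	uint32_t pid, tid;
	char comm[];		/* name bytes, possibly not NUL-terminated */
};

/*
 * Copy the task name out of a comm record into 'out', always
 * NUL-terminating the result. Returns the number of name bytes copied.
 */
static size_t copy_comm(const struct comm_record *rec, char *out,
			size_t outlen)
{
	size_t avail, len;
	const char *nul;

	assert(rec && out && outlen > 0);

	out[0] = '\0';
	if (rec->header.size < sizeof(*rec))
		return 0;	/* malformed record: no room for a name */

	/* Only trust the bytes that header.size says belong to us. */
	avail = rec->header.size - sizeof(*rec);
	nul = memchr(rec->comm, '\0', avail);
	len = nul ? (size_t)(nul - rec->comm) : avail;

	if (len >= outlen)
		len = outlen - 1;

	memcpy(out, rec->comm, len);
	out[len] = '\0';
	return len;
}
```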
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <20090408130409.127422406@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
fs/exec.c | 1 +
include/linux/perf_counter.h | 16 +++++++-
kernel/perf_counter.c | 93 ++++++++++++++++++++++++++++++++++++++++++
3 files changed, 109 insertions(+), 1 deletions(-)
diff --git a/fs/exec.c b/fs/exec.c
index e015c0b..bf47ed0 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -951,6 +951,7 @@ void set_task_comm(struct task_struct *tsk, char *buf)
task_lock(tsk);
strlcpy(tsk->comm, buf, sizeof(tsk->comm));
task_unlock(tsk);
+ perf_counter_comm(tsk);
}
int flush_old_exec(struct linux_binprm * bprm)
diff --git a/include/linux/perf_counter.h b/include/linux/perf_counter.h
index 8bf764f..a70a55f 100644
--- a/include/linux/perf_counter.h
+++ b/include/linux/perf_counter.h
@@ -142,8 +142,9 @@ struct perf_counter_hw_event {
exclude_idle : 1, /* don't count when idle */
mmap : 1, /* include mmap data */
munmap : 1, /* include munmap data */
+ comm : 1, /* include comm data */
- __reserved_1 : 53;
+ __reserved_1 : 52;
__u32 extra_config_len;
__u32 wakeup_events; /* wakeup every n events */
@@ -231,6 +232,16 @@ enum perf_event_type {
PERF_EVENT_MUNMAP = 2,
/*
+ * struct {
+ * struct perf_event_header header;
+ *
+ * u32 pid, tid;
+ * char comm[];
+ * };
+ */
+ PERF_EVENT_COMM = 3,
+
+ /*
* When header.misc & PERF_EVENT_MISC_OVERFLOW the event_type field
* will be PERF_RECORD_*
*
@@ -545,6 +556,8 @@ extern void perf_counter_mmap(unsigned long addr, unsigned long len,
extern void perf_counter_munmap(unsigned long addr, unsigned long len,
unsigned long pgoff, struct file *file);
+extern void perf_counter_comm(struct task_struct *tsk);
+
#define MAX_STACK_DEPTH 255
struct perf_callchain_entry {
@@ -583,6 +596,7 @@ static inline void
perf_counter_munmap(unsigned long addr, unsigned long len,
unsigned long pgoff, struct file *file) { }
+static inline void perf_counter_comm(struct task_struct *tsk) { }
#endif
#endif /* __KERNEL__ */
diff --git a/kernel/perf_counter.c b/kernel/perf_counter.c
index bf12df6..2d4aebb 100644
--- a/kernel/perf_counter.c
+++ b/kernel/perf_counter.c
@@ -1917,6 +1917,99 @@ static void perf_counter_output(struct perf_counter *counter,
}
/*
+ * comm tracking
+ */
+
+struct perf_comm_event {
+ struct task_struct *task;
+ char *comm;
+ int comm_size;
+
+ struct {
+ struct perf_event_header header;
+
+ u32 pid;
+ u32 tid;
+ } event;
+};
+
+static void perf_counter_comm_output(struct perf_counter *counter,
+ struct perf_comm_event *comm_event)
+{
+ struct perf_output_handle handle;
+ int size = comm_event->event.header.size;
+ int ret = perf_output_begin(&handle, counter, size, 0, 0);
+
+ if (ret)
+ return;
+
+ perf_output_put(&handle, comm_event->event);
+ perf_output_copy(&handle, comm_event->comm,
+ comm_event->comm_size);
+ perf_output_end(&handle);
+}
+
+static int perf_counter_comm_match(struct perf_counter *counter,
+ struct perf_comm_event *comm_event)
+{
+ if (counter->hw_event.comm &&
+ comm_event->event.header.type == PERF_EVENT_COMM)
+ return 1;
+
+ return 0;
+}
+
+static void perf_counter_comm_ctx(struct perf_counter_context *ctx,
+ struct perf_comm_event *comm_event)
+{
+ struct perf_counter *counter;
+
+ if (system_state != SYSTEM_RUNNING || list_empty(&ctx->event_list))
+ return;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(counter, &ctx->event_list, event_entry) {
+ if (perf_counter_comm_match(counter, comm_event))
+ perf_counter_comm_output(counter, comm_event);
+ }
+ rcu_read_unlock();
+}
+
+static void perf_counter_comm_event(struct perf_comm_event *comm_event)
+{
+ struct perf_cpu_context *cpuctx;
+ unsigned int size;
+ char *comm = comm_event->task->comm;
+
+ size = ALIGN(strlen(comm), sizeof(u64));
+
+ comm_event->comm = comm;
+ comm_event->comm_size = size;
+
+ comm_event->event.header.size = sizeof(comm_event->event) + size;
+
+ cpuctx = &get_cpu_var(perf_cpu_context);
+ perf_counter_comm_ctx(&cpuctx->ctx, comm_event);
+ put_cpu_var(perf_cpu_context);
+
+ perf_counter_comm_ctx(&current->perf_counter_ctx, comm_event);
+}
+
+void perf_counter_comm(struct task_struct *task)
+{
+ struct perf_comm_event comm_event = {
+ .task = task,
+ .event = {
+ .header = { .type = PERF_EVENT_COMM, },
+ .pid = task->group_leader->pid,
+ .tid = task->pid,
+ },
+ };
+
+ perf_counter_comm_event(&comm_event);
+}
+
+/*
* mmap tracking
*/
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [tip:perfcounters/core] perf_counter: some simple userspace profiling
2009-04-08 13:01 ` [PATCH 7/9] perf_counter: some simple userspace profiling Peter Zijlstra
2009-04-08 16:58 ` [tip:perfcounters/core] " Peter Zijlstra
@ 2009-04-08 17:09 ` Peter Zijlstra
1 sibling, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2009-04-08 17:09 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, paulus, hpa, mingo, a.p.zijlstra, acme, tglx,
cjashfor, mingo
Commit-ID: de9ac07bbf8f51e0ce40e5428c3a8f627bd237c2
Gitweb: http://git.kernel.org/tip/de9ac07bbf8f51e0ce40e5428c3a8f627bd237c2
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Wed, 8 Apr 2009 15:01:31 +0200
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Wed, 8 Apr 2009 19:05:54 +0200
perf_counter: some simple userspace profiling
# perf-record make -j4 kernel/
# perf-report | tail -15
0.39 cc1 [kernel] lock_acquired
0.42 cc1 [kernel] lock_acquire
0.51 cc1 [ user ] /lib64/libc-2.8.90.so: _int_free
0.51 as [kernel] clear_page_c
0.53 cc1 [ user ] /lib64/libc-2.8.90.so: memcpy
0.56 cc1 [ user ] /lib64/libc-2.8.90.so: _IO_vfprintf
0.63 cc1 [kernel] lock_release
0.67 cc1 [ user ] /lib64/libc-2.8.90.so: strlen
0.68 cc1 [kernel] debug_smp_processor_id
1.38 cc1 [ user ] /lib64/libc-2.8.90.so: _int_malloc
1.55 cc1 [ user ] /lib64/libc-2.8.90.so: memset
1.77 cc1 [kernel] __lock_acquire
1.88 cc1 [kernel] clear_page_c
3.61 as [ user ] /usr/bin/as: <unknown>
59.16 cc1 [ user ] /usr/libexec/gcc/x86_64-redhat-linux/4.3.2/cc1: <unknown>
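For readers wondering how a listing like the above is produced: each
sample is reduced to a "comm level symbol" string, occurrences are
counted, and the entries are printed in ascending order of count as a
percentage of all samples. perf-report.cc does this with std::map and
std::multimap; the following is only a plain C sketch of the same idea
using qsort(), with hypothetical names hist_entry, percent and
print_report.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* One histogram bucket: a formatted sample label and its hit count. */
struct hist_entry {
	const char *name;
	unsigned long count;
};

/* Order buckets by ascending hit count. */
static int cmp_count(const void *a, const void *b)
{
	const struct hist_entry *x = a, *y = b;

	return (x->count > y->count) - (x->count < y->count);
}

/* Share of 'count' in 'total', in percent; 0 when there are no samples. */
static double percent(unsigned long count, unsigned long total)
{
	return total ? (100.0 * count) / total : 0.0;
}

/* Sort the buckets in place and print them, hottest entry last. */
static void print_report(struct hist_entry *h, size_t n, unsigned long total)
{
	assert(h || n == 0);

	qsort(h, n, sizeof(*h), cmp_count);

	for (size_t i = 0; i < n; i++)
		printf(" %5.2f %s\n", percent(h[i].count, total), h[i].name);
}
```

Printing in ascending order is what makes "| tail -15" show the
hottest entries.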
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
LKML-Reference: <20090408130409.220518450@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
Documentation/perf_counter/Makefile | 8 +-
Documentation/perf_counter/perf-record.c | 530 +++++++++++++++++++++++++++++
Documentation/perf_counter/perf-report.cc | 472 +++++++++++++++++++++++++
3 files changed, 1009 insertions(+), 1 deletions(-)
diff --git a/Documentation/perf_counter/Makefile b/Documentation/perf_counter/Makefile
index 194b662..1dd37ee 100644
--- a/Documentation/perf_counter/Makefile
+++ b/Documentation/perf_counter/Makefile
@@ -1,10 +1,16 @@
-BINS = kerneltop perfstat
+BINS = kerneltop perfstat perf-record perf-report
all: $(BINS)
kerneltop: kerneltop.c ../../include/linux/perf_counter.h
cc -O6 -Wall -lrt -o $@ $<
+perf-record: perf-record.c ../../include/linux/perf_counter.h
+ cc -O6 -Wall -lrt -o $@ $<
+
+perf-report: perf-report.cc ../../include/linux/perf_counter.h
+ g++ -O6 -Wall -lrt -o $@ $<
+
perfstat: kerneltop
ln -sf kerneltop perfstat
diff --git a/Documentation/perf_counter/perf-record.c b/Documentation/perf_counter/perf-record.c
new file mode 100644
index 0000000..614de7c
--- /dev/null
+++ b/Documentation/perf_counter/perf-record.c
@@ -0,0 +1,530 @@
+
+
+#define _GNU_SOURCE
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+#include <limits.h>
+#include <getopt.h>
+#include <assert.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <errno.h>
+#include <ctype.h>
+#include <time.h>
+#include <sched.h>
+#include <pthread.h>
+
+#include <sys/syscall.h>
+#include <sys/ioctl.h>
+#include <sys/poll.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+#include <sys/uio.h>
+#include <sys/mman.h>
+
+#include <linux/unistd.h>
+#include <linux/types.h>
+
+#include "../../include/linux/perf_counter.h"
+
+
+/*
+ * prctl(PR_TASK_PERF_COUNTERS_DISABLE) will (cheaply) disable all
+ * counters in the current task.
+ */
+#define PR_TASK_PERF_COUNTERS_DISABLE 31
+#define PR_TASK_PERF_COUNTERS_ENABLE 32
+
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+
+#define rdclock() \
+({ \
+ struct timespec ts; \
+ \
+ clock_gettime(CLOCK_MONOTONIC, &ts); \
+ ts.tv_sec * 1000000000ULL + ts.tv_nsec; \
+})
+
+/*
+ * Pick up some kernel type conventions:
+ */
+#define __user
+#define asmlinkage
+
+#ifdef __x86_64__
+#define __NR_perf_counter_open 295
+#define rmb() asm volatile("lfence" ::: "memory")
+#define cpu_relax() asm volatile("rep; nop" ::: "memory");
+#endif
+
+#ifdef __i386__
+#define __NR_perf_counter_open 333
+#define rmb() asm volatile("lfence" ::: "memory")
+#define cpu_relax() asm volatile("rep; nop" ::: "memory");
+#endif
+
+#ifdef __powerpc__
+#define __NR_perf_counter_open 319
+#define rmb() asm volatile ("sync" ::: "memory")
+#define cpu_relax() asm volatile ("" ::: "memory");
+#endif
+
+#define unlikely(x) __builtin_expect(!!(x), 0)
+#define min(x, y) ({ \
+ typeof(x) _min1 = (x); \
+ typeof(y) _min2 = (y); \
+ (void) (&_min1 == &_min2); \
+ _min1 < _min2 ? _min1 : _min2; })
+
+asmlinkage int sys_perf_counter_open(
+ struct perf_counter_hw_event *hw_event_uptr __user,
+ pid_t pid,
+ int cpu,
+ int group_fd,
+ unsigned long flags)
+{
+ return syscall(
+ __NR_perf_counter_open, hw_event_uptr, pid, cpu, group_fd, flags);
+}
+
+#define MAX_COUNTERS 64
+#define MAX_NR_CPUS 256
+
+#define EID(type, id) (((__u64)(type) << PERF_COUNTER_TYPE_SHIFT) | (id))
+
+static int nr_counters = 0;
+static __u64 event_id[MAX_COUNTERS] = { };
+static int default_interval = 100000;
+static int event_count[MAX_COUNTERS];
+static int fd[MAX_NR_CPUS][MAX_COUNTERS];
+static int nr_cpus = 0;
+static unsigned int page_size;
+static unsigned int mmap_pages = 16;
+static int output;
+static char *output_name = "output.perf";
+static int group = 0;
+static unsigned int realtime_prio = 0;
+
+const unsigned int default_count[] = {
+ 1000000,
+ 1000000,
+ 10000,
+ 10000,
+ 1000000,
+ 10000,
+};
+
+static char *hw_event_names[] = {
+ "CPU cycles",
+ "instructions",
+ "cache references",
+ "cache misses",
+ "branches",
+ "branch misses",
+ "bus cycles",
+};
+
+static char *sw_event_names[] = {
+ "cpu clock ticks",
+ "task clock ticks",
+ "pagefaults",
+ "context switches",
+ "CPU migrations",
+ "minor faults",
+ "major faults",
+};
+
+struct event_symbol {
+ __u64 event;
+ char *symbol;
+};
+
+static struct event_symbol event_symbols[] = {
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_CPU_CYCLES), "cpu-cycles", },
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_CPU_CYCLES), "cycles", },
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_INSTRUCTIONS), "instructions", },
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_CACHE_REFERENCES), "cache-references", },
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_CACHE_MISSES), "cache-misses", },
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_BRANCH_INSTRUCTIONS), "branch-instructions", },
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_BRANCH_INSTRUCTIONS), "branches", },
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_BRANCH_MISSES), "branch-misses", },
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_BUS_CYCLES), "bus-cycles", },
+
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CPU_CLOCK), "cpu-clock", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_TASK_CLOCK), "task-clock", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_PAGE_FAULTS), "page-faults", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_PAGE_FAULTS), "faults", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_PAGE_FAULTS_MIN), "minor-faults", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_PAGE_FAULTS_MAJ), "major-faults", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CONTEXT_SWITCHES), "context-switches", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CONTEXT_SWITCHES), "cs", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CPU_MIGRATIONS), "cpu-migrations", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CPU_MIGRATIONS), "migrations", },
+};
+
+/*
+ * Each event can have multiple symbolic names.
+ * Symbolic names are (almost) exactly matched.
+ */
+static __u64 match_event_symbols(char *str)
+{
+ __u64 config, id;
+ int type;
+ unsigned int i;
+
+ if (sscanf(str, "r%llx", &config) == 1)
+ return config | PERF_COUNTER_RAW_MASK;
+
+ if (sscanf(str, "%d:%llu", &type, &id) == 2)
+ return EID(type, id);
+
+ for (i = 0; i < ARRAY_SIZE(event_symbols); i++) {
+ if (!strncmp(str, event_symbols[i].symbol,
+ strlen(event_symbols[i].symbol)))
+ return event_symbols[i].event;
+ }
+
+ return ~0ULL;
+}
+
+static int parse_events(char *str)
+{
+ __u64 config;
+
+again:
+ if (nr_counters == MAX_COUNTERS)
+ return -1;
+
+ config = match_event_symbols(str);
+ if (config == ~0ULL)
+ return -1;
+
+ event_id[nr_counters] = config;
+ nr_counters++;
+
+ str = strstr(str, ",");
+ if (str) {
+ str++;
+ goto again;
+ }
+
+ return 0;
+}
+
+#define __PERF_COUNTER_FIELD(config, name) \
+ ((config & PERF_COUNTER_##name##_MASK) >> PERF_COUNTER_##name##_SHIFT)
+
+#define PERF_COUNTER_RAW(config) __PERF_COUNTER_FIELD(config, RAW)
+#define PERF_COUNTER_CONFIG(config) __PERF_COUNTER_FIELD(config, CONFIG)
+#define PERF_COUNTER_TYPE(config) __PERF_COUNTER_FIELD(config, TYPE)
+#define PERF_COUNTER_ID(config) __PERF_COUNTER_FIELD(config, EVENT)
+
+static void display_events_help(void)
+{
+ unsigned int i;
+ __u64 e;
+
+ printf(
+ " -e EVENT --event=EVENT # symbolic-name abbreviations");
+
+ for (i = 0; i < ARRAY_SIZE(event_symbols); i++) {
+ int type, id;
+
+ e = event_symbols[i].event;
+ type = PERF_COUNTER_TYPE(e);
+ id = PERF_COUNTER_ID(e);
+
+ printf("\n %d:%d: %-20s",
+ type, id, event_symbols[i].symbol);
+ }
+
+ printf("\n"
+ " rNNN: raw PMU events (eventsel+umask)\n\n");
+}
+
+static void display_help(void)
+{
+ printf(
+ "Usage: perf-record [<options>]\n"
+ "perf-record Options (up to %d event types can be specified at once):\n\n",
+ MAX_COUNTERS);
+
+ display_events_help();
+
+ printf(
+ " -c CNT --count=CNT # event period to sample\n"
+ " -m pages --mmap_pages=<pages> # number of mmap data pages\n"
+ " -o file --output=<file> # output file\n"
+ " -r prio --realtime=<prio> # use RT prio\n"
+ );
+
+ exit(0);
+}
+
+static void process_options(int argc, char *argv[])
+{
+ int error = 0, counter;
+
+ for (;;) {
+ int option_index = 0;
+ /** Options for getopt */
+ static struct option long_options[] = {
+ {"count", required_argument, NULL, 'c'},
+ {"event", required_argument, NULL, 'e'},
+ {"mmap_pages", required_argument, NULL, 'm'},
+ {"output", required_argument, NULL, 'o'},
+ {"realtime", required_argument, NULL, 'r'},
+ {NULL, 0, NULL, 0 }
+ };
+ int c = getopt_long(argc, argv, "+:c:e:m:o:r:",
+ long_options, &option_index);
+ if (c == -1)
+ break;
+
+ switch (c) {
+ case 'c': default_interval = atoi(optarg); break;
+ case 'e': error = parse_events(optarg); break;
+ case 'm': mmap_pages = atoi(optarg); break;
+ case 'o': output_name = strdup(optarg); break;
+ case 'r': realtime_prio = atoi(optarg); break;
+ default: error = 1; break;
+ }
+ }
+ if (error)
+ display_help();
+
+ if (!nr_counters) {
+ nr_counters = 1;
+ event_id[0] = 0;
+ }
+
+ for (counter = 0; counter < nr_counters; counter++) {
+ if (event_count[counter])
+ continue;
+
+ event_count[counter] = default_interval;
+ }
+}
+
+struct mmap_data {
+ int counter;
+ void *base;
+ unsigned int mask;
+ unsigned int prev;
+};
+
+static unsigned int mmap_read_head(struct mmap_data *md)
+{
+ struct perf_counter_mmap_page *pc = md->base;
+ int head;
+
+ head = pc->data_head;
+ rmb();
+
+ return head;
+}
+
+static long events;
+static struct timeval last_read, this_read;
+
+static void mmap_read(struct mmap_data *md)
+{
+ unsigned int head = mmap_read_head(md);
+ unsigned int old = md->prev;
+ unsigned char *data = md->base + page_size;
+ unsigned long size;
+ void *buf;
+ int diff;
+
+ gettimeofday(&this_read, NULL);
+
+ /*
+ * If we're further behind than half the buffer, there's a chance
+ * the writer will bite our tail and screw up the events under us.
+ *
+ * If we somehow ended up ahead of the head, we got messed up.
+ *
+ * In either case, truncate and restart at head.
+ */
+ diff = head - old;
+ if (diff > md->mask / 2 || diff < 0) {
+ struct timeval iv;
+ unsigned long msecs;
+
+ timersub(&this_read, &last_read, &iv);
+ msecs = iv.tv_sec*1000 + iv.tv_usec/1000;
+
+ fprintf(stderr, "WARNING: failed to keep up with mmap data."
+ " Last read %lu msecs ago.\n", msecs);
+
+ /*
+ * head points to a known good entry, start there.
+ */
+ old = head;
+ }
+
+ last_read = this_read;
+
+ if (old != head)
+ events++;
+
+ size = head - old;
+
+ if ((old & md->mask) + size != (head & md->mask)) {
+ buf = &data[old & md->mask];
+ size = md->mask + 1 - (old & md->mask);
+ old += size;
+ while (size) {
+ int ret = write(output, buf, size);
+ if (ret < 0) {
+ perror("failed to write");
+ exit(-1);
+ }
+ size -= ret;
+ buf += ret;
+ }
+ }
+
+ buf = &data[old & md->mask];
+ size = head - old;
+ old += size;
+ while (size) {
+ int ret = write(output, buf, size);
+ if (ret < 0) {
+ perror("failed to write");
+ exit(-1);
+ }
+ size -= ret;
+ buf += ret;
+ }
+
+ md->prev = old;
+}
+
+static volatile int done = 0;
+
+static void sigchld_handler(int sig)
+{
+ if (sig == SIGCHLD)
+ done = 1;
+}
+
+int main(int argc, char *argv[])
+{
+ struct pollfd event_array[MAX_NR_CPUS * MAX_COUNTERS];
+ struct mmap_data mmap_array[MAX_NR_CPUS][MAX_COUNTERS];
+ struct perf_counter_hw_event hw_event;
+ int i, counter, group_fd, nr_poll = 0;
+ pid_t pid;
+ int ret;
+
+ page_size = sysconf(_SC_PAGE_SIZE);
+
+ process_options(argc, argv);
+
+ nr_cpus = sysconf(_SC_NPROCESSORS_ONLN);
+ assert(nr_cpus <= MAX_NR_CPUS);
+ assert(nr_cpus >= 0);
+
+ output = open(output_name, O_CREAT|O_RDWR, S_IRWXU);
+ if (output < 0) {
+ perror("failed to create output file");
+ exit(-1);
+ }
+
+ argc -= optind;
+ argv += optind;
+
+ for (i = 0; i < nr_cpus; i++) {
+ group_fd = -1;
+ for (counter = 0; counter < nr_counters; counter++) {
+
+ memset(&hw_event, 0, sizeof(hw_event));
+ hw_event.config = event_id[counter];
+ hw_event.irq_period = event_count[counter];
+ hw_event.record_type = PERF_RECORD_IP | PERF_RECORD_TID;
+ hw_event.nmi = 1;
+ hw_event.mmap = 1;
+ hw_event.comm = 1;
+
+ fd[i][counter] = sys_perf_counter_open(&hw_event, -1, i, group_fd, 0);
+ if (fd[i][counter] < 0) {
+ int err = errno;
+ printf("kerneltop error: syscall returned with %d (%s)\n",
+ fd[i][counter], strerror(err));
+ if (err == EPERM)
+ printf("Are you root?\n");
+ exit(-1);
+ }
+ assert(fd[i][counter] >= 0);
+ fcntl(fd[i][counter], F_SETFL, O_NONBLOCK);
+
+ /*
+ * First counter acts as the group leader:
+ */
+ if (group && group_fd == -1)
+ group_fd = fd[i][counter];
+
+ event_array[nr_poll].fd = fd[i][counter];
+ event_array[nr_poll].events = POLLIN;
+ nr_poll++;
+
+ mmap_array[i][counter].counter = counter;
+ mmap_array[i][counter].prev = 0;
+ mmap_array[i][counter].mask = mmap_pages*page_size - 1;
+ mmap_array[i][counter].base = mmap(NULL, (mmap_pages+1)*page_size,
+ PROT_READ, MAP_SHARED, fd[i][counter], 0);
+ if (mmap_array[i][counter].base == MAP_FAILED) {
+ printf("kerneltop error: failed to mmap with %d (%s)\n",
+ errno, strerror(errno));
+ exit(-1);
+ }
+ }
+ }
+
+ signal(SIGCHLD, sigchld_handler);
+
+ pid = fork();
+ if (pid < 0)
+ perror("failed to fork");
+
+ if (!pid) {
+ if (execvp(argv[0], argv)) {
+ perror(argv[0]);
+ exit(-1);
+ }
+ }
+
+ if (realtime_prio) {
+ struct sched_param param;
+
+ param.sched_priority = realtime_prio;
+ if (sched_setscheduler(0, SCHED_FIFO, &param)) {
+ printf("Could not set realtime priority.\n");
+ exit(-1);
+ }
+ }
+
+ /*
+ * TODO: store the current /proc/$/maps information somewhere
+ */
+
+ while (!done) {
+ int hits = events;
+
+ for (i = 0; i < nr_cpus; i++) {
+ for (counter = 0; counter < nr_counters; counter++)
+ mmap_read(&mmap_array[i][counter]);
+ }
+
+ if (hits == events)
+ ret = poll(event_array, nr_poll, 100);
+ }
+
+ return 0;
+}
diff --git a/Documentation/perf_counter/perf-report.cc b/Documentation/perf_counter/perf-report.cc
new file mode 100644
index 0000000..09da0ba
--- /dev/null
+++ b/Documentation/perf_counter/perf-report.cc
@@ -0,0 +1,472 @@
+#define _GNU_SOURCE
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+#include <limits.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <errno.h>
+#include <ctype.h>
+#include <time.h>
+#include <getopt.h>
+
+#include <sys/ioctl.h>
+#include <sys/poll.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+#include <sys/mman.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include <linux/unistd.h>
+#include <linux/types.h>
+
+#include "../../include/linux/perf_counter.h"
+
+#include <set>
+#include <map>
+#include <string>
+
+
+static char const *input_name = "output.perf";
+static int input;
+
+static unsigned long page_size;
+static unsigned long mmap_window = 32;
+
+struct ip_event {
+ struct perf_event_header header;
+ __u64 ip;
+ __u32 pid, tid;
+};
+struct mmap_event {
+ struct perf_event_header header;
+ __u32 pid, tid;
+ __u64 start;
+ __u64 len;
+ __u64 pgoff;
+ char filename[PATH_MAX];
+};
+struct comm_event {
+ struct perf_event_header header;
+ __u32 pid,tid;
+ char comm[16];
+};
+
+typedef union event_union {
+ struct perf_event_header header;
+ struct ip_event ip;
+ struct mmap_event mmap;
+ struct comm_event comm;
+} event_t;
+
+struct section {
+ uint64_t start;
+ uint64_t end;
+
+ uint64_t offset;
+
+ std::string name;
+
+ section() { };
+
+ section(uint64_t stab) : end(stab) { };
+
+ section(uint64_t start, uint64_t size, uint64_t offset, std::string name) :
+ start(start), end(start + size), offset(offset), name(name)
+ { };
+
+ bool operator < (const struct section &s) const {
+ return end < s.end;
+ };
+};
+
+typedef std::set<struct section> sections_t;
+
+struct symbol {
+ uint64_t start;
+ uint64_t end;
+
+ std::string name;
+
+ symbol() { };
+
+ symbol(uint64_t ip) : start(ip) { }
+
+ symbol(uint64_t start, uint64_t len, std::string name) :
+ start(start), end(start + len), name(name)
+ { };
+
+ bool operator < (const struct symbol &s) const {
+ return start < s.start;
+ };
+};
+
+typedef std::set<struct symbol> symbols_t;
+
+struct dso {
+ sections_t sections;
+ symbols_t syms;
+};
+
+static std::map<std::string, struct dso> dsos;
+
+static void load_dso_sections(std::string dso_name)
+{
+ struct dso &dso = dsos[dso_name];
+
+ std::string cmd = "readelf -DSW " + dso_name;
+
+ FILE *file = popen(cmd.c_str(), "r");
+ if (!file) {
+ perror("failed to open pipe");
+ exit(-1);
+ }
+
+ char *line = NULL;
+ size_t n = 0;
+
+ while (!feof(file)) {
+ uint64_t addr, off, size;
+ char name[32];
+
+ if (getline(&line, &n, file) < 0)
+ break;
+ if (!line)
+ break;
+
+ if (sscanf(line, " [%*2d] %16s %*14s %Lx %Lx %Lx",
+ name, &addr, &off, &size) == 4) {
+
+ dso.sections.insert(section(addr, size, addr - off, name));
+ }
+#if 0
+ /*
+ * for reading readelf symbols (-s), however these don't seem
+ * to include nearly everything, so use nm for that.
+ */
+ if (sscanf(line, " %*4d %*3d: %Lx %5Lu %*7s %*6s %*7s %3d %s",
+ &start, &size, &section, sym) == 4) {
+
+ start -= dso.section_offsets[section];
+
+ dso.syms.insert(symbol(start, size, std::string(sym)));
+ }
+#endif
+ }
+ pclose(file);
+}
+
+static void load_dso_symbols(std::string dso_name, std::string args)
+{
+ struct dso &dso = dsos[dso_name];
+
+ std::string cmd = "nm -nSC " + args + " " + dso_name;
+
+ FILE *file = popen(cmd.c_str(), "r");
+ if (!file) {
+ perror("failed to open pipe");
+ exit(-1);
+ }
+
+ char *line = NULL;
+ size_t n = 0;
+
+ while (!feof(file)) {
+ uint64_t start, size;
+ char c;
+ char sym[1024];
+
+ if (getline(&line, &n, file) < 0)
+ break;
+ if (!line)
+ break;
+
+
+ if (sscanf(line, "%Lx %Lx %c %s", &start, &size, &c, sym) == 4) {
+ sections_t::const_iterator si =
+ dso.sections.upper_bound(section(start));
+ if (si == dso.sections.end()) {
+ printf("symbol in unknown section: %s\n", sym);
+ continue;
+ }
+
+ start -= si->offset;
+
+ dso.syms.insert(symbol(start, size, sym));
+ }
+ }
+ pclose(file);
+}
+
+static void load_dso(std::string dso_name)
+{
+ load_dso_sections(dso_name);
+ load_dso_symbols(dso_name, "-D"); /* dynamic symbols */
+ load_dso_symbols(dso_name, ""); /* regular ones */
+}
+
+void load_kallsyms(void)
+{
+ struct dso &dso = dsos["[kernel]"];
+
+ FILE *file = fopen("/proc/kallsyms", "r");
+ if (!file) {
+ perror("failed to open kallsyms");
+ exit(-1);
+ }
+
+ char *line = NULL; /* getline() requires NULL (or a malloc'ed buffer) on first call */
+ size_t n = 0;
+
+ while (!feof(file)) {
+ uint64_t start;
+ char c;
+ char sym[1024];
+
+ if (getline(&line, &n, file) < 0)
+ break;
+ if (!line)
+ break;
+
+ if (sscanf(line, "%Lx %c %s", &start, &c, sym) == 3)
+ dso.syms.insert(symbol(start, 0x1000000, std::string(sym)));
+ }
+ fclose(file);
+}
+
+struct map {
+ uint64_t start;
+ uint64_t end;
+ uint64_t pgoff;
+
+ std::string dso;
+
+ map() { };
+
+ map(uint64_t ip) : end(ip) { }
+
+ map(mmap_event *mmap) {
+ start = mmap->start;
+ end = mmap->start + mmap->len;
+ pgoff = mmap->pgoff;
+
+ dso = std::string(mmap->filename);
+
+ if (dsos.find(dso) == dsos.end())
+ load_dso(dso);
+ };
+
+ bool operator < (const struct map &m) const {
+ return end < m.end;
+ };
+};
+
+typedef std::set<struct map> maps_t;
+
+static std::map<int, maps_t> maps;
+
+static std::map<int, std::string> comms;
+
+static std::map<std::string, int> hist;
+static std::multimap<int, std::string> rev_hist;
+
+static std::string resolve_comm(int pid)
+{
+ std::string comm = "<unknown>";
+ std::map<int, std::string>::const_iterator ci = comms.find(pid);
+ if (ci != comms.end())
+ comm = ci->second;
+
+ return comm;
+}
+
+static std::string resolve_user_symbol(int pid, uint64_t ip)
+{
+ std::string sym = "<unknown>";
+
+ maps_t &m = maps[pid];
+ maps_t::const_iterator mi = m.upper_bound(map(ip));
+ if (mi == m.end())
+ return sym;
+
+ ip -= mi->start + mi->pgoff;
+
+ symbols_t &s = dsos[mi->dso].syms;
+ symbols_t::const_iterator si = s.upper_bound(symbol(ip));
+
+ sym = mi->dso + ": <unknown>";
+
+ if (si == s.begin())
+ return sym;
+ si--;
+
+ if (si->start <= ip && ip < si->end)
+ sym = mi->dso + ": " + si->name;
+#if 0
+ else if (si->start <= ip)
+ sym = mi->dso + ": ?" + si->name;
+#endif
+
+ return sym;
+}
+
+static std::string resolve_kernel_symbol(uint64_t ip)
+{
+ std::string sym = "<unknown>";
+
+ symbols_t &s = dsos["[kernel]"].syms;
+ symbols_t::const_iterator si = s.upper_bound(symbol(ip));
+
+ if (si == s.begin())
+ return sym;
+ si--;
+
+ if (si->start <= ip && ip < si->end)
+ sym = si->name;
+
+ return sym;
+}
+
+static void display_help(void)
+{
+ printf(
+ "Usage: perf-report [<options>]\n"
+ " -i file --input=<file> # input file\n"
+ );
+
+ exit(0);
+}
+
+static void process_options(int argc, char *argv[])
+{
+ int error = 0;
+
+ for (;;) {
+ int option_index = 0;
+ /** Options for getopt */
+ static struct option long_options[] = {
+ {"input", required_argument, NULL, 'i'},
+ {NULL, 0, NULL, 0 }
+ };
+ int c = getopt_long(argc, argv, "+:i:",
+ long_options, &option_index);
+ if (c == -1)
+ break;
+
+ switch (c) {
+ case 'i': input_name = strdup(optarg); break;
+ default: error = 1; break;
+ }
+ }
+
+ if (error)
+ display_help();
+}
+
+int main(int argc, char *argv[])
+{
+ unsigned long offset = 0;
+ unsigned long head = 0;
+ struct stat stat;
+ char *buf;
+ event_t *event;
+ int ret;
+ unsigned long total = 0;
+
+ page_size = getpagesize();
+
+ process_options(argc, argv);
+
+ input = open(input_name, O_RDONLY);
+ if (input < 0) {
+ perror("failed to open file");
+ exit(-1);
+ }
+
+ ret = fstat(input, &stat);
+ if (ret < 0) {
+ perror("failed to stat file");
+ exit(-1);
+ }
+
+ load_kallsyms();
+
+remap:
+ buf = (char *)mmap(NULL, page_size * mmap_window, PROT_READ,
+ MAP_SHARED, input, offset);
+ if (buf == MAP_FAILED) {
+ perror("failed to mmap file");
+ exit(-1);
+ }
+
+more:
+ event = (event_t *)(buf + head);
+
+ if (head + event->header.size >= page_size * mmap_window) {
+ unsigned long shift = page_size * (head / page_size);
+
+ munmap(buf, page_size * mmap_window);
+ offset += shift;
+ head -= shift;
+ goto remap;
+ }
+ head += event->header.size;
+
+ if (event->header.misc & PERF_EVENT_MISC_OVERFLOW) {
+ std::string comm, sym, level;
+ char output[1024];
+
+ if (event->header.misc & PERF_EVENT_MISC_KERNEL) {
+ level = "[kernel]";
+ sym = resolve_kernel_symbol(event->ip.ip);
+ } else if (event->header.misc & PERF_EVENT_MISC_USER) {
+ level = "[ user ]";
+ sym = resolve_user_symbol(event->ip.pid, event->ip.ip);
+ } else {
+ level = "[ hv ]";
+ }
+ comm = resolve_comm(event->ip.pid);
+
+ snprintf(output, sizeof(output), "%16s %s %s",
+ comm.c_str(), level.c_str(), sym.c_str());
+ hist[output]++;
+
+ total++;
+
+ } else switch (event->header.type) {
+ case PERF_EVENT_MMAP:
+ maps[event->mmap.pid].insert(map(&event->mmap));
+ break;
+
+ case PERF_EVENT_COMM:
+ comms[event->comm.pid] = std::string(event->comm.comm);
+ break;
+ }
+
+ if (offset + head < stat.st_size)
+ goto more;
+
+ close(input);
+
+ std::map<std::string, int>::iterator hi = hist.begin();
+
+ while (hi != hist.end()) {
+ rev_hist.insert(std::pair<int, std::string>(hi->second, hi->first));
+ hist.erase(hi++);
+ }
+
+ std::multimap<int, std::string>::const_iterator ri = rev_hist.begin();
+
+ while (ri != rev_hist.end()) {
+ printf(" %5.2f %s\n", (100.0 * ri->first)/total, ri->second.c_str());
+ ri++;
+ }
+
+ return 0;
+}
+
* [tip:perfcounters/core] perf_counter: move PERF_RECORD_TIME
2009-04-08 13:01 ` [PATCH 8/9] perf_counter: move PERF_RECORD_TIME Peter Zijlstra
2009-04-08 16:58 ` [tip:perfcounters/core] " Peter Zijlstra
@ 2009-04-08 17:09 ` Peter Zijlstra
1 sibling, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2009-04-08 17:09 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, paulus, hpa, mingo, a.p.zijlstra, tglx, cjashfor, mingo
Commit-ID: 4d855457d84b819fefcd1cd1b0a2a0a0ec475c07
Gitweb: http://git.kernel.org/tip/4d855457d84b819fefcd1cd1b0a2a0a0ec475c07
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Wed, 8 Apr 2009 15:01:32 +0200
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Wed, 8 Apr 2009 19:05:55 +0200
perf_counter: move PERF_RECORD_TIME
Move PERF_RECORD_TIME so that all the fixed length items come before
the variable length ones.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <20090408130409.307926436@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
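Not part of the commit: a minimal sketch of what the reordering buys a
userspace reader. With every fixed-length item placed ahead of the
variable-length ones, the offset of a fixed field depends only on the
record_type bits, never on the contents of the record. The SKETCH_* names and
sketch_fixed_offset() are hypothetical; the bit values mirror the enum as
reordered by this patch, and offsets are counted from the end of the event
header.

```c
#include <stdint.h>

/* Hypothetical mirror of perf_counter_record_format after this patch. */
enum sketch_record_format {
	SKETCH_RECORD_IP	= 1U << 0,	/* u64 ip        (fixed)    */
	SKETCH_RECORD_TID	= 1U << 1,	/* u32 pid, tid  (fixed)    */
	SKETCH_RECORD_TIME	= 1U << 2,	/* u64 time      (fixed)    */
	SKETCH_RECORD_GROUP	= 1U << 3,	/* nr + cnt[nr]  (variable) */
	SKETCH_RECORD_CALLCHAIN	= 1U << 4,	/* nr + ips[nr]  (variable) */
};

/*
 * Byte offset of a fixed-length field, counted from the end of the
 * event header. Returns -1 if the field is not present in record_type
 * or is not one of the fixed-length items.
 */
static int sketch_fixed_offset(uint32_t record_type, uint32_t field)
{
	/* Fixed-length items in the order they are written out. */
	static const uint32_t fixed[] = {
		SKETCH_RECORD_IP,
		SKETCH_RECORD_TID,
		SKETCH_RECORD_TIME,
	};
	unsigned int i;
	int off = 0;

	if (!(record_type & field))
		return -1;

	for (i = 0; i < sizeof(fixed) / sizeof(fixed[0]); i++) {
		if (fixed[i] == field)
			return off;
		/* Each fixed item occupies 8 bytes (one u64 or two u32). */
		if (record_type & fixed[i])
			off += 8;
	}

	return -1;
}
```

Before this change a reader looking for the timestamp had to walk the group
and callchain data first; in the sketch it is a table lookup.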
include/linux/perf_counter.h | 9 ++++-----
kernel/perf_counter.c | 26 +++++++++++++-------------
2 files changed, 17 insertions(+), 18 deletions(-)
diff --git a/include/linux/perf_counter.h b/include/linux/perf_counter.h
index a70a55f..8bd1be5 100644
--- a/include/linux/perf_counter.h
+++ b/include/linux/perf_counter.h
@@ -100,9 +100,9 @@ enum sw_event_ids {
enum perf_counter_record_format {
PERF_RECORD_IP = 1U << 0,
PERF_RECORD_TID = 1U << 1,
- PERF_RECORD_GROUP = 1U << 2,
- PERF_RECORD_CALLCHAIN = 1U << 3,
- PERF_RECORD_TIME = 1U << 4,
+ PERF_RECORD_TIME = 1U << 2,
+ PERF_RECORD_GROUP = 1U << 3,
+ PERF_RECORD_CALLCHAIN = 1U << 4,
};
/*
@@ -250,6 +250,7 @@ enum perf_event_type {
*
* { u64 ip; } && PERF_RECORD_IP
* { u32 pid, tid; } && PERF_RECORD_TID
+ * { u64 time; } && PERF_RECORD_TIME
*
* { u64 nr;
* { u64 event, val; } cnt[nr]; } && PERF_RECORD_GROUP
@@ -259,8 +260,6 @@ enum perf_event_type {
* kernel,
* user;
* u64 ips[nr]; } && PERF_RECORD_CALLCHAIN
- *
- * { u64 time; } && PERF_RECORD_TIME
* };
*/
};
diff --git a/kernel/perf_counter.c b/kernel/perf_counter.c
index 2d4aebb..4dc8600 100644
--- a/kernel/perf_counter.c
+++ b/kernel/perf_counter.c
@@ -1850,6 +1850,16 @@ static void perf_counter_output(struct perf_counter *counter,
header.size += sizeof(tid_entry);
}
+ if (record_type & PERF_RECORD_TIME) {
+ /*
+ * Maybe do better on x86 and provide cpu_clock_nmi()
+ */
+ time = sched_clock();
+
+ header.type |= PERF_RECORD_TIME;
+ header.size += sizeof(u64);
+ }
+
if (record_type & PERF_RECORD_GROUP) {
header.type |= PERF_RECORD_GROUP;
header.size += sizeof(u64) +
@@ -1867,16 +1877,6 @@ static void perf_counter_output(struct perf_counter *counter,
}
}
- if (record_type & PERF_RECORD_TIME) {
- /*
- * Maybe do better on x86 and provide cpu_clock_nmi()
- */
- time = sched_clock();
-
- header.type |= PERF_RECORD_TIME;
- header.size += sizeof(u64);
- }
-
ret = perf_output_begin(&handle, counter, header.size, nmi, 1);
if (ret)
return;
@@ -1889,6 +1889,9 @@ static void perf_counter_output(struct perf_counter *counter,
if (record_type & PERF_RECORD_TID)
perf_output_put(&handle, tid_entry);
+ if (record_type & PERF_RECORD_TIME)
+ perf_output_put(&handle, time);
+
if (record_type & PERF_RECORD_GROUP) {
struct perf_counter *leader, *sub;
u64 nr = counter->nr_siblings;
@@ -1910,9 +1913,6 @@ static void perf_counter_output(struct perf_counter *counter,
if (callchain)
perf_output_copy(&handle, callchain, callchain_size);
- if (record_type & PERF_RECORD_TIME)
- perf_output_put(&handle, time);
-
perf_output_end(&handle);
}
* [tip:perfcounters/core] perf_counter: allow for data addresses to be recorded
2009-04-08 13:01 ` [PATCH 9/9] perf_counter: allow for data addresses to be recorded Peter Zijlstra
2009-04-08 16:59 ` [tip:perfcounters/core] " Peter Zijlstra
@ 2009-04-08 17:10 ` Peter Zijlstra
1 sibling, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2009-04-08 17:10 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, paulus, hpa, mingo, a.p.zijlstra, tglx, cjashfor, mingo
Commit-ID: 78f13e9525ba777da25c4ddab89f28e9366a8b7c
Gitweb: http://git.kernel.org/tip/78f13e9525ba777da25c4ddab89f28e9366a8b7c
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Wed, 8 Apr 2009 15:01:33 +0200
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Wed, 8 Apr 2009 19:05:56 +0200
perf_counter: allow for data addresses to be recorded
Paul suggested we allow for data addresses to be recorded along with
the traditional IPs as power can provide these.
For now, only the software pagefault events provide data addresses,
but in the future power might as well for some events.
x86 doesn't seem capable of providing this atm.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <20090408130409.394816925@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
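Not part of the commit: a rough sketch of how a userspace consumer might pick
the new data address out of an overflow record. It assumes the payload layout
documented in the perf_counter.h comment touched by this patch (ip, then
pid/tid, then time, then addr, each present only if its record_type bit is
set) and native byte order. The DEMO_* names, struct demo_sample and
demo_decode_fixed() are hypothetical and exist only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the fixed-length record bits after this patch. */
enum demo_record_format {
	DEMO_RECORD_IP		= 1U << 0,
	DEMO_RECORD_TID		= 1U << 1,
	DEMO_RECORD_TIME	= 1U << 2,
	DEMO_RECORD_ADDR	= 1U << 3,
};

struct demo_sample {
	uint64_t ip;
	uint32_t pid, tid;
	uint64_t time;
	uint64_t addr;	/* faulting data address, 0 if not recorded */
};

/* Copy 'size' bytes out of the payload, refusing to read past its end. */
static int demo_get(const unsigned char *payload, size_t len, size_t *off,
		    void *dst, size_t size)
{
	if (len - *off < size)
		return -1;
	memcpy(dst, payload + *off, size);
	*off += size;
	return 0;
}

/*
 * Decode the fixed-length part of a record. 'payload' points just past
 * the event header. Returns the number of bytes consumed, or -1 if the
 * payload is too short for the fields record_type promises.
 */
static long demo_decode_fixed(uint32_t record_type,
			      const unsigned char *payload, size_t len,
			      struct demo_sample *s)
{
	size_t off = 0;

	memset(s, 0, sizeof(*s));

	if ((record_type & DEMO_RECORD_IP) &&
	    demo_get(payload, len, &off, &s->ip, sizeof(s->ip)))
		return -1;

	if (record_type & DEMO_RECORD_TID) {
		if (demo_get(payload, len, &off, &s->pid, sizeof(s->pid)) ||
		    demo_get(payload, len, &off, &s->tid, sizeof(s->tid)))
			return -1;
	}

	if ((record_type & DEMO_RECORD_TIME) &&
	    demo_get(payload, len, &off, &s->time, sizeof(s->time)))
		return -1;

	if ((record_type & DEMO_RECORD_ADDR) &&
	    demo_get(payload, len, &off, &s->addr, sizeof(s->addr)))
		return -1;

	return (long)off;
}
```

For events other than the software pagefault counters the kernel passes 0 as
the address in this series, so a consumer should treat addr == 0 as "not
available" rather than as a real fault address.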
arch/powerpc/kernel/perf_counter.c | 2 +-
arch/powerpc/mm/fault.c | 8 ++++--
arch/x86/kernel/cpu/perf_counter.c | 2 +-
arch/x86/mm/fault.c | 8 ++++--
include/linux/perf_counter.h | 14 ++++++----
kernel/perf_counter.c | 46 ++++++++++++++++++++++-------------
6 files changed, 49 insertions(+), 31 deletions(-)
diff --git a/arch/powerpc/kernel/perf_counter.c b/arch/powerpc/kernel/perf_counter.c
index 0697ade..c9d019f 100644
--- a/arch/powerpc/kernel/perf_counter.c
+++ b/arch/powerpc/kernel/perf_counter.c
@@ -749,7 +749,7 @@ static void record_and_restart(struct perf_counter *counter, long val,
* Finally record data if requested.
*/
if (record)
- perf_counter_overflow(counter, 1, regs);
+ perf_counter_overflow(counter, 1, regs, 0);
}
/*
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 17bbf6f..ac0e112 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -171,7 +171,7 @@ int __kprobes do_page_fault(struct pt_regs *regs, unsigned long address,
die("Weird page fault", regs, SIGSEGV);
}
- perf_swcounter_event(PERF_COUNT_PAGE_FAULTS, 1, 0, regs);
+ perf_swcounter_event(PERF_COUNT_PAGE_FAULTS, 1, 0, regs, address);
/* When running in the kernel we expect faults to occur only to
* addresses in user space. All other faults represent errors in the
@@ -312,7 +312,8 @@ good_area:
}
if (ret & VM_FAULT_MAJOR) {
current->maj_flt++;
- perf_swcounter_event(PERF_COUNT_PAGE_FAULTS_MAJ, 1, 0, regs);
+ perf_swcounter_event(PERF_COUNT_PAGE_FAULTS_MAJ, 1, 0,
+ regs, address);
#ifdef CONFIG_PPC_SMLPAR
if (firmware_has_feature(FW_FEATURE_CMO)) {
preempt_disable();
@@ -322,7 +323,8 @@ good_area:
#endif
} else {
current->min_flt++;
- perf_swcounter_event(PERF_COUNT_PAGE_FAULTS_MIN, 1, 0, regs);
+ perf_swcounter_event(PERF_COUNT_PAGE_FAULTS_MIN, 1, 0,
+ regs, address);
}
up_read(&mm->mmap_sem);
return 0;
diff --git a/arch/x86/kernel/cpu/perf_counter.c b/arch/x86/kernel/cpu/perf_counter.c
index 1116a41..0fcbaab 100644
--- a/arch/x86/kernel/cpu/perf_counter.c
+++ b/arch/x86/kernel/cpu/perf_counter.c
@@ -800,7 +800,7 @@ again:
continue;
perf_save_and_restart(counter);
- if (perf_counter_overflow(counter, nmi, regs))
+ if (perf_counter_overflow(counter, nmi, regs, 0))
__pmc_generic_disable(counter, &counter->hw, bit);
}
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index f2d3324..6f9df2b 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1045,7 +1045,7 @@ do_page_fault(struct pt_regs *regs, unsigned long error_code)
if (unlikely(error_code & PF_RSVD))
pgtable_bad(regs, error_code, address);
- perf_swcounter_event(PERF_COUNT_PAGE_FAULTS, 1, 0, regs);
+ perf_swcounter_event(PERF_COUNT_PAGE_FAULTS, 1, 0, regs, address);
/*
* If we're in an interrupt, have no user context or are running
@@ -1142,10 +1142,12 @@ good_area:
if (fault & VM_FAULT_MAJOR) {
tsk->maj_flt++;
- perf_swcounter_event(PERF_COUNT_PAGE_FAULTS_MAJ, 1, 0, regs);
+ perf_swcounter_event(PERF_COUNT_PAGE_FAULTS_MAJ, 1, 0,
+ regs, address);
} else {
tsk->min_flt++;
- perf_swcounter_event(PERF_COUNT_PAGE_FAULTS_MIN, 1, 0, regs);
+ perf_swcounter_event(PERF_COUNT_PAGE_FAULTS_MIN, 1, 0,
+ regs, address);
}
check_v8086_mode(regs, address, tsk);
diff --git a/include/linux/perf_counter.h b/include/linux/perf_counter.h
index 8bd1be5..c22363a 100644
--- a/include/linux/perf_counter.h
+++ b/include/linux/perf_counter.h
@@ -101,8 +101,9 @@ enum perf_counter_record_format {
PERF_RECORD_IP = 1U << 0,
PERF_RECORD_TID = 1U << 1,
PERF_RECORD_TIME = 1U << 2,
- PERF_RECORD_GROUP = 1U << 3,
- PERF_RECORD_CALLCHAIN = 1U << 4,
+ PERF_RECORD_ADDR = 1U << 3,
+ PERF_RECORD_GROUP = 1U << 4,
+ PERF_RECORD_CALLCHAIN = 1U << 5,
};
/*
@@ -251,6 +252,7 @@ enum perf_event_type {
* { u64 ip; } && PERF_RECORD_IP
* { u32 pid, tid; } && PERF_RECORD_TID
* { u64 time; } && PERF_RECORD_TIME
+ * { u64 addr; } && PERF_RECORD_ADDR
*
* { u64 nr;
* { u64 event, val; } cnt[nr]; } && PERF_RECORD_GROUP
@@ -537,7 +539,7 @@ extern int hw_perf_group_sched_in(struct perf_counter *group_leader,
extern void perf_counter_update_userpage(struct perf_counter *counter);
extern int perf_counter_overflow(struct perf_counter *counter,
- int nmi, struct pt_regs *regs);
+ int nmi, struct pt_regs *regs, u64 addr);
/*
* Return 1 for a software counter, 0 for a hardware counter
*/
@@ -547,7 +549,7 @@ static inline int is_software_counter(struct perf_counter *counter)
perf_event_type(&counter->hw_event) != PERF_TYPE_HARDWARE;
}
-extern void perf_swcounter_event(u32, u64, int, struct pt_regs *);
+extern void perf_swcounter_event(u32, u64, int, struct pt_regs *, u64);
extern void perf_counter_mmap(unsigned long addr, unsigned long len,
unsigned long pgoff, struct file *file);
@@ -584,8 +586,8 @@ static inline int perf_counter_task_disable(void) { return -EINVAL; }
static inline int perf_counter_task_enable(void) { return -EINVAL; }
static inline void
-perf_swcounter_event(u32 event, u64 nr, int nmi, struct pt_regs *regs) { }
-
+perf_swcounter_event(u32 event, u64 nr, int nmi,
+ struct pt_regs *regs, u64 addr) { }
static inline void
perf_counter_mmap(unsigned long addr, unsigned long len,
diff --git a/kernel/perf_counter.c b/kernel/perf_counter.c
index 4dc8600..321c57e 100644
--- a/kernel/perf_counter.c
+++ b/kernel/perf_counter.c
@@ -800,7 +800,7 @@ void perf_counter_task_sched_out(struct task_struct *task, int cpu)
update_context_time(ctx);
regs = task_pt_regs(task);
- perf_swcounter_event(PERF_COUNT_CONTEXT_SWITCHES, 1, 1, regs);
+ perf_swcounter_event(PERF_COUNT_CONTEXT_SWITCHES, 1, 1, regs, 0);
__perf_counter_sched_out(ctx, cpuctx);
cpuctx->task_ctx = NULL;
@@ -1810,7 +1810,7 @@ static void perf_output_end(struct perf_output_handle *handle)
}
static void perf_counter_output(struct perf_counter *counter,
- int nmi, struct pt_regs *regs)
+ int nmi, struct pt_regs *regs, u64 addr)
{
int ret;
u64 record_type = counter->hw_event.record_type;
@@ -1860,6 +1860,11 @@ static void perf_counter_output(struct perf_counter *counter,
header.size += sizeof(u64);
}
+ if (record_type & PERF_RECORD_ADDR) {
+ header.type |= PERF_RECORD_ADDR;
+ header.size += sizeof(u64);
+ }
+
if (record_type & PERF_RECORD_GROUP) {
header.type |= PERF_RECORD_GROUP;
header.size += sizeof(u64) +
@@ -1892,6 +1897,9 @@ static void perf_counter_output(struct perf_counter *counter,
if (record_type & PERF_RECORD_TIME)
perf_output_put(&handle, time);
+ if (record_type & PERF_RECORD_ADDR)
+ perf_output_put(&handle, addr);
+
if (record_type & PERF_RECORD_GROUP) {
struct perf_counter *leader, *sub;
u64 nr = counter->nr_siblings;
@@ -2158,7 +2166,7 @@ void perf_counter_munmap(unsigned long addr, unsigned long len,
*/
int perf_counter_overflow(struct perf_counter *counter,
- int nmi, struct pt_regs *regs)
+ int nmi, struct pt_regs *regs, u64 addr)
{
int events = atomic_read(&counter->event_limit);
int ret = 0;
@@ -2175,7 +2183,7 @@ int perf_counter_overflow(struct perf_counter *counter,
perf_counter_disable(counter);
}
- perf_counter_output(counter, nmi, regs);
+ perf_counter_output(counter, nmi, regs, addr);
return ret;
}
@@ -2240,7 +2248,7 @@ static enum hrtimer_restart perf_swcounter_hrtimer(struct hrtimer *hrtimer)
regs = task_pt_regs(current);
if (regs) {
- if (perf_counter_overflow(counter, 0, regs))
+ if (perf_counter_overflow(counter, 0, regs, 0))
ret = HRTIMER_NORESTART;
}
@@ -2250,11 +2258,11 @@ static enum hrtimer_restart perf_swcounter_hrtimer(struct hrtimer *hrtimer)
}
static void perf_swcounter_overflow(struct perf_counter *counter,
- int nmi, struct pt_regs *regs)
+ int nmi, struct pt_regs *regs, u64 addr)
{
perf_swcounter_update(counter);
perf_swcounter_set_period(counter);
- if (perf_counter_overflow(counter, nmi, regs))
+ if (perf_counter_overflow(counter, nmi, regs, addr))
/* soft-disable the counter */
;
@@ -2286,16 +2294,17 @@ static int perf_swcounter_match(struct perf_counter *counter,
}
static void perf_swcounter_add(struct perf_counter *counter, u64 nr,
- int nmi, struct pt_regs *regs)
+ int nmi, struct pt_regs *regs, u64 addr)
{
int neg = atomic64_add_negative(nr, &counter->hw.count);
if (counter->hw.irq_period && !neg)
- perf_swcounter_overflow(counter, nmi, regs);
+ perf_swcounter_overflow(counter, nmi, regs, addr);
}
static void perf_swcounter_ctx_event(struct perf_counter_context *ctx,
enum perf_event_types type, u32 event,
- u64 nr, int nmi, struct pt_regs *regs)
+ u64 nr, int nmi, struct pt_regs *regs,
+ u64 addr)
{
struct perf_counter *counter;
@@ -2305,7 +2314,7 @@ static void perf_swcounter_ctx_event(struct perf_counter_context *ctx,
rcu_read_lock();
list_for_each_entry_rcu(counter, &ctx->event_list, event_entry) {
if (perf_swcounter_match(counter, type, event, regs))
- perf_swcounter_add(counter, nr, nmi, regs);
+ perf_swcounter_add(counter, nr, nmi, regs, addr);
}
rcu_read_unlock();
}
@@ -2325,7 +2334,8 @@ static int *perf_swcounter_recursion_context(struct perf_cpu_context *cpuctx)
}
static void __perf_swcounter_event(enum perf_event_types type, u32 event,
- u64 nr, int nmi, struct pt_regs *regs)
+ u64 nr, int nmi, struct pt_regs *regs,
+ u64 addr)
{
struct perf_cpu_context *cpuctx = &get_cpu_var(perf_cpu_context);
int *recursion = perf_swcounter_recursion_context(cpuctx);
@@ -2336,10 +2346,11 @@ static void __perf_swcounter_event(enum perf_event_types type, u32 event,
(*recursion)++;
barrier();
- perf_swcounter_ctx_event(&cpuctx->ctx, type, event, nr, nmi, regs);
+ perf_swcounter_ctx_event(&cpuctx->ctx, type, event,
+ nr, nmi, regs, addr);
if (cpuctx->task_ctx) {
perf_swcounter_ctx_event(cpuctx->task_ctx, type, event,
- nr, nmi, regs);
+ nr, nmi, regs, addr);
}
barrier();
@@ -2349,9 +2360,10 @@ out:
put_cpu_var(perf_cpu_context);
}
-void perf_swcounter_event(u32 event, u64 nr, int nmi, struct pt_regs *regs)
+void
+perf_swcounter_event(u32 event, u64 nr, int nmi, struct pt_regs *regs, u64 addr)
{
- __perf_swcounter_event(PERF_TYPE_SOFTWARE, event, nr, nmi, regs);
+ __perf_swcounter_event(PERF_TYPE_SOFTWARE, event, nr, nmi, regs, addr);
}
static void perf_swcounter_read(struct perf_counter *counter)
@@ -2548,7 +2560,7 @@ void perf_tpcounter_event(int event_id)
if (!regs)
regs = task_pt_regs(current);
- __perf_swcounter_event(PERF_TYPE_TRACEPOINT, event_id, 1, 1, regs);
+ __perf_swcounter_event(PERF_TYPE_TRACEPOINT, event_id, 1, 1, regs, 0);
}
extern int ftrace_profile_enable(int);
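
For readers consuming these records from userspace, here is a rough sketch of how the fixed-size fields of an overflow record might be decoded once PERF_RECORD_ADDR is in play. The bit values and the field order (ip, pid/tid, time, addr) are taken from the patched header above. Everything else is an assumption made for illustration: `struct sample_fields`, `take()` and `decode_sample()` are hypothetical names, the caller is assumed to have already stripped the event header, and the GROUP and CALLCHAIN parts that may follow are not handled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bit values copied from the patched enum perf_counter_record_format. */
#define PERF_RECORD_IP   (1U << 0)
#define PERF_RECORD_TID  (1U << 1)
#define PERF_RECORD_TIME (1U << 2)
#define PERF_RECORD_ADDR (1U << 3)

/* Hypothetical userspace-side container; not part of the patch. */
struct sample_fields {
	uint64_t ip;
	uint32_t pid, tid;
	uint64_t time;
	uint64_t addr;	/* data address, e.g. the faulting address */
};

/* Copy n bytes from the cursor; return 0 on success, -1 if the body is short. */
static int take(const uint8_t **p, size_t *left, void *dst, size_t n)
{
	if (*left < n)
		return -1;
	memcpy(dst, *p, n);
	*p += n;
	*left -= n;
	return 0;
}

/*
 * Decode the fixed-size fields of one record body, in the order the
 * header comment documents them: ip, pid/tid, time, addr.
 * Returns the number of bytes consumed, or -1 if the body is too short.
 * Fields whose bit is not set in record_type are left as zero.
 */
static long decode_sample(uint64_t record_type, const uint8_t *body,
			  size_t len, struct sample_fields *out)
{
	const uint8_t *p = body;
	size_t left = len;

	memset(out, 0, sizeof(*out));

	if ((record_type & PERF_RECORD_IP) &&
	    take(&p, &left, &out->ip, sizeof(out->ip)))
		return -1;

	if (record_type & PERF_RECORD_TID) {
		if (take(&p, &left, &out->pid, sizeof(out->pid)) ||
		    take(&p, &left, &out->tid, sizeof(out->tid)))
			return -1;
	}

	if ((record_type & PERF_RECORD_TIME) &&
	    take(&p, &left, &out->time, sizeof(out->time)))
		return -1;

	if ((record_type & PERF_RECORD_ADDR) &&
	    take(&p, &left, &out->addr, sizeof(out->addr)))
		return -1;

	return (long)(len - left);
}
```

Note that with this patch only the page-fault software counters pass a real address; the context-switch, hrtimer and tracepoint paths pass 0, so a decoder like the above would see addr == 0 for those events.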
Thread overview: 24+ messages
2009-04-08 13:01 [PATCH 0/9] yet another batch of perf_counter patches Peter Zijlstra
2009-04-08 13:01 ` [PATCH 1/9] perf_counter: fix NMI race in task clock Peter Zijlstra
2009-04-08 16:57 ` [tip:perfcounters/core] " Peter Zijlstra
2009-04-08 13:01 ` [PATCH 2/9] perf_counter: provide misc bits in the event header Peter Zijlstra
2009-04-08 16:57 ` [tip:perfcounters/core] " Peter Zijlstra
2009-04-08 13:01 ` [PATCH 3/9] perf_counter: use misc field to widen type Peter Zijlstra
2009-04-08 16:57 ` [tip:perfcounters/core] " Peter Zijlstra
2009-04-08 13:01 ` [PATCH 4/9] perf_counter: kerneltop: keep up with ABI changes Peter Zijlstra
2009-04-08 16:58 ` [tip:perfcounters/core] " Peter Zijlstra
2009-04-08 13:01 ` [PATCH 5/9] perf_counter: add some comments Peter Zijlstra
2009-04-08 16:58 ` [tip:perfcounters/core] " Peter Zijlstra
2009-04-08 13:01 ` [PATCH 6/9] perf_counter: track task-comm data Peter Zijlstra
2009-04-08 16:58 ` [tip:perfcounters/core] " Peter Zijlstra
2009-04-08 17:03 ` [PATCH 6.5/9] perf_counter: fix " Peter Zijlstra
2009-04-08 17:09 ` [tip:perfcounters/core] perf_counter: " Peter Zijlstra
2009-04-08 13:01 ` [PATCH 7/9] perf_counter: some simple userspace profiling Peter Zijlstra
2009-04-08 16:58 ` [tip:perfcounters/core] " Peter Zijlstra
2009-04-08 17:09 ` Peter Zijlstra
2009-04-08 13:01 ` [PATCH 8/9] perf_counter: move PERF_RECORD_TIME Peter Zijlstra
2009-04-08 16:58 ` [tip:perfcounters/core] " Peter Zijlstra
2009-04-08 17:09 ` Peter Zijlstra
2009-04-08 13:01 ` [PATCH 9/9] perf_counter: allow for data addresses to be recorded Peter Zijlstra
2009-04-08 16:59 ` [tip:perfcounters/core] " Peter Zijlstra
2009-04-08 17:10 ` Peter Zijlstra