linux-kernel.vger.kernel.org archive mirror
* [PATCH] perf timechart: remove lock_depth from trace_entry
@ 2013-10-07  6:48 Chia-I Wu
  2013-10-22 10:32 ` Stanislav Fomichev
  0 siblings, 1 reply; 18+ messages in thread
From: Chia-I Wu @ 2013-10-07  6:48 UTC (permalink / raw)
  To: linux-kernel

struct trace_entry went out of sync with the kernel since

 commit b000c8065 "tracing: Remove the extra 4 bytes of padding in events"

causing "perf timechart" to be broken.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
---
 tools/perf/builtin-timechart.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tools/perf/builtin-timechart.c b/tools/perf/builtin-timechart.c
index c2e0231..f9cbc18 100644
--- a/tools/perf/builtin-timechart.c
+++ b/tools/perf/builtin-timechart.c
@@ -303,7 +303,6 @@ struct trace_entry {
 	unsigned char		flags;
 	unsigned char		preempt_count;
 	int			pid;
-	int			lock_depth;
 };
 
 #ifdef SUPPORT_OLD_POWER_EVENTS
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] perf timechart: remove lock_depth from trace_entry
  2013-10-07  6:48 [PATCH] perf timechart: remove lock_depth from trace_entry Chia-I Wu
@ 2013-10-22 10:32 ` Stanislav Fomichev
  2013-11-25 18:21   ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 18+ messages in thread
From: Stanislav Fomichev @ 2013-10-22 10:32 UTC (permalink / raw)
  To: a.p.zijlstra, paulus, mingo, acme; +Cc: linux-kernel

On Mon, Oct 07, 2013 at 02:48:34PM +0800, Chia-I Wu wrote:
> struct trace_entry went out-of-sync with the kernel since
> 
>  commit b000c8065 "tracing: Remove the extra 4 bytes of padding in events"
> 
> causing "perf timechart" to be broken.
> 
> Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Stanislav Fomichev <stfomichev@yandex-team.ru>

For some reason nobody noticed this patch on lkml, so I'm resending it to
the maintainers...
It also makes sense to merge this into the stable tree.

> ---
>  tools/perf/builtin-timechart.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/tools/perf/builtin-timechart.c b/tools/perf/builtin-timechart.c
> index c2e0231..f9cbc18 100644
> --- a/tools/perf/builtin-timechart.c
> +++ b/tools/perf/builtin-timechart.c
> @@ -303,7 +303,6 @@ struct trace_entry {
>  	unsigned char		flags;
>  	unsigned char		preempt_count;
>  	int			pid;
> -	int			lock_depth;
>  };
>  
>  #ifdef SUPPORT_OLD_POWER_EVENTS
> -- 
> 1.8.3.1


* Re: [PATCH] perf timechart: remove lock_depth from trace_entry
  2013-10-22 10:32 ` Stanislav Fomichev
@ 2013-11-25 18:21   ` Arnaldo Carvalho de Melo
  2013-11-26 11:05     ` Stanislav Fomichev
  0 siblings, 1 reply; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2013-11-25 18:21 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Chia-I Wu, a.p.zijlstra, paulus, mingo, linux-kernel, Steven Rostedt

[-- Attachment #1: Type: text/plain, Size: 1491 bytes --]

Em Tue, Oct 22, 2013 at 02:32:23PM +0400, Stanislav Fomichev escreveu:
> On Mon, Oct 07, 2013 at 02:48:34PM +0800, Chia-I Wu wrote:
> > struct trace_entry went out-of-sync with the kernel since
> > 
> >  commit b000c8065 "tracing: Remove the extra 4 bytes of padding in events"
> > 
> > causing "perf timechart" to be broken.
> > 
> > Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
> Reviewed-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
> 
> For some reason nobody noticed this patch on lkml, so I'm resending it to
> the maintainers...
> It also makes sense to merge this into the stable tree.

This makes the new tool stop processing old files; can you try the patch
attached instead?

It's only compile tested, though.

- Arnaldo
 
> > ---
> >  tools/perf/builtin-timechart.c | 1 -
> >  1 file changed, 1 deletion(-)
> > 
> > diff --git a/tools/perf/builtin-timechart.c b/tools/perf/builtin-timechart.c
> > index c2e0231..f9cbc18 100644
> > --- a/tools/perf/builtin-timechart.c
> > +++ b/tools/perf/builtin-timechart.c
> > @@ -303,7 +303,6 @@ struct trace_entry {
> >  	unsigned char		flags;
> >  	unsigned char		preempt_count;
> >  	int			pid;
> > -	int			lock_depth;
> >  };
> >  
> >  #ifdef SUPPORT_OLD_POWER_EVENTS
> > -- 
> > 1.8.3.1

[-- Attachment #2: lock_depth_removal.patch --]
[-- Type: text/plain, Size: 4888 bytes --]

diff --git a/tools/perf/builtin-timechart.c b/tools/perf/builtin-timechart.c
index 41c9bde2fb67..e24834d35eec 100644
--- a/tools/perf/builtin-timechart.c
+++ b/tools/perf/builtin-timechart.c
@@ -41,6 +41,27 @@
 #define SUPPORT_OLD_POWER_EVENTS 1
 #define PWR_EVENT_EXIT -1
 
+static unsigned int payload_offset;
+
+static int timechart__set_payload_offset(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel = perf_evlist__first(evlist);
+	struct format_field *field = perf_evsel__field(evsel, "lock_depth");
+	
+	if (field == NULL) {
+		field = perf_evsel__field(evsel, "common_pid");
+		if (field == NULL)
+			return -1;
+	}
+
+	payload_offset = field->next->offset;
+	return 0;
+}
+
+static void *timechart__payload(struct perf_sample *sample)
+{
+	return sample->raw_data + payload_offset;
+}
 
 static unsigned int	numcpus;
 static u64		min_freq;	/* Lowest CPU frequency seen */
@@ -304,13 +325,11 @@ struct trace_entry {
 	unsigned char		flags;
 	unsigned char		preempt_count;
 	int			pid;
-	int			lock_depth;
 };
 
 #ifdef SUPPORT_OLD_POWER_EVENTS
 static int use_old_power_events;
 struct power_entry_old {
-	struct trace_entry te;
 	u64	type;
 	u64	value;
 	u64	cpu_id;
@@ -318,14 +337,12 @@ struct power_entry_old {
 #endif
 
 struct power_processor_entry {
-	struct trace_entry te;
 	u32	state;
 	u32	cpu_id;
 };
 
 #define TASK_COMM_LEN 16
 struct wakeup_entry {
-	struct trace_entry te;
 	char comm[TASK_COMM_LEN];
 	int   pid;
 	int   prio;
@@ -333,7 +350,6 @@ struct wakeup_entry {
 };
 
 struct sched_switch {
-	struct trace_entry te;
 	char prev_comm[TASK_COMM_LEN];
 	int  prev_pid;
 	int  prev_prio;
@@ -402,11 +418,13 @@ static void p_state_change(int cpu, u64 timestamp, u64 new_freq)
 			turbo_frequency = max_freq;
 }
 
-static void
-sched_wakeup(int cpu, u64 timestamp, int pid, struct trace_entry *te)
+static void sched_wakeup(struct perf_sample *sample)
 {
+	struct trace_entry *te = sample->raw_data;
+	struct wakeup_entry *wake = timechart__payload(sample);
+	u64 timestamp = sample->time;
+	int pid = sample->pid, cpu = sample->cpu;
 	struct per_pid *p;
-	struct wakeup_entry *wake = (void *)te;
 	struct wake_event *we = zalloc(sizeof(*we));
 
 	if (!we)
@@ -434,11 +452,9 @@ sched_wakeup(int cpu, u64 timestamp, int pid, struct trace_entry *te)
 	}
 }
 
-static void sched_switch(int cpu, u64 timestamp, struct trace_entry *te)
+static void sched_switch(int cpu, u64 timestamp, struct sched_switch *sw)
 {
 	struct per_pid *p = NULL, *prev_p;
-	struct sched_switch *sw = (void *)te;
-
 
 	prev_p = find_create_pid(sw->prev_pid);
 
@@ -495,7 +511,7 @@ static int
 process_sample_cpu_idle(struct perf_evsel *evsel __maybe_unused,
 			struct perf_sample *sample)
 {
-	struct power_processor_entry *ppe = sample->raw_data;
+	struct power_processor_entry *ppe = timechart__payload(sample);
 
 	if (ppe->state == (u32) PWR_EVENT_EXIT)
 		c_state_end(ppe->cpu_id, sample->time);
@@ -508,7 +524,7 @@ static int
 process_sample_cpu_frequency(struct perf_evsel *evsel __maybe_unused,
 			     struct perf_sample *sample)
 {
-	struct power_processor_entry *ppe = sample->raw_data;
+	struct power_processor_entry *ppe = timechart__payload(sample);
 
 	p_state_change(ppe->cpu_id, sample->time, ppe->state);
 	return 0;
@@ -518,9 +534,7 @@ static int
 process_sample_sched_wakeup(struct perf_evsel *evsel __maybe_unused,
 			    struct perf_sample *sample)
 {
-	struct trace_entry *te = sample->raw_data;
-
-	sched_wakeup(sample->cpu, sample->time, sample->pid, te);
+	sched_wakeup(sample);
 	return 0;
 }
 
@@ -528,9 +542,9 @@ static int
 process_sample_sched_switch(struct perf_evsel *evsel __maybe_unused,
 			    struct perf_sample *sample)
 {
-	struct trace_entry *te = sample->raw_data;
+	struct sched_switch *sw = timechart__payload(sample);
 
-	sched_switch(sample->cpu, sample->time, te);
+	sched_switch(sample->cpu, sample->time, sw);
 	return 0;
 }
 
@@ -539,7 +553,7 @@ static int
 process_sample_power_start(struct perf_evsel *evsel __maybe_unused,
 			   struct perf_sample *sample)
 {
-	struct power_entry_old *peo = sample->raw_data;
+	struct power_entry_old *peo = timechart__payload(sample);
 
 	c_state_start(peo->cpu_id, sample->time, peo->value);
 	return 0;
@@ -557,7 +571,7 @@ static int
 process_sample_power_frequency(struct perf_evsel *evsel __maybe_unused,
 			       struct perf_sample *sample)
 {
-	struct power_entry_old *peo = sample->raw_data;
+	struct power_entry_old *peo = timechart__payload(sample);
 
 	p_state_change(peo->cpu_id, sample->time, peo->value);
 	return 0;
@@ -1012,6 +1026,11 @@ static int __cmd_timechart(const char *output_name)
 		goto out_delete;
 	}
 
+	if (timechart__set_payload_offset(session->evlist)) {
+		pr_err("\"common_pid\" field not found, please try updating this tool\n");
+		goto out_delete;
+	}
+
 	ret = perf_session__process_events(session, &perf_timechart);
 	if (ret)
 		goto out_delete;


* Re: [PATCH] perf timechart: remove lock_depth from trace_entry
  2013-11-25 18:21   ` Arnaldo Carvalho de Melo
@ 2013-11-26 11:05     ` Stanislav Fomichev
  2013-11-26 12:10       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 18+ messages in thread
From: Stanislav Fomichev @ 2013-11-26 11:05 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Chia-I Wu, a.p.zijlstra, paulus, mingo, linux-kernel, Steven Rostedt

> This makes the new tool stop processing old files, can you try the patch
> attached instead?
I see two downsides to your approach:
1) with your patch I'm now required to run 'perf timechart record' and
'perf timechart' on the same machine (otherwise, on the 'perf timechart'
machine we may have wrong field definitions that don't match perf.data).
Currently, it's possible to use 'perf timechart record' on the target and do
'perf timechart' on the host (at least I do it this way).
2) only root can run 'perf timechart' now (because of permissions on
/sys/kernel/debug).

Maybe we can make a simple version check against the perf.data file and
just refuse to process old ones (not sure if it's possible)?

> It's only compile tested, though.
Ok, I'll test it if we are fine with the new limitations.


* Re: [PATCH] perf timechart: remove lock_depth from trace_entry
  2013-11-26 11:05     ` Stanislav Fomichev
@ 2013-11-26 12:10       ` Arnaldo Carvalho de Melo
  2013-11-26 13:47         ` Stanislav Fomichev
  0 siblings, 1 reply; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2013-11-26 12:10 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Chia-I Wu, a.p.zijlstra, paulus, mingo, linux-kernel, Steven Rostedt

Em Tue, Nov 26, 2013 at 03:05:36PM +0400, Stanislav Fomichev escreveu:
> > This makes the new tool stop processing old files, can you try the patch
> > attached instead?

> I see two downsides to your approach:

Huh?

> 1) with your patch I'm now required to run 'perf timechart record' and
> 'perf timechart' on the same machine (otherwise, on the 'perf timechart'

No, that should not be required: when processing perf.data files the
tracepoint info should come from perf.data, recorded there at
'perf record' time, right? I'm assuming this, will check.

> machine we may have wrong fields definitions that don't match perf.data).
> Currently, it's possible to use 'perf timechart record' on the target and do
> 'perf timechart' on the host (at least I do it this way).

If it is, my assumption is correct, as the evlist must be populated from
perf.data. Or is it populated that way and _only_ when it goes to look at
fields does it look at the local machine? That wouldn't make sense.

> 2) only root can run 'perf timechart' now (because of permissions on
> /sys/kernel/debug).

See above: if before this patch the format_field info was obtained from
the perf.data file, why should it now get it from the local machine?
 
> Maybe we can we make some simple version check against the perf.data file and
> just refuse to process the old one (not sure if it's possible)?
> 
> > It's only compile tested, though.
> Ok, I'll test it if we are fine with the new limitations.

Please try. There should be no limitations.

- Arnaldo


* Re: [PATCH] perf timechart: remove lock_depth from trace_entry
  2013-11-26 12:10       ` Arnaldo Carvalho de Melo
@ 2013-11-26 13:47         ` Stanislav Fomichev
  2013-11-26 13:57           ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 18+ messages in thread
From: Stanislav Fomichev @ 2013-11-26 13:47 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Chia-I Wu, a.p.zijlstra, paulus, mingo, linux-kernel, Steven Rostedt

> See above: if before this patch the format_field info was obtained from
> the perf.data file, why should it now get it from the local machine?
Yes, sorry for the confusion, I didn't know that we could obtain the trace
event format from perf.data. Hence I have no concerns regarding your
implementation :-)

> Please try. There should be no limitations.
It does not work because lock_depth and common_pid are common fields,
while perf_evsel__field looks only at non-common fields. And even if we find
the lock_depth/common_pid common field, we can't do field->next, because
the common and non-common fields are not linked and ->next in the last
common field is NULL (this is what I got when I tried to use
pevent_find_any_field instead of pevent_find_field in perf_evsel__field).

I slightly modified timechart__set_payload_offset from your patch to look
for the first non-common field and use its offset, but it looks kinda ugly:

static int timechart__set_payload_offset(struct perf_evlist *evlist)
{
       struct perf_evsel *evsel = perf_evlist__first(evlist);
       struct format_field *field = evsel->tp_format->format.fields;

       if (!field)
               return -1;

       payload_offset = field->offset;
       return 0;
}

Maybe we can add some helper routine which returns first non-common
field, like:

struct format_field *perf_evsel__first_field(struct perf_evsel *evsel, const char *name)
{
	return evsel->tp_format->format.fields;
}

and call it in timechart__set_payload_offset?


* Re: [PATCH] perf timechart: remove lock_depth from trace_entry
  2013-11-26 13:47         ` Stanislav Fomichev
@ 2013-11-26 13:57           ` Arnaldo Carvalho de Melo
  2013-11-26 14:54             ` [PATCH] perf timechart: dynamically determine event data offset Stanislav Fomichev
  0 siblings, 1 reply; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2013-11-26 13:57 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Chia-I Wu, a.p.zijlstra, paulus, mingo, linux-kernel, Steven Rostedt

Em Tue, Nov 26, 2013 at 05:47:55PM +0400, Stanislav Fomichev escreveu:
> > See above: if before this patch the format_field info was obtained from
> > the perf.data file, why should it now get it from the local machine?

> Yes, sorry for confusion, I didn't know that we can obtain trace events
> format from the perf.data. I hence have no concerns regarding your
> implementation :-)
> 
> > Please try. There should be no limitations.

> It does not work because lock_depth and common_pid are common fields,
> while perf_evsel__field looks in non-common fields. And even if we find

Oh, I thought they were linked, my bad, so that makes the implementation
even easier :-)

> lock_depth/common_pid common field we can't do field->next, because
> common and non-common fields are not linked and ->next in the last common
> field is NULL (this is what I got when I tried to use pevent_find_any_field
> instead of pevent_find_field in perf_evsel__field).
> 
> I slightly modified timechart__set_payload_offset from your patch to look
> for the first non-common field and use its offset, but it looks kinda ugly:
> 
> static int timechart__set_payload_offset(struct perf_evlist *evlist)
> {
>        struct perf_evsel *evsel = perf_evlist__first(evlist);
>        struct format_field *field = evsel->tp_format->format.fields;
> 
>        if (!field)
>                return -1;
> 
>        payload_offset = field->offset;
>        return 0;
> }
> 
> Maybe we can add some helper routine which returns first non-common
> field, like:

Humm, I think the above is enough, but then having a helper doesn't
hurt; I would prefer it to instead be:

struct format_field *perf_evsel__fields(struct perf_evsel *evsel)
{
	return evsel->tp_format->format.fields;
}

The "name" parameter is not needed, and in essence getting "all the
fields" and the "first field" is the same operation, right? We can use
the shorter form.

In the evlist class we need this:

static inline
struct perf_evsel *perf_evlist__first(struct perf_evlist *evlist)
{
        return list_entry(evlist->entries.next, struct perf_evsel, node);
}

Because the open coded equivalent is long, as we use list_head to
maintain that list, but format.fields is a singly linked list.

Can you send a respinned patch with all we discussed, please?

- Arnaldo

> 
> struct format_field *perf_evsel__first_field(struct perf_evsel *evsel, const char *name)
> {
> 	return evsel->tp_format->format.fields;
> }
> 
> and call it in timechart__set_payload_offset?


* [PATCH] perf timechart: dynamically determine event data offset
  2013-11-26 13:57           ` Arnaldo Carvalho de Melo
@ 2013-11-26 14:54             ` Stanislav Fomichev
  2013-11-26 22:04               ` Arnaldo Carvalho de Melo
  2013-11-27  8:49               ` Namhyung Kim
  0 siblings, 2 replies; 18+ messages in thread
From: Stanislav Fomichev @ 2013-11-26 14:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Chia-I Wu, a.p.zijlstra, paulus, mingo, linux-kernel, Steven Rostedt

Since b000c8065a92 "tracing: Remove the extra 4 bytes of padding in events"
removed the padding bytes, perf timechart got out of sync with the kernel's
trace_entry structure.
We can't just align perf's trace_entry definition with the kernel's because we
want timechart to keep working with old perf.data files. Instead, we now
calculate the event data offset dynamically, using the offset of the first
non-common event field in perf.data.

Signed-off-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
---
 tools/perf/builtin-timechart.c | 56 +++++++++++++++++++++++++++---------------
 tools/perf/util/evsel.c        |  5 ++++
 tools/perf/util/evsel.h        |  1 +
 3 files changed, 42 insertions(+), 20 deletions(-)

diff --git a/tools/perf/builtin-timechart.c b/tools/perf/builtin-timechart.c
index 41c9bde2fb67..5c76a914031b 100644
--- a/tools/perf/builtin-timechart.c
+++ b/tools/perf/builtin-timechart.c
@@ -41,6 +41,24 @@
 #define SUPPORT_OLD_POWER_EVENTS 1
 #define PWR_EVENT_EXIT -1
 
+static unsigned int payload_offset;
+
+static int timechart__set_payload_offset(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel = perf_evlist__first(evlist);
+	struct format_field *field = perf_evsel__fields(evsel);
+
+	if (!field)
+		return -1;
+
+	payload_offset = field->offset;
+	return 0;
+}
+
+static void *timechart__payload(struct perf_sample *sample)
+{
+	return sample->raw_data + payload_offset;
+}
 
 static unsigned int	numcpus;
 static u64		min_freq;	/* Lowest CPU frequency seen */
@@ -304,13 +322,11 @@ struct trace_entry {
 	unsigned char		flags;
 	unsigned char		preempt_count;
 	int			pid;
-	int			lock_depth;
 };
 
 #ifdef SUPPORT_OLD_POWER_EVENTS
 static int use_old_power_events;
 struct power_entry_old {
-	struct trace_entry te;
 	u64	type;
 	u64	value;
 	u64	cpu_id;
@@ -318,14 +334,12 @@ struct power_entry_old {
 #endif
 
 struct power_processor_entry {
-	struct trace_entry te;
 	u32	state;
 	u32	cpu_id;
 };
 
 #define TASK_COMM_LEN 16
 struct wakeup_entry {
-	struct trace_entry te;
 	char comm[TASK_COMM_LEN];
 	int   pid;
 	int   prio;
@@ -333,7 +347,6 @@ struct wakeup_entry {
 };
 
 struct sched_switch {
-	struct trace_entry te;
 	char prev_comm[TASK_COMM_LEN];
 	int  prev_pid;
 	int  prev_prio;
@@ -402,11 +415,13 @@ static void p_state_change(int cpu, u64 timestamp, u64 new_freq)
 			turbo_frequency = max_freq;
 }
 
-static void
-sched_wakeup(int cpu, u64 timestamp, int pid, struct trace_entry *te)
+static void sched_wakeup(struct perf_sample *sample)
 {
+	struct trace_entry *te = sample->raw_data;
+	struct wakeup_entry *wake = timechart__payload(sample);
+	u64 timestamp = sample->time;
+	int pid = sample->pid, cpu = sample->cpu;
 	struct per_pid *p;
-	struct wakeup_entry *wake = (void *)te;
 	struct wake_event *we = zalloc(sizeof(*we));
 
 	if (!we)
@@ -434,11 +449,9 @@ sched_wakeup(int cpu, u64 timestamp, int pid, struct trace_entry *te)
 	}
 }
 
-static void sched_switch(int cpu, u64 timestamp, struct trace_entry *te)
+static void sched_switch(int cpu, u64 timestamp, struct sched_switch *sw)
 {
 	struct per_pid *p = NULL, *prev_p;
-	struct sched_switch *sw = (void *)te;
-
 
 	prev_p = find_create_pid(sw->prev_pid);
 
@@ -495,7 +508,7 @@ static int
 process_sample_cpu_idle(struct perf_evsel *evsel __maybe_unused,
 			struct perf_sample *sample)
 {
-	struct power_processor_entry *ppe = sample->raw_data;
+	struct power_processor_entry *ppe = timechart__payload(sample);
 
 	if (ppe->state == (u32) PWR_EVENT_EXIT)
 		c_state_end(ppe->cpu_id, sample->time);
@@ -508,7 +521,7 @@ static int
 process_sample_cpu_frequency(struct perf_evsel *evsel __maybe_unused,
 			     struct perf_sample *sample)
 {
-	struct power_processor_entry *ppe = sample->raw_data;
+	struct power_processor_entry *ppe = timechart__payload(sample);
 
 	p_state_change(ppe->cpu_id, sample->time, ppe->state);
 	return 0;
@@ -518,9 +531,7 @@ static int
 process_sample_sched_wakeup(struct perf_evsel *evsel __maybe_unused,
 			    struct perf_sample *sample)
 {
-	struct trace_entry *te = sample->raw_data;
-
-	sched_wakeup(sample->cpu, sample->time, sample->pid, te);
+	sched_wakeup(sample);
 	return 0;
 }
 
@@ -528,9 +539,9 @@ static int
 process_sample_sched_switch(struct perf_evsel *evsel __maybe_unused,
 			    struct perf_sample *sample)
 {
-	struct trace_entry *te = sample->raw_data;
+	struct sched_switch *sw = timechart__payload(sample);
 
-	sched_switch(sample->cpu, sample->time, te);
+	sched_switch(sample->cpu, sample->time, sw);
 	return 0;
 }
 
@@ -539,7 +550,7 @@ static int
 process_sample_power_start(struct perf_evsel *evsel __maybe_unused,
 			   struct perf_sample *sample)
 {
-	struct power_entry_old *peo = sample->raw_data;
+	struct power_entry_old *peo = timechart__payload(sample);
 
 	c_state_start(peo->cpu_id, sample->time, peo->value);
 	return 0;
@@ -557,7 +568,7 @@ static int
 process_sample_power_frequency(struct perf_evsel *evsel __maybe_unused,
 			       struct perf_sample *sample)
 {
-	struct power_entry_old *peo = sample->raw_data;
+	struct power_entry_old *peo = timechart__payload(sample);
 
 	p_state_change(peo->cpu_id, sample->time, peo->value);
 	return 0;
@@ -1012,6 +1023,11 @@ static int __cmd_timechart(const char *output_name)
 		goto out_delete;
 	}
 
+	if (timechart__set_payload_offset(session->evlist)) {
+		pr_err("Field format not found, please try updating this tool\n");
+		goto out_delete;
+	}
+
 	ret = perf_session__process_events(session, &perf_timechart);
 	if (ret)
 		goto out_delete;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 46dd4c2a41ce..4847d373fe2a 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1767,6 +1767,11 @@ struct format_field *perf_evsel__field(struct perf_evsel *evsel, const char *nam
 	return pevent_find_field(evsel->tp_format, name);
 }
 
+struct format_field *perf_evsel__fields(struct perf_evsel *evsel)
+{
+	return evsel->tp_format->format.fields;
+}
+
 void *perf_evsel__rawptr(struct perf_evsel *evsel, struct perf_sample *sample,
 			 const char *name)
 {
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 1ea7c92e6e33..3d50dc01bb1d 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -193,6 +193,7 @@ static inline char *perf_evsel__strval(struct perf_evsel *evsel,
 struct format_field;
 
 struct format_field *perf_evsel__field(struct perf_evsel *evsel, const char *name);
+struct format_field *perf_evsel__fields(struct perf_evsel *evsel);
 
 #define perf_evsel__match(evsel, t, c)		\
 	(evsel->attr.type == PERF_TYPE_##t &&	\
-- 
1.8.3.2



* Re: [PATCH] perf timechart: dynamically determine event data offset
  2013-11-26 14:54             ` [PATCH] perf timechart: dynamically determine event data offset Stanislav Fomichev
@ 2013-11-26 22:04               ` Arnaldo Carvalho de Melo
  2013-11-27  8:49               ` Namhyung Kim
  1 sibling, 0 replies; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2013-11-26 22:04 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Chia-I Wu, a.p.zijlstra, paulus, mingo, linux-kernel, Steven Rostedt

Em Tue, Nov 26, 2013 at 06:54:37PM +0400, Stanislav Fomichev escreveu:
> Since b000c8065a92 "tracing: Remove the extra 4 bytes of padding in events"
> removed padding bytes, perf timechart got out of sync with the kernel's
> trace_entry structure.
> We can't just align perf's trace_entry definition with the kernel because we
> want timechart to continue working with old perf.data. Instead, we now
> calculate event data offset dynamically using offset of first non-common
> event field in the perf.data.

This doesn't apply on top of my acme/perf/core branch, which is where I
did the initial work on this patch; can you please redo it on top of that?

Available at:

git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git

perf/core branch

- Arnaldo
 
> Signed-off-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
> ---
>  tools/perf/builtin-timechart.c | 56 +++++++++++++++++++++++++++---------------
>  tools/perf/util/evsel.c        |  5 ++++
>  tools/perf/util/evsel.h        |  1 +
>  3 files changed, 42 insertions(+), 20 deletions(-)
> 
> diff --git a/tools/perf/builtin-timechart.c b/tools/perf/builtin-timechart.c
> index 41c9bde2fb67..5c76a914031b 100644
> --- a/tools/perf/builtin-timechart.c
> +++ b/tools/perf/builtin-timechart.c
> @@ -41,6 +41,24 @@
>  #define SUPPORT_OLD_POWER_EVENTS 1
>  #define PWR_EVENT_EXIT -1
>  
> +static unsigned int payload_offset;
> +
> +static int timechart__set_payload_offset(struct perf_evlist *evlist)
> +{
> +	struct perf_evsel *evsel = perf_evlist__first(evlist);
> +	struct format_field *field = perf_evsel__fields(evsel);
> +
> +	if (!field)
> +		return -1;
> +
> +	payload_offset = field->offset;
> +	return 0;
> +}
> +
> +static void *timechart__payload(struct perf_sample *sample)
> +{
> +	return sample->raw_data + payload_offset;
> +}
>  
>  static unsigned int	numcpus;
>  static u64		min_freq;	/* Lowest CPU frequency seen */
> @@ -304,13 +322,11 @@ struct trace_entry {
>  	unsigned char		flags;
>  	unsigned char		preempt_count;
>  	int			pid;
> -	int			lock_depth;
>  };
>  
>  #ifdef SUPPORT_OLD_POWER_EVENTS
>  static int use_old_power_events;
>  struct power_entry_old {
> -	struct trace_entry te;
>  	u64	type;
>  	u64	value;
>  	u64	cpu_id;
> @@ -318,14 +334,12 @@ struct power_entry_old {
>  #endif
>  
>  struct power_processor_entry {
> -	struct trace_entry te;
>  	u32	state;
>  	u32	cpu_id;
>  };
>  
>  #define TASK_COMM_LEN 16
>  struct wakeup_entry {
> -	struct trace_entry te;
>  	char comm[TASK_COMM_LEN];
>  	int   pid;
>  	int   prio;
> @@ -333,7 +347,6 @@ struct wakeup_entry {
>  };
>  
>  struct sched_switch {
> -	struct trace_entry te;
>  	char prev_comm[TASK_COMM_LEN];
>  	int  prev_pid;
>  	int  prev_prio;
> @@ -402,11 +415,13 @@ static void p_state_change(int cpu, u64 timestamp, u64 new_freq)
>  			turbo_frequency = max_freq;
>  }
>  
> -static void
> -sched_wakeup(int cpu, u64 timestamp, int pid, struct trace_entry *te)
> +static void sched_wakeup(struct perf_sample *sample)
>  {
> +	struct trace_entry *te = sample->raw_data;
> +	struct wakeup_entry *wake = timechart__payload(sample);
> +	u64 timestamp = sample->time;
> +	int pid = sample->pid, cpu = sample->cpu;
>  	struct per_pid *p;
> -	struct wakeup_entry *wake = (void *)te;
>  	struct wake_event *we = zalloc(sizeof(*we));
>  
>  	if (!we)
> @@ -434,11 +449,9 @@ sched_wakeup(int cpu, u64 timestamp, int pid, struct trace_entry *te)
>  	}
>  }
>  
> -static void sched_switch(int cpu, u64 timestamp, struct trace_entry *te)
> +static void sched_switch(int cpu, u64 timestamp, struct sched_switch *sw)
>  {
>  	struct per_pid *p = NULL, *prev_p;
> -	struct sched_switch *sw = (void *)te;
> -
>  
>  	prev_p = find_create_pid(sw->prev_pid);
>  
> @@ -495,7 +508,7 @@ static int
>  process_sample_cpu_idle(struct perf_evsel *evsel __maybe_unused,
>  			struct perf_sample *sample)
>  {
> -	struct power_processor_entry *ppe = sample->raw_data;
> +	struct power_processor_entry *ppe = timechart__payload(sample);
>  
>  	if (ppe->state == (u32) PWR_EVENT_EXIT)
>  		c_state_end(ppe->cpu_id, sample->time);
> @@ -508,7 +521,7 @@ static int
>  process_sample_cpu_frequency(struct perf_evsel *evsel __maybe_unused,
>  			     struct perf_sample *sample)
>  {
> -	struct power_processor_entry *ppe = sample->raw_data;
> +	struct power_processor_entry *ppe = timechart__payload(sample);
>  
>  	p_state_change(ppe->cpu_id, sample->time, ppe->state);
>  	return 0;
> @@ -518,9 +531,7 @@ static int
>  process_sample_sched_wakeup(struct perf_evsel *evsel __maybe_unused,
>  			    struct perf_sample *sample)
>  {
> -	struct trace_entry *te = sample->raw_data;
> -
> -	sched_wakeup(sample->cpu, sample->time, sample->pid, te);
> +	sched_wakeup(sample);
>  	return 0;
>  }
>  
> @@ -528,9 +539,9 @@ static int
>  process_sample_sched_switch(struct perf_evsel *evsel __maybe_unused,
>  			    struct perf_sample *sample)
>  {
> -	struct trace_entry *te = sample->raw_data;
> +	struct sched_switch *sw = timechart__payload(sample);
>  
> -	sched_switch(sample->cpu, sample->time, te);
> +	sched_switch(sample->cpu, sample->time, sw);
>  	return 0;
>  }
>  
> @@ -539,7 +550,7 @@ static int
>  process_sample_power_start(struct perf_evsel *evsel __maybe_unused,
>  			   struct perf_sample *sample)
>  {
> -	struct power_entry_old *peo = sample->raw_data;
> +	struct power_entry_old *peo = timechart__payload(sample);
>  
>  	c_state_start(peo->cpu_id, sample->time, peo->value);
>  	return 0;
> @@ -557,7 +568,7 @@ static int
>  process_sample_power_frequency(struct perf_evsel *evsel __maybe_unused,
>  			       struct perf_sample *sample)
>  {
> -	struct power_entry_old *peo = sample->raw_data;
> +	struct power_entry_old *peo = timechart__payload(sample);
>  
>  	p_state_change(peo->cpu_id, sample->time, peo->value);
>  	return 0;
> @@ -1012,6 +1023,11 @@ static int __cmd_timechart(const char *output_name)
>  		goto out_delete;
>  	}
>  
> +	if (timechart__set_payload_offset(session->evlist)) {
> +		pr_err("Field format not found, please try updating this tool\n");
> +		goto out_delete;
> +	}
> +
>  	ret = perf_session__process_events(session, &perf_timechart);
>  	if (ret)
>  		goto out_delete;
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 46dd4c2a41ce..4847d373fe2a 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -1767,6 +1767,11 @@ struct format_field *perf_evsel__field(struct perf_evsel *evsel, const char *nam
>  	return pevent_find_field(evsel->tp_format, name);
>  }
>  
> +struct format_field *perf_evsel__fields(struct perf_evsel *evsel)
> +{
> +	return evsel->tp_format->format.fields;
> +}
> +
>  void *perf_evsel__rawptr(struct perf_evsel *evsel, struct perf_sample *sample,
>  			 const char *name)
>  {
> diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> index 1ea7c92e6e33..3d50dc01bb1d 100644
> --- a/tools/perf/util/evsel.h
> +++ b/tools/perf/util/evsel.h
> @@ -193,6 +193,7 @@ static inline char *perf_evsel__strval(struct perf_evsel *evsel,
>  struct format_field;
>  
>  struct format_field *perf_evsel__field(struct perf_evsel *evsel, const char *name);
> +struct format_field *perf_evsel__fields(struct perf_evsel *evsel);
>  
>  #define perf_evsel__match(evsel, t, c)		\
>  	(evsel->attr.type == PERF_TYPE_##t &&	\
> -- 
> 1.8.3.2

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] perf timechart: dynamically determine event data offset
  2013-11-26 14:54             ` [PATCH] perf timechart: dynamically determine event data offset Stanislav Fomichev
  2013-11-26 22:04               ` Arnaldo Carvalho de Melo
@ 2013-11-27  8:49               ` Namhyung Kim
  2013-11-27  9:01                 ` Stanislav Fomichev
  2013-11-27 13:44                 ` [PATCH] perf timechart: dynamically determine event data offset Arnaldo Carvalho de Melo
  1 sibling, 2 replies; 18+ messages in thread
From: Namhyung Kim @ 2013-11-27  8:49 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Arnaldo Carvalho de Melo, Chia-I Wu, a.p.zijlstra, paulus, mingo,
	linux-kernel, Steven Rostedt

Hi Stanislav,

On Tue, 26 Nov 2013 18:54:37 +0400, Stanislav Fomichev wrote:
> Since b000c8065a92 "tracing: Remove the extra 4 bytes of padding in events"
> removed padding bytes, perf timechart got out of sync with the kernel's
> trace_entry structure.
> We can't just align perf's trace_entry definition with the kernel because we
> want timechart to continue working with old perf.data. Instead, we now
> calculate event data offset dynamically using offset of first non-common
> event field in the perf.data.

[SNIP]
> @@ -304,13 +322,11 @@ struct trace_entry {
>  	unsigned char		flags;
>  	unsigned char		preempt_count;
>  	int			pid;
> -	int			lock_depth;
>  };

I had no chance to look into the timechart code in detail, but this is
not good.  The format of each trace event (so the struct trace_entry)
was described in the format file under the event directory on debugfs.
For cpu_frequency event, I get the following:

  $ cat /sys/kernel/debug/tracing/events/power/cpu_frequency/format 
  name: cpu_frequency
  ID: 315
  format:
	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
	field:int common_pid;	offset:4;	size:4;	signed:1;

	field:u32 state;	offset:8;	size:4;	signed:0;
	field:u32 cpu_id;	offset:12;	size:4;	signed:0;

  print fmt: "state=%lu cpu_id=%lu", (unsigned long)REC->state, (unsigned long)REC->cpu_id

So it's not the same as the struct trace_entry above, even with your change.

And the thing is, we should not access it as a binary format.  IOW, we
need to access those (common) fields via libtraceevent or something else
that honors the event format description.  That way we can access the
data reliably even after a format change like this.

This is why the other perf commands weren't affected by those changes, IMHO.
The timechart command should be changed to follow the same rule.  I
think perf_evsel__intval() and perf_evsel__strval() will do most of the
job you want here.  So,

Nacked-by: Namhyung Kim <namhyung@kernel.org>

Thanks,
Namhyung

>  
>  #ifdef SUPPORT_OLD_POWER_EVENTS
>  static int use_old_power_events;
>  struct power_entry_old {
> -	struct trace_entry te;
>  	u64	type;
>  	u64	value;
>  	u64	cpu_id;
> @@ -318,14 +334,12 @@ struct power_entry_old {
>  #endif
>  
>  struct power_processor_entry {
> -	struct trace_entry te;
>  	u32	state;
>  	u32	cpu_id;
>  };
>  
>  #define TASK_COMM_LEN 16
>  struct wakeup_entry {
> -	struct trace_entry te;
>  	char comm[TASK_COMM_LEN];
>  	int   pid;
>  	int   prio;
> @@ -333,7 +347,6 @@ struct wakeup_entry {
>  };
>  
>  struct sched_switch {
> -	struct trace_entry te;
>  	char prev_comm[TASK_COMM_LEN];
>  	int  prev_pid;
>  	int  prev_prio;
> @@ -402,11 +415,13 @@ static void p_state_change(int cpu, u64 timestamp, u64 new_freq)
>  			turbo_frequency = max_freq;
>  }
>  
> -static void
> -sched_wakeup(int cpu, u64 timestamp, int pid, struct trace_entry *te)
> +static void sched_wakeup(struct perf_sample *sample)
>  {
> +	struct trace_entry *te = sample->raw_data;
> +	struct wakeup_entry *wake = timechart__payload(sample);
> +	u64 timestamp = sample->time;
> +	int pid = sample->pid, cpu = sample->cpu;
>  	struct per_pid *p;
> -	struct wakeup_entry *wake = (void *)te;
>  	struct wake_event *we = zalloc(sizeof(*we));
>  
>  	if (!we)
> @@ -434,11 +449,9 @@ sched_wakeup(int cpu, u64 timestamp, int pid, struct trace_entry *te)
>  	}
>  }
>  
> -static void sched_switch(int cpu, u64 timestamp, struct trace_entry *te)
> +static void sched_switch(int cpu, u64 timestamp, struct sched_switch *sw)
>  {
>  	struct per_pid *p = NULL, *prev_p;
> -	struct sched_switch *sw = (void *)te;
> -
>  
>  	prev_p = find_create_pid(sw->prev_pid);
>  
> @@ -495,7 +508,7 @@ static int
>  process_sample_cpu_idle(struct perf_evsel *evsel __maybe_unused,
>  			struct perf_sample *sample)
>  {
> -	struct power_processor_entry *ppe = sample->raw_data;
> +	struct power_processor_entry *ppe = timechart__payload(sample);
>  
>  	if (ppe->state == (u32) PWR_EVENT_EXIT)
>  		c_state_end(ppe->cpu_id, sample->time);
> @@ -508,7 +521,7 @@ static int
>  process_sample_cpu_frequency(struct perf_evsel *evsel __maybe_unused,
>  			     struct perf_sample *sample)
>  {
> -	struct power_processor_entry *ppe = sample->raw_data;
> +	struct power_processor_entry *ppe = timechart__payload(sample);
>  
>  	p_state_change(ppe->cpu_id, sample->time, ppe->state);
>  	return 0;
> @@ -518,9 +531,7 @@ static int
>  process_sample_sched_wakeup(struct perf_evsel *evsel __maybe_unused,
>  			    struct perf_sample *sample)
>  {
> -	struct trace_entry *te = sample->raw_data;
> -
> -	sched_wakeup(sample->cpu, sample->time, sample->pid, te);
> +	sched_wakeup(sample);
>  	return 0;
>  }
>  
> @@ -528,9 +539,9 @@ static int
>  process_sample_sched_switch(struct perf_evsel *evsel __maybe_unused,
>  			    struct perf_sample *sample)
>  {
> -	struct trace_entry *te = sample->raw_data;
> +	struct sched_switch *sw = timechart__payload(sample);
>  
> -	sched_switch(sample->cpu, sample->time, te);
> +	sched_switch(sample->cpu, sample->time, sw);
>  	return 0;
>  }
>  
> @@ -539,7 +550,7 @@ static int
>  process_sample_power_start(struct perf_evsel *evsel __maybe_unused,
>  			   struct perf_sample *sample)
>  {
> -	struct power_entry_old *peo = sample->raw_data;
> +	struct power_entry_old *peo = timechart__payload(sample);
>  
>  	c_state_start(peo->cpu_id, sample->time, peo->value);
>  	return 0;
> @@ -557,7 +568,7 @@ static int
>  process_sample_power_frequency(struct perf_evsel *evsel __maybe_unused,
>  			       struct perf_sample *sample)
>  {
> -	struct power_entry_old *peo = sample->raw_data;
> +	struct power_entry_old *peo = timechart__payload(sample);
>  
>  	p_state_change(peo->cpu_id, sample->time, peo->value);
>  	return 0;
> @@ -1012,6 +1023,11 @@ static int __cmd_timechart(const char *output_name)
>  		goto out_delete;
>  	}
>  
> +	if (timechart__set_payload_offset(session->evlist)) {
> +		pr_err("Field format not found, please try updating this tool\n");
> +		goto out_delete;
> +	}
> +
>  	ret = perf_session__process_events(session, &perf_timechart);
>  	if (ret)
>  		goto out_delete;
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 46dd4c2a41ce..4847d373fe2a 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -1767,6 +1767,11 @@ struct format_field *perf_evsel__field(struct perf_evsel *evsel, const char *nam
>  	return pevent_find_field(evsel->tp_format, name);
>  }
>  
> +struct format_field *perf_evsel__fields(struct perf_evsel *evsel)
> +{
> +	return evsel->tp_format->format.fields;
> +}
> +
>  void *perf_evsel__rawptr(struct perf_evsel *evsel, struct perf_sample *sample,
>  			 const char *name)
>  {
> diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> index 1ea7c92e6e33..3d50dc01bb1d 100644
> --- a/tools/perf/util/evsel.h
> +++ b/tools/perf/util/evsel.h
> @@ -193,6 +193,7 @@ static inline char *perf_evsel__strval(struct perf_evsel *evsel,
>  struct format_field;
>  
>  struct format_field *perf_evsel__field(struct perf_evsel *evsel, const char *name);
> +struct format_field *perf_evsel__fields(struct perf_evsel *evsel);
>  
>  #define perf_evsel__match(evsel, t, c)		\
>  	(evsel->attr.type == PERF_TYPE_##t &&	\

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] perf timechart: dynamically determine event data offset
  2013-11-27  8:49               ` Namhyung Kim
@ 2013-11-27  9:01                 ` Stanislav Fomichev
  2013-11-27 10:45                   ` [PATCH] perf timechart: dynamically determine event fields offset Stanislav Fomichev
  2013-11-27 13:44                 ` [PATCH] perf timechart: dynamically determine event data offset Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 18+ messages in thread
From: Stanislav Fomichev @ 2013-11-27  9:01 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Chia-I Wu, a.p.zijlstra, paulus, mingo,
	linux-kernel, Steven Rostedt

> Nacked-by: Namhyung Kim <namhyung@kernel.org>
Thanks for the review. I'll try to convert all the binary structures
to perf_evsel__intval() and perf_evsel__strval() and will rebase on
Arnaldo's perf/core branch.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] perf timechart: dynamically determine event fields offset
  2013-11-27  9:01                 ` Stanislav Fomichev
@ 2013-11-27 10:45                   ` Stanislav Fomichev
  2013-11-30 12:53                     ` [tip:perf/core] " tip-bot for Stanislav Fomichev
  0 siblings, 1 reply; 18+ messages in thread
From: Stanislav Fomichev @ 2013-11-27 10:45 UTC (permalink / raw)
  To: Namhyung Kim, Arnaldo Carvalho de Melo
  Cc: Chia-I Wu, a.p.zijlstra, paulus, mingo, linux-kernel, Steven Rostedt

Since b000c8065a92 "tracing: Remove the extra 4 bytes of padding in events"
removed padding bytes, perf timechart got out of sync with the kernel's
trace_entry structure.
Convert perf timechart to use dynamic field offsets (via perf_evsel__intval)
instead of relying on a hardcoded copy of the kernel's field layout.

Signed-off-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
---
 tools/perf/builtin-timechart.c | 119 +++++++++++++++--------------------------
 1 file changed, 42 insertions(+), 77 deletions(-)

diff --git a/tools/perf/builtin-timechart.c b/tools/perf/builtin-timechart.c
index 491662bdfe0b..436cb5f9d751 100644
--- a/tools/perf/builtin-timechart.c
+++ b/tools/perf/builtin-timechart.c
@@ -306,50 +306,10 @@ static int process_exit_event(struct perf_tool *tool __maybe_unused,
 	return 0;
 }
 
-struct trace_entry {
-	unsigned short		type;
-	unsigned char		flags;
-	unsigned char		preempt_count;
-	int			pid;
-	int			lock_depth;
-};
-
 #ifdef SUPPORT_OLD_POWER_EVENTS
 static int use_old_power_events;
-struct power_entry_old {
-	struct trace_entry te;
-	u64	type;
-	u64	value;
-	u64	cpu_id;
-};
 #endif
 
-struct power_processor_entry {
-	struct trace_entry te;
-	u32	state;
-	u32	cpu_id;
-};
-
-#define TASK_COMM_LEN 16
-struct wakeup_entry {
-	struct trace_entry te;
-	char comm[TASK_COMM_LEN];
-	int   pid;
-	int   prio;
-	int   success;
-};
-
-struct sched_switch {
-	struct trace_entry te;
-	char prev_comm[TASK_COMM_LEN];
-	int  prev_pid;
-	int  prev_prio;
-	long prev_state; /* Arjan weeps. */
-	char next_comm[TASK_COMM_LEN];
-	int  next_pid;
-	int  next_prio;
-};
-
 static void c_state_start(int cpu, u64 timestamp, int state)
 {
 	cpus_cstate_start_times[cpu] = timestamp;
@@ -409,25 +369,23 @@ static void p_state_change(int cpu, u64 timestamp, u64 new_freq)
 			turbo_frequency = max_freq;
 }
 
-static void
-sched_wakeup(int cpu, u64 timestamp, int pid, struct trace_entry *te,
-	     const char *backtrace)
+static void sched_wakeup(int cpu, u64 timestamp, int waker, int wakee,
+			 u8 flags, const char *backtrace)
 {
 	struct per_pid *p;
-	struct wakeup_entry *wake = (void *)te;
 	struct wake_event *we = zalloc(sizeof(*we));
 
 	if (!we)
 		return;
 
 	we->time = timestamp;
-	we->waker = pid;
+	we->waker = waker;
 	we->backtrace = backtrace;
 
-	if ((te->flags & TRACE_FLAG_HARDIRQ) || (te->flags & TRACE_FLAG_SOFTIRQ))
+	if ((flags & TRACE_FLAG_HARDIRQ) || (flags & TRACE_FLAG_SOFTIRQ))
 		we->waker = -1;
 
-	we->wakee = wake->pid;
+	we->wakee = wakee;
 	we->next = wake_events;
 	wake_events = we;
 	p = find_create_pid(we->wakee);
@@ -444,24 +402,22 @@ sched_wakeup(int cpu, u64 timestamp, int pid, struct trace_entry *te,
 	}
 }
 
-static void sched_switch(int cpu, u64 timestamp, struct trace_entry *te,
-			 const char *backtrace)
+static void sched_switch(int cpu, u64 timestamp, int prev_pid, int next_pid,
+			 u64 prev_state, const char *backtrace)
 {
 	struct per_pid *p = NULL, *prev_p;
-	struct sched_switch *sw = (void *)te;
-
 
-	prev_p = find_create_pid(sw->prev_pid);
+	prev_p = find_create_pid(prev_pid);
 
-	p = find_create_pid(sw->next_pid);
+	p = find_create_pid(next_pid);
 
 	if (prev_p->current && prev_p->current->state != TYPE_NONE)
-		pid_put_sample(sw->prev_pid, TYPE_RUNNING, cpu,
+		pid_put_sample(prev_pid, TYPE_RUNNING, cpu,
 			       prev_p->current->state_since, timestamp,
 			       backtrace);
 	if (p && p->current) {
 		if (p->current->state != TYPE_NONE)
-			pid_put_sample(sw->next_pid, p->current->state, cpu,
+			pid_put_sample(next_pid, p->current->state, cpu,
 				       p->current->state_since, timestamp,
 				       backtrace);
 
@@ -472,9 +428,9 @@ static void sched_switch(int cpu, u64 timestamp, struct trace_entry *te,
 	if (prev_p->current) {
 		prev_p->current->state = TYPE_NONE;
 		prev_p->current->state_since = timestamp;
-		if (sw->prev_state & 2)
+		if (prev_state & 2)
 			prev_p->current->state = TYPE_BLOCKED;
-		if (sw->prev_state == 0)
+		if (prev_state == 0)
 			prev_p->current->state = TYPE_WAITING;
 	}
 }
@@ -586,61 +542,69 @@ static int process_sample_event(struct perf_tool *tool __maybe_unused,
 }
 
 static int
-process_sample_cpu_idle(struct perf_evsel *evsel __maybe_unused,
+process_sample_cpu_idle(struct perf_evsel *evsel,
 			struct perf_sample *sample,
 			const char *backtrace __maybe_unused)
 {
-	struct power_processor_entry *ppe = sample->raw_data;
+	u32 state = perf_evsel__intval(evsel, sample, "state");
+	u32 cpu_id = perf_evsel__intval(evsel, sample, "cpu_id");
 
-	if (ppe->state == (u32) PWR_EVENT_EXIT)
-		c_state_end(ppe->cpu_id, sample->time);
+	if (state == (u32)PWR_EVENT_EXIT)
+		c_state_end(cpu_id, sample->time);
 	else
-		c_state_start(ppe->cpu_id, sample->time, ppe->state);
+		c_state_start(cpu_id, sample->time, state);
 	return 0;
 }
 
 static int
-process_sample_cpu_frequency(struct perf_evsel *evsel __maybe_unused,
+process_sample_cpu_frequency(struct perf_evsel *evsel,
 			     struct perf_sample *sample,
 			     const char *backtrace __maybe_unused)
 {
-	struct power_processor_entry *ppe = sample->raw_data;
+	u32 state = perf_evsel__intval(evsel, sample, "state");
+	u32 cpu_id = perf_evsel__intval(evsel, sample, "cpu_id");
 
-	p_state_change(ppe->cpu_id, sample->time, ppe->state);
+	p_state_change(cpu_id, sample->time, state);
 	return 0;
 }
 
 static int
-process_sample_sched_wakeup(struct perf_evsel *evsel __maybe_unused,
+process_sample_sched_wakeup(struct perf_evsel *evsel,
 			    struct perf_sample *sample,
 			    const char *backtrace)
 {
-	struct trace_entry *te = sample->raw_data;
+	u8 flags = perf_evsel__intval(evsel, sample, "common_flags");
+	int waker = perf_evsel__intval(evsel, sample, "common_pid");
+	int wakee = perf_evsel__intval(evsel, sample, "pid");
 
-	sched_wakeup(sample->cpu, sample->time, sample->pid, te, backtrace);
+	sched_wakeup(sample->cpu, sample->time, waker, wakee, flags, backtrace);
 	return 0;
 }
 
 static int
-process_sample_sched_switch(struct perf_evsel *evsel __maybe_unused,
+process_sample_sched_switch(struct perf_evsel *evsel,
 			    struct perf_sample *sample,
 			    const char *backtrace)
 {
-	struct trace_entry *te = sample->raw_data;
+	int prev_pid = perf_evsel__intval(evsel, sample, "prev_pid");
+	int next_pid = perf_evsel__intval(evsel, sample, "next_pid");
+	u64 prev_state = perf_evsel__intval(evsel, sample, "prev_state");
 
-	sched_switch(sample->cpu, sample->time, te, backtrace);
+	sched_switch(sample->cpu, sample->time, prev_pid, next_pid, prev_state,
+		     backtrace);
 	return 0;
 }
 
 #ifdef SUPPORT_OLD_POWER_EVENTS
 static int
-process_sample_power_start(struct perf_evsel *evsel __maybe_unused,
+process_sample_power_start(struct perf_evsel *evsel,
 			   struct perf_sample *sample,
 			   const char *backtrace __maybe_unused)
 {
-	struct power_entry_old *peo = sample->raw_data;
+	u64 cpu_id = perf_evsel__intval(evsel, sample, "cpu_id");
+	u64 value = perf_evsel__intval(evsel, sample, "value");
 
-	c_state_start(peo->cpu_id, sample->time, peo->value);
+	c_state_start(cpu_id, sample->time, value);
 	return 0;
 }
 
@@ -654,13 +618,14 @@ process_sample_power_end(struct perf_evsel *evsel __maybe_unused,
 }
 
 static int
-process_sample_power_frequency(struct perf_evsel *evsel __maybe_unused,
+process_sample_power_frequency(struct perf_evsel *evsel,
 			       struct perf_sample *sample,
 			       const char *backtrace __maybe_unused)
 {
-	struct power_entry_old *peo = sample->raw_data;
+	u64 cpu_id = perf_evsel__intval(evsel, sample, "cpu_id");
+	u64 value = perf_evsel__intval(evsel, sample, "value");
 
-	p_state_change(peo->cpu_id, sample->time, peo->value);
+	p_state_change(cpu_id, sample->time, value);
 	return 0;
 }
 #endif /* SUPPORT_OLD_POWER_EVENTS */
-- 
1.8.3.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] perf timechart: dynamically determine event data offset
  2013-11-27  8:49               ` Namhyung Kim
  2013-11-27  9:01                 ` Stanislav Fomichev
@ 2013-11-27 13:44                 ` Arnaldo Carvalho de Melo
  2013-11-27 14:17                   ` Namhyung Kim
  1 sibling, 1 reply; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2013-11-27 13:44 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Stanislav Fomichev, Chia-I Wu, a.p.zijlstra, paulus, mingo,
	linux-kernel, Steven Rostedt

Em Wed, Nov 27, 2013 at 05:49:02PM +0900, Namhyung Kim escreveu:
> On Tue, 26 Nov 2013 18:54:37 +0400, Stanislav Fomichev wrote:
> > Since b000c8065a92 "tracing: Remove the extra 4 bytes of padding in events"
> > removed padding bytes, perf timechart got out of sync with the kernel's
> > trace_entry structure.

> > We can't just align perf's trace_entry definition with the kernel because we
> > want timechart to continue working with old perf.data. Instead, we now
> > calculate event data offset dynamically using offset of first non-common
> > event field in the perf.data.
 
> [SNIP]
> > @@ -304,13 +322,11 @@ struct trace_entry {
> >  	unsigned char		flags;
> >  	unsigned char		preempt_count;
> >  	int			pid;
> > -	int			lock_depth;
> >  };
 
> I had no chance to look into the timechart code in detail, but this is
> not good.  The format of each trace event (so the struct trace_entry)
> was described in the format file under the event directory on debugfs.
> For cpu_frequency event, I get the following:
 
>   $ cat /sys/kernel/debug/tracing/events/power/cpu_frequency/format 
>   name: cpu_frequency
>   ID: 315
>   format:
> 	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
> 	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
> 	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
> 	field:int common_pid;	offset:4;	size:4;	signed:1;
 
> 	field:u32 state;	offset:8;	size:4;	signed:0;
> 	field:u32 cpu_id;	offset:12;	size:4;	signed:0;
 
>   print fmt: "state=%lu cpu_id=%lu", (unsigned long)REC->state, (unsigned long)REC->cpu_id
 
> So it's not the same as the struct trace_entry above, even with your change.

It is... While working just with patches is good and dandy, and I was even
discussing with Jiri how patches should be clean and small, you're not
seeing the whole picture; look at the line just before
"unsigned char flags":

struct trace_entry {
        unsigned short          type;
        unsigned char           flags;
        unsigned char           preempt_count;
        int                     pid;
        int                     lock_depth;
};

Looking with pahole to see the offsets on a 64-bit arch:

[acme@ssdandy linux]$ pahole -C trace_entry /tmp/build/perf/builtin-timechart.o 
struct trace_entry {
	short unsigned int         type;                 /*     0     2 */
	unsigned char              flags;                /*     2     1 */
	unsigned char              preempt_count;        /*     3     1 */
	int                        pid;                  /*     4     4 */
	int                        lock_depth;           /*     8     4 */

	/* size: 12, cachelines: 1, members: 5 */
	/* last cacheline: 12 bytes */
};
[acme@ssdandy linux]$
[acme@ssdandy linux]$ uname -a
Linux ssdandy 3.12.0-rc6+ #2 SMP Mon Oct 21 11:19:07 BRT 2013 x86_64 x86_64 x86_64 GNU/Linux

And now on a 32-bit arch:

acme@linux-goap:~> pahole -C trace_entry /tmp/build/perf/builtin-timechart.o 
struct trace_entry {
	short unsigned int         type;                 /*     0     2 */
	unsigned char              flags;                /*     2     1 */
	unsigned char              preempt_count;        /*     3     1 */
	int                        pid;                  /*     4     4 */
	int                        lock_depth;           /*     8     4 */

	/* size: 12, cachelines: 1, members: 5 */
	/* last cacheline: 12 bytes */
};
acme@linux-goap:~> uname -a
Linux linux-goap 3.7.10-1.16-desktop #1 SMP PREEMPT Fri May 31 20:21:23 UTC 2013 (97c14ba) i686 i686 i386 GNU/Linux
acme@linux-goap:~>

Same signature for 32-bit and 64-bit userland, so whoever wrote timechart
(Arjan, I think) made no mistake in using the kernel-exported interface,
choosing the most efficient way to extract the data: casting to a struct.

Yeah, using the interface that searches for a field by name to get the offset,
then adds it to a pointer and casts to the type, allows maximum flexibility,
but it is not really efficient.

Doing that for something that is probably not performance critical, like
timechart, is not a problem; but then neither is using the patch that does
the cast after finding the offset of the first non-common field.

Having said that, I'll take the latest patch, using perf_evsel__intval et al.,
if Namhyung sees no further problems with it.

BTW: in 'perf trace' I recently did some work to cache the offsets on a per evsel
private area, so that the string lookup is not done anymore in the raw_syscalls
tracepoint processing.

- Arnaldo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] perf timechart: dynamically determine event data offset
  2013-11-27 13:44                 ` [PATCH] perf timechart: dynamically determine event data offset Arnaldo Carvalho de Melo
@ 2013-11-27 14:17                   ` Namhyung Kim
  2013-11-27 14:41                     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 18+ messages in thread
From: Namhyung Kim @ 2013-11-27 14:17 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Stanislav Fomichev, Chia-I Wu, a.p.zijlstra, paulus, mingo,
	linux-kernel, Steven Rostedt

Hi Arnaldo,

2013-11-27 (수), 10:44 -0300, Arnaldo Carvalho de Melo:
> Em Wed, Nov 27, 2013 at 05:49:02PM +0900, Namhyung Kim escreveu:
> > On Tue, 26 Nov 2013 18:54:37 +0400, Stanislav Fomichev wrote:
> > > Since b000c8065a92 "tracing: Remove the extra 4 bytes of padding in events"
> > > removed padding bytes, perf timechart got out of sync with the kernel's
> > > trace_entry structure.
> 
> > > We can't just align perf's trace_entry definition with the kernel because we
> > > want timechart to continue working with old perf.data. Instead, we now
> > > calculate event data offset dynamically using offset of first non-common
> > > event field in the perf.data.
>  
> > [SNIP]
> > > @@ -304,13 +322,11 @@ struct trace_entry {
> > >  	unsigned char		flags;
> > >  	unsigned char		preempt_count;
> > >  	int			pid;
> > > -	int			lock_depth;
> > >  };
>  
> > I had no chance to look into the timechart code in detail, but this is
> > not good.  The format of each trace event (so the struct trace_entry)
> > was described in the format file under the event directory on debugfs.
> > For cpu_frequency event, I get the following:
>  
> >   $ cat /sys/kernel/debug/tracing/events/power/cpu_frequency/format 
> >   name: cpu_frequency
> >   ID: 315
> >   format:
> > 	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
> > 	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
> > 	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
> > 	field:int common_pid;	offset:4;	size:4;	signed:1;
>  
> > 	field:u32 state;	offset:8;	size:4;	signed:0;
> > 	field:u32 cpu_id;	offset:12;	size:4;	signed:0;
>  
> >   print fmt: "state=%lu cpu_id=%lu", (unsigned long)REC->state, (unsigned long)REC->cpu_id
>  
> > So it's not the same as the struct trace_entry above, even with your change.
> 
> It is... While working just with patches is good and dandy, and I was even
> discussing with Jiri how patches should be clean and small, you're not
> seeing the whole picture; look at the line just before
> "unsigned char flags":
> 
> struct trace_entry {
>         unsigned short          type;
>         unsigned char           flags;
>         unsigned char           preempt_count;
>         int                     pid;
>         int                     lock_depth;
> };

Argh, I missed that part, sorry. :)

> 
> Looking with pahole to see the offsets on a 64-bit arch:
> 
> [acme@ssdandy linux]$ pahole -C trace_entry /tmp/build/perf/builtin-timechart.o 
> struct trace_entry {
> 	short unsigned int         type;                 /*     0     2 */
> 	unsigned char              flags;                /*     2     1 */
> 	unsigned char              preempt_count;        /*     3     1 */
> 	int                        pid;                  /*     4     4 */
> 	int                        lock_depth;           /*     8     4 */
> 
> 	/* size: 12, cachelines: 1, members: 5 */
> 	/* last cacheline: 12 bytes */
> };
> [acme@ssdandy linux]$
> [acme@ssdandy linux]$ uname -a
> Linux ssdandy 3.12.0-rc6+ #2 SMP Mon Oct 21 11:19:07 BRT 2013 x86_64 x86_64 x86_64 GNU/Linux
> 
> And now on a 32-bit arch:
> 
> acme@linux-goap:~> pahole -C trace_entry /tmp/build/perf/builtin-timechart.o 
> struct trace_entry {
> 	short unsigned int         type;                 /*     0     2 */
> 	unsigned char              flags;                /*     2     1 */
> 	unsigned char              preempt_count;        /*     3     1 */
> 	int                        pid;                  /*     4     4 */
> 	int                        lock_depth;           /*     8     4 */
> 
> 	/* size: 12, cachelines: 1, members: 5 */
> 	/* last cacheline: 12 bytes */
> };
> acme@linux-goap:~> uname -a
> Linux linux-goap 3.7.10-1.16-desktop #1 SMP PREEMPT Fri May 31 20:21:23 UTC 2013 (97c14ba) i686 i686 i386 GNU/Linux
> acme@linux-goap:~>
> 
> Same signature for 32-bit and 64-bit userland, so whoever wrote timechart
> (Arjan, I think) made no mistake in using the kernel-exported interface,
> choosing the most efficient way to extract the data: casting to a struct.
> 
> Yeah, using the interface that searches for a field by name to get the offset,
> then adds it to a pointer and casts to the type, allows maximum flexibility,
> but it is not really efficient.
> 
> Doing that for something that is probably not performance critical, like
> timechart, is not a problem; but then neither is using the patch that does
> the cast after finding the offset of the first non-common field.

I'm not sure how it affects the performance really.  But in most cases,
using libtraceevent would be a better solution IMHO.

> 
> Having said that, I'll take the latest patch, using perf_evsel__intval et al.,
> if Namhyung sees no further problems with it.

Okay, after a quick glance I don't see any more issues.  You can add my
Ack there.

> 
> BTW: in 'perf trace' I recently did some work to cache the offsets on a per evsel
> private area, so that the string lookup is not done anymore in the raw_syscalls
> tracepoint processing.

Good job. :)

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] perf timechart: dynamically determine event data offset
  2013-11-27 14:17                   ` Namhyung Kim
@ 2013-11-27 14:41                     ` Arnaldo Carvalho de Melo
  2013-11-27 14:51                       ` Namhyung Kim
  0 siblings, 1 reply; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2013-11-27 14:41 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Stanislav Fomichev, Chia-I Wu, a.p.zijlstra, paulus, mingo,
	linux-kernel, Steven Rostedt

Em Wed, Nov 27, 2013 at 11:17:43PM +0900, Namhyung Kim escreveu:
> 2013-11-27 (수), 10:44 -0300, Arnaldo Carvalho de Melo:
> > Same signature for 32-bit and 64-bit userland, so whoever wrote timechart
> > (Arjan, I think) made no mistake in using the kernel-exported interface,
> > choosing the most efficient way to extract the data: casting to a struct.

> > Yeah, using the interface that searches for a field by name to get the offset,
> > then adds it to a pointer and casts to the type, allows maximum flexibility,
> > but it is not really efficient.

> > Doing that for something that is probably not performance critical, like
> > timechart, is not a problem; but then neither is using the patch that does
> > the cast after finding the offset of the first non-common field.
 
> I'm not sure how it affects the performance really.

Hey, every time we need to get the value of a field while processing a
tracepoint sample, we do it via:

struct format_field *pevent_find_field(struct event_format *event, const char *name)
{
        struct format_field *format;

        for (format = event->format.fields; format; format = format->next) {
                if (strcmp(format->name, name) == 0)
                        break;
        }

        return format;
}

I don't think this is optimal, no.

But yeah, for things that don't have performance needs, don't process that many
samples and hey, machines are fast and cheap these days ;-)

> But in most cases, using libtraceevent would be a better solution IMHO.

It is the one that gives the most flexibily to change the kernel exported
tracepoint interface at will, yes.
 
> > Having said that, I'll take the latest patch, using perf_evsel__intval et al.,
> > if Namhyung sees no further problems with it.
 
> Okay, after a quick glance I don't see any more issues.  You can add my
> Ack there.

Ok, thanks for checking!

- Arnaldo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] perf timechart: dynamically determine event data offset
  2013-11-27 14:41                     ` Arnaldo Carvalho de Melo
@ 2013-11-27 14:51                       ` Namhyung Kim
  2013-11-27 14:55                         ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 18+ messages in thread
From: Namhyung Kim @ 2013-11-27 14:51 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Stanislav Fomichev, Chia-I Wu, a.p.zijlstra, paulus, mingo,
	linux-kernel, Steven Rostedt

2013-11-27 (수), 11:41 -0300, Arnaldo Carvalho de Melo:
> Em Wed, Nov 27, 2013 at 11:17:43PM +0900, Namhyung Kim escreveu:
> > 2013-11-27 (Wed), 10:44 -0300, Arnaldo Carvalho de Melo:
> > > Same signature, 32-bit, 64-bit userland, so whoever wrote timechart, Arjan, I
> > > think, made no mistakes at using the kernel exported interface, choosing the
> > > most efficient way to extract the data, casting to a struct.
> 
> > > Yeah, using the interface that searches for a field name to get the offset and
> > > then add it to a pointer to then cast to the type will allow maximum flexibility,
> > > but is not really efficient.
> 
> > > Doing that for something that is not performance critical (probably) as
> > > timechart probably is not a problem, but so is not a problem using the patch
> > > that does the cast after finding the offset of the first non-common field.
>  
> > I'm not sure how it affects the performance really.
> 
> Hey, everytime we need to get the value of a field when processing each
> tracepoint sample we need to do it via:
> 
> struct format_field *pevent_find_field(struct event_format *event, const char *name)
> {
>         struct format_field *format;
> 
>         for (format = event->format.fields; format; format = format->next) {
>                 if (strcmp(format->name, name) == 0)
>                         break;
>         }
> 
>         return format;
> }
> 
> I don't think this is optimal, no.
> 
> But yeah, for things that don't have performance needs, don't process that many
> samples and hey, machines are fast and cheap these days ;-)

Right.  What I wanted to say actually was like s/how/how much/. :)

Thanks,
Namhyung




* Re: [PATCH] perf timechart: dynamically determine event data offset
  2013-11-27 14:51                       ` Namhyung Kim
@ 2013-11-27 14:55                         ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2013-11-27 14:55 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Stanislav Fomichev, Chia-I Wu, a.p.zijlstra, paulus, mingo,
	linux-kernel, Steven Rostedt

Em Wed, Nov 27, 2013 at 11:51:25PM +0900, Namhyung Kim escreveu:
> 2013-11-27 (Wed), 11:41 -0300, Arnaldo Carvalho de Melo:
> > Em Wed, Nov 27, 2013 at 11:17:43PM +0900, Namhyung Kim escreveu:
> > > 2013-11-27 (Wed), 10:44 -0300, Arnaldo Carvalho de Melo:
> > > > Same signature, 32-bit, 64-bit userland, so whoever wrote timechart, Arjan, I
> > > > think, made no mistakes at using the kernel exported interface, choosing the
> > > > most efficient way to extract the data, casting to a struct.

> > > > Yeah, using the interface that searches for a field name to get the offset and
> > > > then add it to a pointer to then cast to the type will allow maximum flexibility,
> > > > but is not really efficient.

> > > > Doing that for something that is not performance critical (probably) as
> > > > timechart probably is not a problem, but so is not a problem using the patch
> > > > that does the cast after finding the offset of the first non-common field.

> > > I'm not sure how it affects the performance really.

> > Hey, everytime we need to get the value of a field when processing each
> > tracepoint sample we need to do it via:

> > struct format_field *pevent_find_field(struct event_format *event, const char *name)
> > {
> >         struct format_field *format;
> > 
> >         for (format = event->format.fields; format; format = format->next) {
> >                 if (strcmp(format->name, name) == 0)
> >                         break;
> >         }
> > 
> >         return format;
> > }
> > 
> > I don't think this is optimal, no.
> > 
> > But yeah, for things that don't have performance needs, don't process that many
> > samples and hey, machines are fast and cheap these days ;-)
> 
> Right.  What I wanted to say actually was like s/how/how much/. :)

We would have to use perf for that 8-)

It's just that doing a string search on a doubly linked list for each
field on each sample doesn't _look_ optimal to me, so I didn't bother
using a profiler for that :-)

Anyway, let's move on, I'll apply the patch after lunch and send it Ingo's
way.

- Arnaldo


* [tip:perf/core] perf timechart: dynamically determine event fields offset
  2013-11-27 10:45                   ` [PATCH] perf timechart: dynamically determine event fields offset Stanislav Fomichev
@ 2013-11-30 12:53                     ` tip-bot for Stanislav Fomichev
  0 siblings, 0 replies; 18+ messages in thread
From: tip-bot for Stanislav Fomichev @ 2013-11-30 12:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: acme, linux-kernel, paulus, mingo, hpa, mingo, a.p.zijlstra,
	namhyung, rostedt, stfomichev, tglx, olvaffe

Commit-ID:  3ed0d21e1103787e21ca38bed2ff50c9f087bedb
Gitweb:     http://git.kernel.org/tip/3ed0d21e1103787e21ca38bed2ff50c9f087bedb
Author:     Stanislav Fomichev <stfomichev@yandex-team.ru>
AuthorDate: Wed, 27 Nov 2013 14:45:00 +0400
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 27 Nov 2013 15:10:11 -0300

perf timechart: dynamically determine event fields offset

Since b000c8065a92 "tracing: Remove the extra 4 bytes of padding in
events" removed padding bytes, perf timechart got out of sync with the
kernel's trace_entry structure.

Convert perf timechart to use dynamic field offsets (via
perf_evsel__intval) instead of relying on a hardcoded copy of the field
layout from the kernel.

Signed-off-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Chia-I Wu <olvaffe@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20131127104459.GB3309@stfomichev-desktop
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-timechart.c | 119 +++++++++++++++--------------------------
 1 file changed, 42 insertions(+), 77 deletions(-)

diff --git a/tools/perf/builtin-timechart.c b/tools/perf/builtin-timechart.c
index 491662b..436cb5f 100644
--- a/tools/perf/builtin-timechart.c
+++ b/tools/perf/builtin-timechart.c
@@ -306,50 +306,10 @@ static int process_exit_event(struct perf_tool *tool __maybe_unused,
 	return 0;
 }
 
-struct trace_entry {
-	unsigned short		type;
-	unsigned char		flags;
-	unsigned char		preempt_count;
-	int			pid;
-	int			lock_depth;
-};
-
 #ifdef SUPPORT_OLD_POWER_EVENTS
 static int use_old_power_events;
-struct power_entry_old {
-	struct trace_entry te;
-	u64	type;
-	u64	value;
-	u64	cpu_id;
-};
 #endif
 
-struct power_processor_entry {
-	struct trace_entry te;
-	u32	state;
-	u32	cpu_id;
-};
-
-#define TASK_COMM_LEN 16
-struct wakeup_entry {
-	struct trace_entry te;
-	char comm[TASK_COMM_LEN];
-	int   pid;
-	int   prio;
-	int   success;
-};
-
-struct sched_switch {
-	struct trace_entry te;
-	char prev_comm[TASK_COMM_LEN];
-	int  prev_pid;
-	int  prev_prio;
-	long prev_state; /* Arjan weeps. */
-	char next_comm[TASK_COMM_LEN];
-	int  next_pid;
-	int  next_prio;
-};
-
 static void c_state_start(int cpu, u64 timestamp, int state)
 {
 	cpus_cstate_start_times[cpu] = timestamp;
@@ -409,25 +369,23 @@ static void p_state_change(int cpu, u64 timestamp, u64 new_freq)
 			turbo_frequency = max_freq;
 }
 
-static void
-sched_wakeup(int cpu, u64 timestamp, int pid, struct trace_entry *te,
-	     const char *backtrace)
+static void sched_wakeup(int cpu, u64 timestamp, int waker, int wakee,
+			 u8 flags, const char *backtrace)
 {
 	struct per_pid *p;
-	struct wakeup_entry *wake = (void *)te;
 	struct wake_event *we = zalloc(sizeof(*we));
 
 	if (!we)
 		return;
 
 	we->time = timestamp;
-	we->waker = pid;
+	we->waker = waker;
 	we->backtrace = backtrace;
 
-	if ((te->flags & TRACE_FLAG_HARDIRQ) || (te->flags & TRACE_FLAG_SOFTIRQ))
+	if ((flags & TRACE_FLAG_HARDIRQ) || (flags & TRACE_FLAG_SOFTIRQ))
 		we->waker = -1;
 
-	we->wakee = wake->pid;
+	we->wakee = wakee;
 	we->next = wake_events;
 	wake_events = we;
 	p = find_create_pid(we->wakee);
@@ -444,24 +402,22 @@ sched_wakeup(int cpu, u64 timestamp, int pid, struct trace_entry *te,
 	}
 }
 
-static void sched_switch(int cpu, u64 timestamp, struct trace_entry *te,
-			 const char *backtrace)
+static void sched_switch(int cpu, u64 timestamp, int prev_pid, int next_pid,
+			 u64 prev_state, const char *backtrace)
 {
 	struct per_pid *p = NULL, *prev_p;
-	struct sched_switch *sw = (void *)te;
-
 
-	prev_p = find_create_pid(sw->prev_pid);
+	prev_p = find_create_pid(prev_pid);
 
-	p = find_create_pid(sw->next_pid);
+	p = find_create_pid(next_pid);
 
 	if (prev_p->current && prev_p->current->state != TYPE_NONE)
-		pid_put_sample(sw->prev_pid, TYPE_RUNNING, cpu,
+		pid_put_sample(prev_pid, TYPE_RUNNING, cpu,
 			       prev_p->current->state_since, timestamp,
 			       backtrace);
 	if (p && p->current) {
 		if (p->current->state != TYPE_NONE)
-			pid_put_sample(sw->next_pid, p->current->state, cpu,
+			pid_put_sample(next_pid, p->current->state, cpu,
 				       p->current->state_since, timestamp,
 				       backtrace);
 
@@ -472,9 +428,9 @@ static void sched_switch(int cpu, u64 timestamp, struct trace_entry *te,
 	if (prev_p->current) {
 		prev_p->current->state = TYPE_NONE;
 		prev_p->current->state_since = timestamp;
-		if (sw->prev_state & 2)
+		if (prev_state & 2)
 			prev_p->current->state = TYPE_BLOCKED;
-		if (sw->prev_state == 0)
+		if (prev_state == 0)
 			prev_p->current->state = TYPE_WAITING;
 	}
 }
@@ -586,61 +542,69 @@ static int process_sample_event(struct perf_tool *tool __maybe_unused,
 }
 
 static int
-process_sample_cpu_idle(struct perf_evsel *evsel __maybe_unused,
+process_sample_cpu_idle(struct perf_evsel *evsel,
 			struct perf_sample *sample,
 			const char *backtrace __maybe_unused)
 {
-	struct power_processor_entry *ppe = sample->raw_data;
+	u32 state = perf_evsel__intval(evsel, sample, "state");
+	u32 cpu_id = perf_evsel__intval(evsel, sample, "cpu_id");
 
-	if (ppe->state == (u32) PWR_EVENT_EXIT)
-		c_state_end(ppe->cpu_id, sample->time);
+	if (state == (u32)PWR_EVENT_EXIT)
+		c_state_end(cpu_id, sample->time);
 	else
-		c_state_start(ppe->cpu_id, sample->time, ppe->state);
+		c_state_start(cpu_id, sample->time, state);
 	return 0;
 }
 
 static int
-process_sample_cpu_frequency(struct perf_evsel *evsel __maybe_unused,
+process_sample_cpu_frequency(struct perf_evsel *evsel,
 			     struct perf_sample *sample,
 			     const char *backtrace __maybe_unused)
 {
-	struct power_processor_entry *ppe = sample->raw_data;
+	u32 state = perf_evsel__intval(evsel, sample, "state");
+	u32 cpu_id = perf_evsel__intval(evsel, sample, "cpu_id");
 
-	p_state_change(ppe->cpu_id, sample->time, ppe->state);
+	p_state_change(cpu_id, sample->time, state);
 	return 0;
 }
 
 static int
-process_sample_sched_wakeup(struct perf_evsel *evsel __maybe_unused,
+process_sample_sched_wakeup(struct perf_evsel *evsel,
 			    struct perf_sample *sample,
 			    const char *backtrace)
 {
-	struct trace_entry *te = sample->raw_data;
+	u8 flags = perf_evsel__intval(evsel, sample, "common_flags");
+	int waker = perf_evsel__intval(evsel, sample, "common_pid");
+	int wakee = perf_evsel__intval(evsel, sample, "pid");
 
-	sched_wakeup(sample->cpu, sample->time, sample->pid, te, backtrace);
+	sched_wakeup(sample->cpu, sample->time, waker, wakee, flags, backtrace);
 	return 0;
 }
 
 static int
-process_sample_sched_switch(struct perf_evsel *evsel __maybe_unused,
+process_sample_sched_switch(struct perf_evsel *evsel,
 			    struct perf_sample *sample,
 			    const char *backtrace)
 {
-	struct trace_entry *te = sample->raw_data;
+	int prev_pid = perf_evsel__intval(evsel, sample, "prev_pid");
+	int next_pid = perf_evsel__intval(evsel, sample, "next_pid");
+	u64 prev_state = perf_evsel__intval(evsel, sample, "prev_state");
 
-	sched_switch(sample->cpu, sample->time, te, backtrace);
+	sched_switch(sample->cpu, sample->time, prev_pid, next_pid, prev_state,
+		     backtrace);
 	return 0;
 }
 
 #ifdef SUPPORT_OLD_POWER_EVENTS
 static int
-process_sample_power_start(struct perf_evsel *evsel __maybe_unused,
+process_sample_power_start(struct perf_evsel *evsel,
 			   struct perf_sample *sample,
 			   const char *backtrace __maybe_unused)
 {
-	struct power_entry_old *peo = sample->raw_data;
+	u64 cpu_id = perf_evsel__intval(evsel, sample, "cpu_id");
+	u64 value = perf_evsel__intval(evsel, sample, "value");
 
-	c_state_start(peo->cpu_id, sample->time, peo->value);
+	c_state_start(cpu_id, sample->time, value);
 	return 0;
 }
 
@@ -654,13 +618,14 @@ process_sample_power_end(struct perf_evsel *evsel __maybe_unused,
 }
 
 static int
-process_sample_power_frequency(struct perf_evsel *evsel __maybe_unused,
+process_sample_power_frequency(struct perf_evsel *evsel,
 			       struct perf_sample *sample,
 			       const char *backtrace __maybe_unused)
 {
-	struct power_entry_old *peo = sample->raw_data;
+	u64 cpu_id = perf_evsel__intval(evsel, sample, "cpu_id");
+	u64 value = perf_evsel__intval(evsel, sample, "value");
 
-	p_state_change(peo->cpu_id, sample->time, peo->value);
+	p_state_change(cpu_id, sample->time, value);
 	return 0;
 }
 #endif /* SUPPORT_OLD_POWER_EVENTS */


end of thread, other threads:[~2013-11-30 12:54 UTC | newest]

Thread overview: 18+ messages
2013-10-07  6:48 [PATCH] perf timechart: remove lock_depth from trace_entry Chia-I Wu
2013-10-22 10:32 ` Stanislav Fomichev
2013-11-25 18:21   ` Arnaldo Carvalho de Melo
2013-11-26 11:05     ` Stanislav Fomichev
2013-11-26 12:10       ` Arnaldo Carvalho de Melo
2013-11-26 13:47         ` Stanislav Fomichev
2013-11-26 13:57           ` Arnaldo Carvalho de Melo
2013-11-26 14:54             ` [PATCH] perf timechart: dynamically determine event data offset Stanislav Fomichev
2013-11-26 22:04               ` Arnaldo Carvalho de Melo
2013-11-27  8:49               ` Namhyung Kim
2013-11-27  9:01                 ` Stanislav Fomichev
2013-11-27 10:45                   ` [PATCH] perf timechart: dynamically determine event fields offset Stanislav Fomichev
2013-11-30 12:53                     ` [tip:perf/core] " tip-bot for Stanislav Fomichev
2013-11-27 13:44                 ` [PATCH] perf timechart: dynamically determine event data offset Arnaldo Carvalho de Melo
2013-11-27 14:17                   ` Namhyung Kim
2013-11-27 14:41                     ` Arnaldo Carvalho de Melo
2013-11-27 14:51                       ` Namhyung Kim
2013-11-27 14:55                         ` Arnaldo Carvalho de Melo
