linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* ftrace trace_raw_pipe format
@ 2019-12-17 17:44 David Laight
  2019-12-17 22:34 ` Steven Rostedt
  0 siblings, 1 reply; 10+ messages in thread
From: David Laight @ 2019-12-17 17:44 UTC (permalink / raw)
  To: linux-kernel, 'Steven Rostedt'

I'm trying to 'grok' the trace_raw_pipe data that ftrace generates.
I've some 3rd party code that post-processes it, but doesn't like wrapped traces
because (I think) the traces from different cpus start at different times.

I can't seem to find any documentation at all...

I can find the format of the trace entries (I only need the length) from the 'format' files.
There seems to be 4 bytes between the entries that looks like a time difference.
I presume this is the interval before the trace item that follows.
(There is a time-delta of 1 before the first entry.)

The overall file seems to be a series of 4k blocks.
All but the first have a 16 byte header (possibly) described by 'header_page'
that has a timestamp and length (and a sign extended flag).

The first 4k page is confusing me.
The first 8 bytes might be the time the actual trace started.
The next 8 contain a length (short for a wrapped trace).
I've no idea what the next 8 are, they look like count of something.

Given that I've stopped tracing before reading the files I'd
expect the last buffer to be the partial one, not the first.
I'm also pretty sure the partial block contains the last trace data
(it seems to refer to the shell script sleep used to time the trace.)

I did find some old mailing list messages about these files being
corrupt - but that was about the time the splice code was
added to read them out - best part of 10 years ago.

If I can sort the headers for the 4k block (and reorder them??),
then delete entries so the start times match I should be able to
make the post-processing code work.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: ftrace trace_raw_pipe format
  2019-12-17 17:44 ftrace trace_raw_pipe format David Laight
@ 2019-12-17 22:34 ` Steven Rostedt
  2019-12-17 22:43   ` Steven Rostedt
                     ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Steven Rostedt @ 2019-12-17 22:34 UTC (permalink / raw)
  To: David Laight; +Cc: linux-kernel, Sudip Mukherjee

[-- Attachment #1: Type: text/plain, Size: 2620 bytes --]

On Tue, 17 Dec 2019 17:44:40 +0000
David Laight <David.Laight@ACULAB.COM> wrote:

> I'm trying to 'grok' the trace_raw_pipe data that ftrace generates.
> I've some 3rd party code that post-processes it, but doesn't like wrapped traces
> because (I think) the traces from different cpus start at different times.
> 
> I can't seem to find any documentation at all...
> 
> I can find the format of the trace entries (I only need the length) from the 'format' files.
> There seems to be 4 bytes between the entries that looks like a time difference.
> I presume this is the interval before the trace item that follows.
> (There is a time-delta of 1 before the first entry.)
> 
> The overall file seems to be a series of 4k blocks.
> All but the first have a 16 byte header (possibly) described by 'header_page'
> that has a timestamp and length (and a sign extended flag).
> 
> The first 4k page is confusing me.
> The first 8 bytes might be the time the actual trace started.
> The next 8 contain a length (short for a wrapped trace).
> I've no idea what the next 8 are, they look like count of something.
> 
> Given that I've stopped tracing before reading the files I'd
> expect the last buffer to be the partial one, not the first.
> I'm also pretty sure the partial block contains the last trace data
> (it seems to refer to the shell script sleep used to time the trace.)
> 
> I did find some old mailing list messages about these files being
> corrupt - but that was about the time the splice code was
> added to read them out - best part of 10 years ago.
> 
> If I can sort the headers for the 4k block (and reorder them??),
> then delete entries so the start times match I should be able to
> make the post-processing code work.
>

You may want to use libtraceevent (which will, hopefully, soon
be in debian!). Attached is a simple program that reads the data using
it and prints out the format.

I should add this file to the libtraceevent source code, which is found
in the Linux kernel source under tools/lib/traceevent. To build this,
use:

 $ cd linux.git/tools/lib/traceevent
 $ make
 $ sudo make install
 $ cd to-path-of-attached-file
 $ gcc -g -Wall read-trace-pipe-raw.c -o read-trace-pipe-raw -ltraceevent -ldl

Now you may need to add the path of libtraceevent into /etc/ld.so.conf
and run ldconfig to get it to work.

Let me know if this helps.

Note, this tool requires there to be data in the ring buffers (all of
them, as it will block on the first empty buffer). I just wrote this
for sample code. I'll fix this blocking and also add more comments for
when I add this to the source code.

-- Steve

[-- Attachment #2: read-trace-pipe-raw.c --]
[-- Type: text/x-c++src, Size: 7042 bytes --]

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <dirent.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/mount.h>
#include <traceevent/event-parse.h>
#include <traceevent/kbuffer.h>
#include <traceevent/trace-seq.h>

struct tep_handle *tep;
struct kbuffer *kbuf;
static int page_size;

#define __weak __attribute__((weak))
#define __noreturn __attribute__((noreturn))

#define TRACEFS_PATH "/sys/kernel/tracing"

#define _STR(x) #x
#define STR(x) _STR(x)

static int ignore_warning;

void warning(const char *fmt, ...)
{
	va_list ap;

	if (ignore_warning)
		return;

	va_start(ap, fmt);
	fprintf(stderr, "Warning:  ");
	vfprintf(stderr, fmt, ap);
	va_end(ap);

	fprintf(stderr, "\n");
}

void __noreturn __vdie(const char *fmt, va_list ap)
{
	int ret = errno ? errno : -1;

	fprintf(stderr, "Error:  ");
	vfprintf(stderr, fmt, ap);

	fprintf(stderr, "\n");
	exit(ret);
}

void __noreturn __die(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	__vdie(fmt, ap);
	va_end(ap);
}

void __weak __noreturn die(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	__vdie(fmt, ap);
	va_end(ap);
}

void __weak __noreturn pdie(const char *fmt, ...)
{
	va_list ap;

	if (errno)
		perror("read-trace-pipe-raw");

	va_start(ap, fmt);
	__vdie(fmt, ap);
	va_end(ap);
}

static int mount_tracefs(void)
{
	struct stat st;
	int ret;

	/* make sure debugfs exists */
	ret = stat(TRACEFS_PATH, &st);
	if (ret < 0)
		return -1;

	ret = mount("nodev", TRACEFS_PATH,
		    "tracefs", 0, NULL);

	return ret;
}

static char *find_tracing_dir(void)
{
	char fspath[PATH_MAX+1];
	char *tracing_dir;
	char type[100];
	FILE *fp;

	if ((fp = fopen("/proc/mounts","r")) == NULL) {
		warning("Can't open /proc/mounts for read");
		return NULL;
	}

	while (fscanf(fp, "%*s %"
		      STR(PATH_MAX)
		      "s %99s %*s %*d %*d\n",
		      fspath, type) == 2) {
		if (strcmp(type, "tracefs") == 0)
			break;
	}
	fclose(fp);

	if (strcmp(type, "tracefs") != 0) {
		if (mount_tracefs() < 0) {
			warning("tracefs not mounted, please mount");
			return NULL;
		} else
			strcpy(fspath, TRACEFS_PATH);
	}
	tracing_dir = strdup(fspath);
	if (!tracing_dir)
		return NULL;

	return tracing_dir;
}

static void load_format(const char *system, const char *format)
{
	char *line;
	char *buf = NULL;
	FILE *fp;
	size_t len = 0;
	size_t size = 0;
	int s;
	int ret;

	fp = fopen(format, "r");
	if (!fp)
		die("Could not open %s to read", format);

	while ((ret = getline(&line, &len, fp)) > 0) {
		s = strlen(line);
		buf = realloc(buf, size + s + 1);
		if (!buf)
			pdie("Allocating memory to read %s\n", format);
		strcpy(buf + size, line);
		size += s;
	}
	free(line);
	ignore_warning = 1;
	tep_parse_event(tep, buf, size, system);
	ignore_warning = 0;
	free(buf);
	fclose(fp);
}

static void read_event(const char *system, const char *event)
{
	struct stat st;
	char *format;
	int ret;

	ret = asprintf(&format, "%s/format", event);
	if (ret < 0)
		pdie("Could not allocate memory for %s format\n", event);

	ret = stat(format, &st);
	if (ret < 0 /* ? */ || !S_ISREG(st.st_mode))
		return;

	load_format(system, format);
	free(format);
}

static void read_system(const char *system)
{
	struct dirent *dent;
	struct stat st;
	DIR *dir;
	int ret;

	dir = opendir(system);
	if (!dir)
		pdie("Can't read directory '%s'", system);

	while ((dent = readdir(dir))) {
		char *event;

		if (strcmp(dent->d_name, ".") == 0 ||
		    strcmp(dent->d_name, "..") == 0)
			continue;

		ret = asprintf(&event, "%s/%s", system, dent->d_name);
		if (ret < 0)
			pdie("could not allocate memory for event name %s\n",
			     dent->d_name);
		ret = stat(event, &st);
		if (ret < 0 /* ? */ || !S_ISDIR(st.st_mode))
			continue;

		read_event(system, event);
		free(event);
	}
}

static void read_raw_buffer(int i, const char *buffer)
{
	struct trace_seq s;
	char buf[page_size];
	int fd;
	int r;

	printf("Parsing CPU %d buffer\n", i);

	fd = open(buffer, O_RDONLY);
	if (fd < 0)
		pdie("Failed to open %s", buffer);

	while ((r = read(fd, buf, page_size)) > 0) {
		kbuffer_load_subbuffer(kbuf, buf);

		for (;;) {
			struct tep_record record;

			record.data = kbuffer_read_event(kbuf, &record.ts);
			if (!record.data)
				break;
			kbuffer_next_event(kbuf, NULL);

			trace_seq_init(&s);
			tep_print_event(tep, &s, &record,
					"%s-%d %9d\t%s: %s\n",
					TEP_PRINT_COMM,
					TEP_PRINT_PID,
					TEP_PRINT_TIME,
					TEP_PRINT_NAME,
					TEP_PRINT_INFO);
			trace_seq_do_printf(&s);
		}
	}

	close(fd);
}

static void load_cmdlines(const char *cmdlines)
{
	FILE *fp;
	char *line = NULL;
	size_t len = 0;
	char comm[1024];
	int pid;
	int ret;

	printf("read %s\n", cmdlines);
	fp = fopen(cmdlines, "r");
	if (!fp)
		pdie("Opening %s", cmdlines);

	while ((ret = getline(&line, &len, fp)) > 0) {
		if (sscanf(line, "%d %1024s", &pid, comm) == 2)
			tep_register_comm(tep, comm, pid);
	}

	free(line);
	fclose(fp);
}

int main(int argc, char *argv[])
{
	char *tracefs = find_tracing_dir();
	enum kbuffer_long_size lsize;
	enum kbuffer_endian endian;
	struct dirent *dent;
	struct stat st;
	char *per_cpu;
	char *events;
	char *comms;
	DIR *dir;
	int ret;
	int i;

	page_size = getpagesize();

	if (!tracefs)
		die("Can not find tracefs");

	tep = tep_alloc();
	if (!tep)
		pdie("Could not allocate tep handle");

	lsize = sizeof(long) == 4 ? KBUFFER_LSIZE_4 : KBUFFER_LSIZE_8;
	endian = tep_is_bigendian() ? KBUFFER_ENDIAN_BIG : KBUFFER_ENDIAN_LITTLE;

	kbuf = kbuffer_alloc(lsize, endian);
	if (!kbuf)
		pdie("Could not allocate kbuffer handle");

	ret = asprintf(&comms, "%s/saved_cmdlines", tracefs);
	if (ret < 0)
		pdie("Could not allocate saved_cmdlines path name");

	load_cmdlines(comms);
	free(comms);

	ret = asprintf(&events, "%s/events", tracefs);
	if (ret < 0)
		pdie("Could not allocate memory for events path");

	dir = opendir(events);
	if (!dir)
		pdie("Can't read directory '%s'", events);

	while ((dent = readdir(dir))) {
		char *system;

		if (strcmp(dent->d_name, ".") == 0 ||
		    strcmp(dent->d_name, "..") == 0)
			continue;

		ret = asprintf(&system, "%s/%s", events, dent->d_name);
		if (ret < 0)
			pdie("could not allocate memory for system name %s\n",
			     dent->d_name);
		ret = stat(system, &st);
		if (ret < 0 /* ? */ || !S_ISDIR(st.st_mode))
			continue;

		read_system(system);
		free(system);
	}
	closedir(dir);
	free(events);

	asprintf(&per_cpu, "%s/per_cpu", tracefs);
	if (!per_cpu)
		pdie("Could not allocate memory for per_cpu path");

	for (i = 0; ; i++) {
		char *raw_buf;
		char *cpu;

		ret = asprintf(&cpu, "%s/cpu%d", per_cpu, i);
		if (ret < 0)
			pdie("Could not allocate memory for cpu buffer %d name", i);

		ret = stat(cpu, &st);
		if (ret < 0 || !S_ISDIR(st.st_mode))
			break;

		ret = asprintf(&raw_buf, "%s/trace_pipe_raw", cpu);
		if (ret < 0)
			pdie("Could not allocate memory for cpu %d raw buffer name", i);

		read_raw_buffer(i, raw_buf);
		free(raw_buf);
		free(cpu);
	}
	free(per_cpu);
}

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: ftrace trace_raw_pipe format
  2019-12-17 22:34 ` Steven Rostedt
@ 2019-12-17 22:43   ` Steven Rostedt
  2019-12-17 22:47   ` Steven Rostedt
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Steven Rostedt @ 2019-12-17 22:43 UTC (permalink / raw)
  To: David Laight; +Cc: linux-kernel, Sudip Mukherjee

On Tue, 17 Dec 2019 17:34:03 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

>  $ gcc -g -Wall read-trace-pipe-raw.c -o read-trace-pipe-raw -ltraceevent -ldl

I plan on adding this to the top of the file:

  // SPDX-License-Identifier: LGPL-2.1

In case you are wondering what license it will be under.

-- Steve

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: ftrace trace_raw_pipe format
  2019-12-17 22:34 ` Steven Rostedt
  2019-12-17 22:43   ` Steven Rostedt
@ 2019-12-17 22:47   ` Steven Rostedt
  2019-12-17 23:36   ` Steven Rostedt
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Steven Rostedt @ 2019-12-17 22:47 UTC (permalink / raw)
  To: David Laight; +Cc: linux-kernel, Sudip Mukherjee

On Tue, 17 Dec 2019 17:34:03 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> static void read_raw_buffer(int i, const char *buffer)
> {
> 	struct trace_seq s;
> 	char buf[page_size];
> 	int fd;
> 	int r;
> 
> 	printf("Parsing CPU %d buffer\n", i);
> 
> 	fd = open(buffer, O_RDONLY);

Changing the above to:

	fd = open(buffer, O_RDONLY|O_NONBLOCK);

Will then not make it block on empty buffers.

-- Steve

> 	if (fd < 0)
> 		pdie("Failed to open %s", buffer);

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: ftrace trace_raw_pipe format
  2019-12-17 22:34 ` Steven Rostedt
  2019-12-17 22:43   ` Steven Rostedt
  2019-12-17 22:47   ` Steven Rostedt
@ 2019-12-17 23:36   ` Steven Rostedt
  2019-12-17 23:54     ` Steven Rostedt
  2019-12-18 16:31   ` Sudip Mukherjee
  2019-12-18 17:28   ` David Laight
  4 siblings, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2019-12-17 23:36 UTC (permalink / raw)
  To: David Laight; +Cc: linux-kernel, Sudip Mukherjee

[-- Attachment #1: Type: text/plain, Size: 1239 bytes --]

On Tue, 17 Dec 2019 17:34:03 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> static void read_raw_buffer(int i, const char *buffer)
> {
> 	struct trace_seq s;
> 	char buf[page_size];
> 	int fd;
> 	int r;
> 
> 	printf("Parsing CPU %d buffer\n", i);
> 
> 	fd = open(buffer, O_RDONLY);
> 	if (fd < 0)
> 		pdie("Failed to open %s", buffer);
> 
> 	while ((r = read(fd, buf, page_size)) > 0) {
> 		kbuffer_load_subbuffer(kbuf, buf);
> 
> 		for (;;) {
> 			struct tep_record record;
> 
> 			record.data = kbuffer_read_event(kbuf, &record.ts);
> 			if (!record.data)
> 				break;
> 			kbuffer_next_event(kbuf, NULL);
> 

Also note, once you are here, you don't need to use the
tep_print_event() to print out the fields of record. You can also
extract the data from the event directly. Or you could register a
handler that will get called via the tep_print_event().

Attached is a new version that has some fixes as well as adds its own
sched_switch handler.

-- Steve


> 			trace_seq_init(&s);
> 			tep_print_event(tep, &s, &record,
> 					"%s-%d %9d\t%s: %s\n",
> 					TEP_PRINT_COMM,
> 					TEP_PRINT_PID,
> 					TEP_PRINT_TIME,
> 					TEP_PRINT_NAME,
> 					TEP_PRINT_INFO);
> 			trace_seq_do_printf(&s);
> 		}
> 	}
> 
> 	close(fd);
> }


[-- Attachment #2: read-trace-pipe-raw.c --]
[-- Type: text/x-c++src, Size: 9619 bytes --]

// SPDX-License-Identifier: LGPL-2.1
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <dirent.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/mount.h>
#include <traceevent/event-parse.h>
#include <traceevent/kbuffer.h>
#include <traceevent/trace-seq.h>

struct tep_handle *tep;
struct kbuffer *kbuf;
static int page_size;

#define __weak __attribute__((weak))
#define __noreturn __attribute__((noreturn))

#define TRACEFS_PATH "/sys/kernel/tracing"

#define _STR(x) #x
#define STR(x) _STR(x)

static int ignore_warning;

void warning(const char *fmt, ...)
{
	va_list ap;

	if (ignore_warning)
		return;

	va_start(ap, fmt);
	fprintf(stderr, "Warning:  ");
	vfprintf(stderr, fmt, ap);
	va_end(ap);

	fprintf(stderr, "\n");
}

void __noreturn __vdie(const char *fmt, va_list ap)
{
	int ret = errno ? errno : -1;

	fprintf(stderr, "Error:  ");
	vfprintf(stderr, fmt, ap);

	fprintf(stderr, "\n");
	exit(ret);
}

void __noreturn __die(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	__vdie(fmt, ap);
	va_end(ap);
}

void __weak __noreturn die(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	__vdie(fmt, ap);
	va_end(ap);
}

void __weak __noreturn pdie(const char *fmt, ...)
{
	va_list ap;

	if (errno)
		perror("read-trace-pipe-raw");

	va_start(ap, fmt);
	__vdie(fmt, ap);
	va_end(ap);
}

static int mount_tracefs(void)
{
	struct stat st;
	int ret;

	/* make sure debugfs exists */
	ret = stat(TRACEFS_PATH, &st);
	if (ret < 0)
		return -1;

	ret = mount("nodev", TRACEFS_PATH,
		    "tracefs", 0, NULL);

	return ret;
}

static char *find_tracing_dir(void)
{
	char fspath[PATH_MAX+1];
	char *tracing_dir;
	char type[100];
	FILE *fp;

	if ((fp = fopen("/proc/mounts","r")) == NULL) {
		warning("Can't open /proc/mounts for read");
		return NULL;
	}

	while (fscanf(fp, "%*s %"
		      STR(PATH_MAX)
		      "s %99s %*s %*d %*d\n",
		      fspath, type) == 2) {
		if (strcmp(type, "tracefs") == 0)
			break;
	}
	fclose(fp);

	if (strcmp(type, "tracefs") != 0) {
		if (mount_tracefs() < 0) {
			warning("tracefs not mounted, please mount");
			return NULL;
		} else
			strcpy(fspath, TRACEFS_PATH);
	}
	tracing_dir = strdup(fspath);
	if (!tracing_dir)
		return NULL;

	return tracing_dir;
}

static char *read_file(const char *file, size_t *file_size)
{
	char *line;
	char *buf = NULL;
	FILE *fp;
	size_t len = 0;
	size_t size = 0;
	int s;
	int ret;

	fp = fopen(file, "r");
	if (!fp)
		die("Could not open %s to read", file);

	while ((ret = getline(&line, &len, fp)) > 0) {
		s = strlen(line);
		buf = realloc(buf, size + s + 1);
		if (!buf)
			pdie("Allocating memory to read %s\n", file);
		strcpy(buf + size, line);
		size += s;
	}
	free(line);
	fclose(fp);

	*file_size = size;
	return buf;
}

static void load_format(const char *system, const char *format)
{
	size_t size;
	char *buf = read_file(format, &size);

	ignore_warning = 1;
	tep_parse_event(tep, buf, size, system);
	ignore_warning = 0;
	free(buf);
}

static void read_event(const char *system, const char *event)
{
	struct stat st;
	char *format;
	int ret;

	ret = asprintf(&format, "%s/format", event);
	if (ret < 0)
		pdie("Could not allocate memory for %s format\n", event);

	ret = stat(format, &st);
	if (ret < 0 /* ? */ || !S_ISREG(st.st_mode))
		return;

	load_format(system, format);
	free(format);
}

static void read_system(const char *system, const char *system_path)
{
	struct dirent *dent;
	struct stat st;
	DIR *dir;
	int ret;

	dir = opendir(system_path);
	if (!dir)
		pdie("Can't read directory '%s'", system_path);

	while ((dent = readdir(dir))) {
		char *event;

		if (strcmp(dent->d_name, ".") == 0 ||
		    strcmp(dent->d_name, "..") == 0)
			continue;

		ret = asprintf(&event, "%s/%s", system_path, dent->d_name);
		if (ret < 0)
			pdie("could not allocate memory for event name %s\n",
			     dent->d_name);
		ret = stat(event, &st);
		if (ret < 0 /* ? */ || !S_ISDIR(st.st_mode))
			continue;

		read_event(system, event);
		free(event);
	}
}

static void read_raw_buffer(int i, const char *buffer)
{
	struct trace_seq s;
	char buf[page_size];
	int fd;
	int r;

	printf("Parsing CPU %d buffer\n", i);

	fd = open(buffer, O_RDONLY|O_NONBLOCK);
	if (fd < 0)
		pdie("Failed to open %s", buffer);

	while ((r = read(fd, buf, page_size)) > 0) {
		kbuffer_load_subbuffer(kbuf, buf);

		for (;;) {
			struct tep_record record;

			record.data = kbuffer_read_event(kbuf, &record.ts);
			if (!record.data)
				break;
			kbuffer_next_event(kbuf, NULL);

			trace_seq_init(&s);
			tep_print_event(tep, &s, &record,
					"%s-%d %9d\t%s: %s\n",
					TEP_PRINT_COMM,
					TEP_PRINT_PID,
					TEP_PRINT_TIME,
					TEP_PRINT_NAME,
					TEP_PRINT_INFO);
			trace_seq_do_printf(&s);
		}
	}

	close(fd);
}

static void load_cmdlines(const char *cmdlines)
{
	FILE *fp;
	char *line = NULL;
	size_t len = 0;
	char comm[1024];
	int pid;
	int ret;

	printf("read %s\n", cmdlines);
	fp = fopen(cmdlines, "r");
	if (!fp)
		pdie("Opening %s", cmdlines);

	while ((ret = getline(&line, &len, fp)) > 0) {
		if (sscanf(line, "%d %1024s", &pid, comm) == 2)
			tep_register_comm(tep, comm, pid);
	}

	free(line);
	fclose(fp);
}

static void write_state(struct trace_seq *s, int val)
{
	const char states[] = "SDTtZXxW";
	int found = 0;
	int i;

	for (i=0; i < (sizeof(states) - 1); i++) {
		if (!(val & (1 << i)))
			continue;

		if (found)
			trace_seq_putc(s, '|');

		found = 1;
		trace_seq_putc(s, states[i]);
	}

	if (!found)
		trace_seq_putc(s, 'R');
}

static void write_and_save_comm(struct tep_format_field *field,
				struct tep_record *record,
				struct trace_seq *s, int pid)
{
	const char *comm;
	int len;

	comm = (char *)(record->data + field->offset);
	len = s->len;
	trace_seq_printf(s, "%.*s",
			 field->size, comm);

	/* make sure the comm has a \0 at the end. */
	trace_seq_terminate(s);
	comm = &s->buffer[len];

	/* Help out the comm to ids. This will handle dups */
	tep_register_comm(field->event->tep, comm, pid);
}

static int my_sched_switch(struct trace_seq *s, struct tep_record *record,
			   struct tep_event *event, void *context)
{
	struct tep_format_field *field;
	unsigned long long val;

	if (tep_get_field_val(s, event, "prev_pid", record, &val, 1))
		return trace_seq_putc(s, '!');

	field = tep_find_any_field(event, "prev_comm");
	if (field) {
		write_and_save_comm(field, record, s, val);
		trace_seq_putc(s, ':');
	}
	trace_seq_printf(s, "%lld ", val);

	if (tep_get_field_val(s, event, "prev_prio", record, &val, 0) == 0)
		trace_seq_printf(s, "[%lld] ", val);

	if (tep_get_field_val(s,  event, "prev_state", record, &val, 0) == 0)
		write_state(s, val);

	trace_seq_puts(s, " ==> ");

	if (tep_get_field_val(s, event, "next_pid", record, &val, 1))
		return trace_seq_putc(s, '!');

	field = tep_find_any_field(event, "next_comm");
	if (field) {
		write_and_save_comm(field, record, s, val);
		trace_seq_putc(s, ':');
	}
	trace_seq_printf(s, "%lld", val);

	if (tep_get_field_val(s, event, "next_prio", record, &val, 0) == 0)
		trace_seq_printf(s, " [%lld]", val);

	return 0;
}

int main(int argc, char *argv[])
{
	char *tracefs = find_tracing_dir();
	enum kbuffer_long_size lsize;
	enum kbuffer_endian endian;
	struct dirent *dent;
	struct stat st;
	size_t size;
	char *header_page;
	char *per_cpu;
	char *events;
	char *comms;
	char *buf;
	DIR *dir;
	int ret;
	int i;

	page_size = getpagesize();

	if (!tracefs)
		die("Can not find tracefs");

	tep = tep_alloc();
	if (!tep)
		pdie("Could not allocate tep handle");

	lsize = sizeof(long) == 4 ? KBUFFER_LSIZE_4 : KBUFFER_LSIZE_8;
	endian = tep_is_bigendian() ? KBUFFER_ENDIAN_BIG : KBUFFER_ENDIAN_LITTLE;

	kbuf = kbuffer_alloc(lsize, endian);
	if (!kbuf)
		pdie("Could not allocate kbuffer handle");

	ret = asprintf(&comms, "%s/saved_cmdlines", tracefs);
	if (ret < 0)
		pdie("Could not allocate saved_cmdlines path name");

	load_cmdlines(comms);
	free(comms);

	ret = asprintf(&events, "%s/events", tracefs);
	if (ret < 0)
		pdie("Could not allocate memory for events path");

	ret = asprintf(&header_page, "%s/header_page", events);
	if (ret < 0)
		pdie("Could not allocate memory for header page");

	buf = read_file(header_page, &size);
	tep_parse_header_page(tep, buf, size, sizeof(long));
	free(header_page);

	dir = opendir(events);
	if (!dir)
		pdie("Can't read directory '%s'", events);

	while ((dent = readdir(dir))) {
		char *system;

		if (strcmp(dent->d_name, ".") == 0 ||
		    strcmp(dent->d_name, "..") == 0)
			continue;

		ret = asprintf(&system, "%s/%s", events, dent->d_name);
		if (ret < 0)
			pdie("could not allocate memory for system name %s\n",
			     dent->d_name);
		ret = stat(system, &st);
		if (ret < 0 /* ? */ || !S_ISDIR(st.st_mode))
			continue;

		read_system(dent->d_name, system);
		free(system);
	}
	closedir(dir);
	free(events);

	ret = tep_register_event_handler(tep, -1, "sched", "sched_switch",
				   my_sched_switch, NULL);
	if (ret < 0)
		die("register event handle?");

	asprintf(&per_cpu, "%s/per_cpu", tracefs);
	if (!per_cpu)
		pdie("Could not allocate memory for per_cpu path");

	for (i = 0; ; i++) {
		char *raw_buf;
		char *cpu;

		ret = asprintf(&cpu, "%s/cpu%d", per_cpu, i);
		if (ret < 0)
			pdie("Could not allocate memory for cpu buffer %d name", i);

		ret = stat(cpu, &st);
		if (ret < 0 || !S_ISDIR(st.st_mode))
			break;

		ret = asprintf(&raw_buf, "%s/trace_pipe_raw", cpu);
		if (ret < 0)
			pdie("Could not allocate memory for cpu %d raw buffer name", i);

		read_raw_buffer(i, raw_buf);
		free(raw_buf);
		free(cpu);
	}
	free(per_cpu);
}

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: ftrace trace_raw_pipe format
  2019-12-17 23:36   ` Steven Rostedt
@ 2019-12-17 23:54     ` Steven Rostedt
  0 siblings, 0 replies; 10+ messages in thread
From: Steven Rostedt @ 2019-12-17 23:54 UTC (permalink / raw)
  To: David Laight; +Cc: linux-kernel, Sudip Mukherjee

On Tue, 17 Dec 2019 18:36:41 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> Also note, once you are here, you don't need to use the
> tep_print_event() to print out the fields of record. You can also
> extract the data from the event directly. Or you could register a
> handler that will get called via the tep_print_event().

And of course, if you are unaware, trace-cmd does all this for you too:

 http://www.trace-cmd.org

;-)

-- Steve

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: ftrace trace_raw_pipe format
  2019-12-17 22:34 ` Steven Rostedt
                     ` (2 preceding siblings ...)
  2019-12-17 23:36   ` Steven Rostedt
@ 2019-12-18 16:31   ` Sudip Mukherjee
  2019-12-18 17:28   ` David Laight
  4 siblings, 0 replies; 10+ messages in thread
From: Sudip Mukherjee @ 2019-12-18 16:31 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: David Laight, linux-kernel

On Tue, Dec 17, 2019 at 10:34 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Tue, 17 Dec 2019 17:44:40 +0000
> David Laight <David.Laight@ACULAB.COM> wrote:
>
> > I'm trying to 'grok' the trace_raw_pipe data that ftrace generates.
> > I've some 3rd party code that post-processes it, but doesn't like wrapped traces
> > because (I think) the traces from different cpus start at different times.
> >
<snip>
>
> You may want to use libtraceevent (which will, hopefully, soon
> be in debian!). Attached is a simple program that reads the data using
> it and prints out the format.

merge request has been opened in debian, waiting for the debian
kernel-team to review and merge it,


-- 
Regards
Sudip

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: ftrace trace_raw_pipe format
  2019-12-17 22:34 ` Steven Rostedt
                     ` (3 preceding siblings ...)
  2019-12-18 16:31   ` Sudip Mukherjee
@ 2019-12-18 17:28   ` David Laight
  2019-12-18 17:59     ` Steven Rostedt
  2020-01-07 13:49     ` David Laight
  4 siblings, 2 replies; 10+ messages in thread
From: David Laight @ 2019-12-18 17:28 UTC (permalink / raw)
  To: 'Steven Rostedt'; +Cc: linux-kernel, Sudip Mukherjee

From: Steven Rostedt
> Sent: 17 December 2019 22:34
>
> > I'm trying to 'grok' the trace_raw_pipe data that ftrace generates.
> > I've some 3rd party code that post-processes it, but doesn't like wrapped traces
> > because (I think) the traces from different cpus start at different times.
> >
> > I can't seem to find any documentation at all...
...
> You may want to use libtraceevent (which will, hopefully, soon
> be in debian!). Attached is a simple program that reads the data using
> it and prints out the format.

The problem is that I don't want to print the trace, I want to fix
some trace files so that another program doesn't barf at them.

I guess I can try to reverse engineer the library code.

It would also be nice if there was a way that some standard program
(like cat) could read out the trace files without blocking at the end
when the trace is inactive.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: ftrace trace_raw_pipe format
  2019-12-18 17:28   ` David Laight
@ 2019-12-18 17:59     ` Steven Rostedt
  2020-01-07 13:49     ` David Laight
  1 sibling, 0 replies; 10+ messages in thread
From: Steven Rostedt @ 2019-12-18 17:59 UTC (permalink / raw)
  To: David Laight; +Cc: linux-kernel, Sudip Mukherjee

On Wed, 18 Dec 2019 17:28:13 +0000
David Laight <David.Laight@ACULAB.COM> wrote:

> From: Steven Rostedt
> > Sent: 17 December 2019 22:34
> >  
> > > I'm trying to 'grok' the trace_raw_pipe data that ftrace generates.
> > > I've some 3rd party code that post-processes it, but doesn't like wrapped traces
> > > because (I think) the traces from different cpus start at different times.
> > >
> > > I can't seem to find any documentation at all...  
> ...
> > You may want to use libtraceevent (which will, hopefully, soon
> > be in debian!). Attached is a simple program that reads the data using
> > it and prints out the format.  
> 
> The problem is that I don't want to print the trace, I want to fix
> some trace files so that another program doesn't barf at them.

It's not just for printing. It allows you to read the buffers and do
whatever you want with the data. Look at the kbuffer code. It's the way
to get the raw event data with the time stamps attached to them. The
kbuffer part (see kbuffer.h) processes the meta data in the
trace-pipe-raw file, and hands you the raw data that's there.

The libtraceevent is mainly to parse the format files of the events, to
know how to read the data (see the my_sched_switch() code of the second
version of my sample program). As the format files describe the binary
layout of the raw event data. Yes, this program prints the data, but it
could be easily modified to convert the data into a different format.

> 
> I guess I can try to reverse engineer the library code.

You shouldn't have to reverse engineer the code. If it's not what you
need, let me know exactly what you want, and we can add to it. The
reason I created libtraceevent, is to get rid of all the duplicate code
that's out there toady.

> 
> It would also be nice if there was a way that some standard program
> (like cat) could read out the trace files without blocking at the end
> when the trace is inactive.
> 

It shouldn't be hard to add a trace option to the kernel, to do that.

-- Steve

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: ftrace trace_raw_pipe format
  2019-12-18 17:28   ` David Laight
  2019-12-18 17:59     ` Steven Rostedt
@ 2020-01-07 13:49     ` David Laight
  1 sibling, 0 replies; 10+ messages in thread
From: David Laight @ 2020-01-07 13:49 UTC (permalink / raw)
  To: 'Steven Rostedt'; +Cc: 'linux-kernel@vger.kernel.org'

After reading the code in kbuffer-parse.c I can now get a sequence of trace entries
with their timestamps.

I think there is a bug in the processing (skipping) of KBUFFER_TYPE_TIME_STAMP
in translate_data().
I assume a 64bit timestamp would follow, and 'data' should only be incremented
by 8 (not 12) as there is an increment by 4 earlier.
I'm pretty sure these are never generated - __next_event() doesn't loop.

It is ~impossible to match this parsing code to the generating code because none
of the constants are in a shared header file.

Back to my original problem:

I've a set of trace files that contain 4 4k blocks each.
The 4 blocks are in time order, but I think there is a big time discontinuity
between the first and second blocks.
The second block has both MISSED_EVENTS and MISSED_STORED set.
My suspicion is that the first block (partially filled) is the final block
from the previous trace and its presence is a bug.
So anything processing the trace must completely ignore the first
block (if the second reports MISSED_EVENTS).

The 4th block is also partial - which is what I expect.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-01-07 13:49 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-17 17:44 ftrace trace_raw_pipe format David Laight
2019-12-17 22:34 ` Steven Rostedt
2019-12-17 22:43   ` Steven Rostedt
2019-12-17 22:47   ` Steven Rostedt
2019-12-17 23:36   ` Steven Rostedt
2019-12-17 23:54     ` Steven Rostedt
2019-12-18 16:31   ` Sudip Mukherjee
2019-12-18 17:28   ` David Laight
2019-12-18 17:59     ` Steven Rostedt
2020-01-07 13:49     ` David Laight

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).