All of lore.kernel.org
 help / color / mirror / Atom feed
From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Namhyung Kim <namhyung@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Jiri Olsa <jolsa@redhat.com>, LKML <linux-kernel@vger.kernel.org>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Stephane Eranian <eranian@google.com>,
	David Ahern <dsahern@gmail.com>, Andi Kleen <andi@firstfloor.org>
Subject: Re: [RFC/PATCH 09/38] perf record: Add --index option for building index table
Date: Fri, 2 Oct 2015 15:58:55 -0300	[thread overview]
Message-ID: <20151002185855.GD4736@kernel.org> (raw)
In-Reply-To: <1443763159-29098-10-git-send-email-namhyung@kernel.org>

Em Fri, Oct 02, 2015 at 02:18:50PM +0900, Namhyung Kim escreveu:
> The new --index option will create indexed data file which can be
> processed by multiple threads parallelly.  It saves meta event and
> sample data in separate files and merges them with an index table.
> 
> If there's an index table in the data file, the HEADER_DATA_INDEX
> feature bit is set and session->header.index[0] will point to the meta
> event area, and rest are sample data.  It'd look like below:

So this is all about perf.data files, i.e. we will traverse it all
looking for metadata events, then the samples itself will be processed
using multiple threads. I.e. two stages, touching the whole perf.data
file looking for the metadata events, then touching it all again to
process the files, right?

The model processing events for 'perf report' and for 'perf top' will
differ, right?

Can't we have one thread reading the events, another batching then to
merge up to those FINISHED_ROUND when it needs to sort things (in
parallel with reading more events for the next round) and then pass the
sorted batch to one thread per CPU to actually process the samples as in
hist processing, etc?

Need to try this, see if 'perf top' works, which I think it will, but
you haven't mentioned anything about it in the cover letter for this
patchkit.

But no speedups should be expected there, as no 'perf.data' file is
involved...

- Arnaldo
 
>         +---------------------+
>         |     file header     |
>         |---------------------|
>         |                     |
>         |    meta events[0] <-+--+
>         |                     |  |
>         |---------------------|  |
>         |                     |  |
>         |    sample data[1] <-+--+
>         |                     |  |
>         |---------------------|  |
>         |                     |  |
>         |    sample data[2] <-|--+
>         |                     |  |
>         |---------------------|  |
>         |         ...         | ...
>         |---------------------|  |
>         |     feature data    |  |
>         |   (contains index) -+--+
>         +---------------------+
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/Documentation/perf-record.txt |   4 +
>  tools/perf/builtin-record.c              | 178 ++++++++++++++++++++++++++++---
>  tools/perf/perf.h                        |   1 +
>  tools/perf/util/header.c                 |   2 +
>  tools/perf/util/session.c                |   1 +
>  5 files changed, 173 insertions(+), 13 deletions(-)
> 
> diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
> index 2e9ce77b5e14..71a9520b10b0 100644
> --- a/tools/perf/Documentation/perf-record.txt
> +++ b/tools/perf/Documentation/perf-record.txt
> @@ -308,6 +308,10 @@ This option sets the time out limit. The default value is 500 ms.
>  Record context switch events i.e. events of type PERF_RECORD_SWITCH or
>  PERF_RECORD_SWITCH_CPU_WIDE.
>  
> +--index::
> +Build an index table for sample data.  This will speed up perf report by
> +parallel processing.
> +
>  SEE ALSO
>  --------
>  linkperf:perf-stat[1], linkperf:perf-list[1]
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 623984c81478..096634c4c5ea 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -43,6 +43,7 @@ struct record {
>  	u64			bytes_written;
>  	struct perf_data_file	file;
>  	struct auxtrace_record	*itr;
> +	int			*fds;
>  	struct perf_evlist	*evlist;
>  	struct perf_session	*session;
>  	const char		*progname;
> @@ -52,9 +53,16 @@ struct record {
>  	long			samples;
>  };
>  
> -static int record__write(struct record *rec, void *bf, size_t size)
> +static int record__write(struct record *rec, void *bf, size_t size, int idx)
>  {
> -	if (perf_data_file__write(rec->session->file, bf, size) < 0) {
> +	int fd;
> +
> +	if (rec->fds && idx >= 0)
> +		fd = rec->fds[idx];
> +	else
> +		fd = perf_data_file__fd(rec->session->file);
> +
> +	if (writen(fd, bf, size) < 0) {
>  		pr_err("failed to write perf data, error: %m\n");
>  		return -1;
>  	}
> @@ -69,7 +77,7 @@ static int process_synthesized_event(struct perf_tool *tool,
>  				     struct machine *machine __maybe_unused)
>  {
>  	struct record *rec = container_of(tool, struct record, tool);
> -	return record__write(rec, event, event->header.size);
> +	return record__write(rec, event, event->header.size, -1);
>  }
>  
>  static int record__mmap_read(struct record *rec, int idx)
> @@ -94,7 +102,7 @@ static int record__mmap_read(struct record *rec, int idx)
>  		size = md->mask + 1 - (old & md->mask);
>  		old += size;
>  
> -		if (record__write(rec, buf, size) < 0) {
> +		if (record__write(rec, buf, size, idx) < 0) {
>  			rc = -1;
>  			goto out;
>  		}
> @@ -104,7 +112,7 @@ static int record__mmap_read(struct record *rec, int idx)
>  	size = head - old;
>  	old += size;
>  
> -	if (record__write(rec, buf, size) < 0) {
> +	if (record__write(rec, buf, size, idx) < 0) {
>  		rc = -1;
>  		goto out;
>  	}
> @@ -151,6 +159,7 @@ static int record__process_auxtrace(struct perf_tool *tool,
>  	struct perf_data_file *file = &rec->file;
>  	size_t padding;
>  	u8 pad[8] = {0};
> +	int idx = event->auxtrace.idx;
>  
>  	if (!perf_data_file__is_pipe(file)) {
>  		off_t file_offset;
> @@ -171,11 +180,11 @@ static int record__process_auxtrace(struct perf_tool *tool,
>  	if (padding)
>  		padding = 8 - padding;
>  
> -	record__write(rec, event, event->header.size);
> -	record__write(rec, data1, len1);
> +	record__write(rec, event, event->header.size, idx);
> +	record__write(rec, data1, len1, idx);
>  	if (len2)
> -		record__write(rec, data2, len2);
> -	record__write(rec, &pad, padding);
> +		record__write(rec, data2, len2, idx);
> +	record__write(rec, &pad, padding, idx);
>  
>  	return 0;
>  }
> @@ -268,6 +277,110 @@ int auxtrace_record__snapshot_start(struct auxtrace_record *itr __maybe_unused)
>  
>  #endif
>  
> +#define INDEX_FILE_FMT  "%s.dir/perf.data.%d"
> +
> +static int record__create_index_files(struct record *rec, int nr_index)
> +{
> +	int i = 0;
> +	int ret = -1;
> +	char path[PATH_MAX];
> +	struct perf_data_file *file = &rec->file;
> +
> +	rec->fds = malloc(nr_index * sizeof(int));
> +	if (rec->fds == NULL)
> +		return -ENOMEM;
> +
> +	scnprintf(path, sizeof(path), "%s.dir", file->path);
> +	if (rm_rf(path) < 0 || mkdir(path, S_IRWXU) < 0)
> +		goto out_err;
> +
> +	for (i = 0; i < nr_index; i++) {
> +		scnprintf(path, sizeof(path), INDEX_FILE_FMT, file->path, i);
> +		ret = open(path, O_RDWR|O_CREAT|O_TRUNC, S_IRUSR|S_IWUSR);
> +		if (ret < 0)
> +			goto out_err;
> +
> +		rec->fds[i] = ret;
> +	}
> +	return 0;
> +
> +out_err:
> +	while (--i >= 1)
> +		close(rec->fds[i]);
> +	zfree(&rec->fds);
> +
> +	scnprintf(path, sizeof(path), "%s.dir", file->path);
> +	rm_rf(path);
> +
> +	return ret;
> +}
> +
> +static int record__merge_index_files(struct record *rec, int nr_index)
> +{
> +	int i;
> +	int ret = -ENOMEM;
> +	u64 offset;
> +	char path[PATH_MAX];
> +	struct perf_file_section *idx;
> +	struct perf_data_file *file = &rec->file;
> +	struct perf_session *session = rec->session;
> +	int output_fd = perf_data_file__fd(file);
> +
> +	/* +1 for header file itself */
> +	nr_index++;
> +
> +	idx = calloc(nr_index, sizeof(*idx));
> +	if (idx == NULL)
> +		goto out_close;
> +
> +	offset = lseek(output_fd, 0, SEEK_END);
> +
> +	idx[0].offset = session->header.data_offset;
> +	idx[0].size   = offset - idx[0].offset;
> +
> +	for (i = 1; i < nr_index; i++) {
> +		struct stat stbuf;
> +		int fd = rec->fds[i - 1];
> +
> +		ret = fstat(fd, &stbuf);
> +		if (ret < 0)
> +			goto out_close;
> +
> +		idx[i].offset = offset;
> +		idx[i].size   = stbuf.st_size;
> +
> +		offset += stbuf.st_size;
> +
> +		if (idx[i].size == 0)
> +			continue;
> +
> +		ret = copyfile_offset(fd, 0, output_fd, idx[i].offset,
> +				      idx[i].size);
> +		if (ret < 0)
> +			goto out_close;
> +	}
> +
> +	session->header.index = idx;
> +	session->header.nr_index = nr_index;
> +
> +	perf_has_index = true;
> +
> +	ret = 0;
> +
> +out_close:
> +	if (ret < 0)
> +		pr_err("failed to merge index files: %d\n", ret);
> +
> +	for (i = 0; i < nr_index - 1; i++)
> +		close(rec->fds[i]);
> +
> +	scnprintf(path, sizeof(path), "%s.dir", file->path);
> +	rm_rf(path);
> +
> +	zfree(&rec->fds);
> +	return ret;
> +}
> +
>  static int record__open(struct record *rec)
>  {
>  	char msg[512];
> @@ -306,7 +419,8 @@ try_again:
>  
>  	if (perf_evlist__mmap_ex(evlist, opts->mmap_pages, false,
>  				 opts->auxtrace_mmap_pages,
> -				 opts->auxtrace_snapshot_mode, false) < 0) {
> +				 opts->auxtrace_snapshot_mode,
> +				 opts->index) < 0) {
>  		if (errno == EPERM) {
>  			pr_err("Permission error mapping pages.\n"
>  			       "Consider increasing "
> @@ -323,6 +437,14 @@ try_again:
>  		goto out;
>  	}
>  
> +	if (opts->index) {
> +		rc = record__create_index_files(rec, evlist->nr_mmaps);
> +		if (rc < 0) {
> +			pr_err("failed to create index file: %d\n", rc);
> +			goto out;
> +		}
> +	}
> +
>  	session->evlist = evlist;
>  	perf_session__set_id_hdr_size(session);
>  out:
> @@ -347,7 +469,9 @@ static int process_buildids(struct record *rec)
>  	struct perf_data_file *file  = &rec->file;
>  	struct perf_session *session = rec->session;
>  
> -	if (file->size == 0)
> +	/* update file size after merging sample files with index */
> +	u64 size = lseek(perf_data_file__fd(file), 0, SEEK_END);
> +	if (size == 0)
>  		return 0;
>  
>  	/*
> @@ -414,6 +538,13 @@ static int record__mmap_read_all(struct record *rec)
>  			}
>  		}
>  
> +		if (rec->evlist->track_mmap && rec->evlist->track_mmap[i].base) {
> +			if (record__mmap_read(rec, track_mmap_idx(i)) != 0) {
> +				rc = -1;
> +				goto out;
> +			}
> +		}
> +
>  		if (mm->base && !rec->opts.auxtrace_snapshot_mode &&
>  		    record__auxtrace_mmap_read(rec, mm) != 0) {
>  			rc = -1;
> @@ -426,7 +557,8 @@ static int record__mmap_read_all(struct record *rec)
>  	 * at least one event.
>  	 */
>  	if (bytes_written != rec->bytes_written)
> -		rc = record__write(rec, &finished_round_event, sizeof(finished_round_event));
> +		rc = record__write(rec, &finished_round_event,
> +				   sizeof(finished_round_event), -1);
>  
>  out:
>  	return rc;
> @@ -452,7 +584,8 @@ static void record__init_features(struct record *rec)
>  	if (!rec->opts.full_auxtrace)
>  		perf_header__clear_feat(&session->header, HEADER_AUXTRACE);
>  
> -	perf_header__clear_feat(&session->header, HEADER_DATA_INDEX);
> +	if (!rec->opts.index)
> +		perf_header__clear_feat(&session->header, HEADER_DATA_INDEX);
>  }
>  
>  static volatile int workload_exec_errno;
> @@ -520,6 +653,11 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>  		}
>  	}
>  
> +	if (file->is_pipe && opts->index) {
> +		pr_warning("Indexing is disabled for pipe output\n");
> +		opts->index = false;
> +	}
> +
>  	if (record__open(rec) != 0) {
>  		err = -1;
>  		goto out_child;
> @@ -753,6 +891,9 @@ out_child:
>  		rec->session->header.data_size += rec->bytes_written;
>  		file->size = lseek(perf_data_file__fd(file), 0, SEEK_CUR);
>  
> +		if (rec->opts.index)
> +			record__merge_index_files(rec, rec->evlist->nr_mmaps);
> +
>  		if (!rec->no_buildid) {
>  			process_buildids(rec);
>  			/*
> @@ -1119,6 +1260,8 @@ struct option __record_options[] = {
>  			"per thread proc mmap processing timeout in ms"),
>  	OPT_BOOLEAN(0, "switch-events", &record.opts.record_switch_events,
>  		    "Record context switch events"),
> +	OPT_BOOLEAN(0, "index", &record.opts.index,
> +		    "make index for sample data to speed-up processing"),
>  	OPT_END()
>  };
>  
> @@ -1186,6 +1329,15 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
>  		goto out_symbol_exit;
>  	}
>  
> +	if (rec->opts.index) {
> +		if (!rec->opts.sample_time) {
> +			pr_err("Sample timestamp is required for indexing\n");
> +			goto out_symbol_exit;
> +		}
> +
> +		perf_evlist__add_dummy_tracking(rec->evlist);
> +	}
> +
>  	if (rec->opts.target.tid && !rec->opts.no_inherit_set)
>  		rec->opts.no_inherit = true;
>  
> diff --git a/tools/perf/perf.h b/tools/perf/perf.h
> index f4b4d7d8752c..df7c208abb74 100644
> --- a/tools/perf/perf.h
> +++ b/tools/perf/perf.h
> @@ -60,6 +60,7 @@ struct record_opts {
>  	bool	     full_auxtrace;
>  	bool	     auxtrace_snapshot_mode;
>  	bool	     record_switch_events;
> +	bool	     index;
>  	unsigned int freq;
>  	unsigned int mmap_pages;
>  	unsigned int auxtrace_mmap_pages;
> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
> index c357f7f47d32..13ba1402ec1b 100644
> --- a/tools/perf/util/header.c
> +++ b/tools/perf/util/header.c
> @@ -2706,6 +2706,8 @@ int perf_session__read_header(struct perf_session *session)
>  						   session->tevent.pevent))
>  		goto out_delete_evlist;
>  
> +	perf_has_index = perf_header__has_feat(&session->header, HEADER_DATA_INDEX);
> +
>  	return 0;
>  out_errno:
>  	return -errno;
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 91fa9647f565..7546c4d147b9 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -182,6 +182,7 @@ void perf_session__delete(struct perf_session *session)
>  	machines__exit(&session->machines);
>  	if (session->file)
>  		perf_data_file__close(session->file);
> +	free(session->header.index);
>  	free(session);
>  }
>  
> -- 
> 2.6.0

  reply	other threads:[~2015-10-02 18:59 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-02  5:18 [RFC/PATCH 00/38] perf tools: Speed-up perf report by using multi thread (v5) Namhyung Kim
2015-10-02  5:18 ` [RFC/PATCH 01/38] perf tools: Use a software dummy event to track task/mmap events Namhyung Kim
2015-10-05 12:51   ` Jiri Olsa
2015-10-06  8:31     ` Namhyung Kim
2015-10-02  5:18 ` [RFC/PATCH 02/38] perf tools: Save mmap_param.len instead of mask Namhyung Kim
2015-10-02 18:44   ` Arnaldo Carvalho de Melo
2015-10-06  8:34     ` Namhyung Kim
2015-10-08 10:17   ` Jiri Olsa
2015-10-09  6:03     ` Namhyung Kim
2015-10-12 12:42       ` Jiri Olsa
2015-10-02  5:18 ` [RFC/PATCH 03/38] perf tools: Move auxtrace_mmap field to struct perf_evlist Namhyung Kim
2015-10-02 18:45   ` Arnaldo Carvalho de Melo
2015-10-05 11:29     ` Adrian Hunter
2015-10-06  9:03       ` Namhyung Kim
2015-10-06  9:26         ` Adrian Hunter
2015-10-07  9:06           ` Namhyung Kim
2015-10-08 16:07             ` Adrian Hunter
2015-10-09  7:54               ` Namhyung Kim
2015-10-06  8:56     ` Namhyung Kim
2015-10-05 13:14   ` Jiri Olsa
2015-10-06  8:40     ` Namhyung Kim
2015-10-08 10:18   ` Jiri Olsa
2015-10-02  5:18 ` [RFC/PATCH 04/38] perf tools: pass perf_mmap desc directly Namhyung Kim
2015-10-02 18:47   ` Arnaldo Carvalho de Melo
2015-10-02  5:18 ` [RFC/PATCH 05/38] perf tools: Create separate mmap for dummy tracking event Namhyung Kim
2015-10-02  5:18 ` [RFC/PATCH 06/38] perf tools: Extend perf_evlist__mmap_ex() to use track mmap Namhyung Kim
2015-10-02  5:18 ` [RFC/PATCH 07/38] perf tools: Add HEADER_DATA_INDEX feature Namhyung Kim
2015-10-02  5:18 ` [RFC/PATCH 08/38] perf tools: Handle indexed data file properly Namhyung Kim
2015-10-02  5:18 ` [RFC/PATCH 09/38] perf record: Add --index option for building index table Namhyung Kim
2015-10-02 18:58   ` Arnaldo Carvalho de Melo [this message]
2015-10-05 13:46   ` Jiri Olsa
2015-10-07  8:21     ` Namhyung Kim
2015-10-07 12:10       ` Jiri Olsa
2015-10-02  5:18 ` [RFC/PATCH 10/38] perf report: Skip dummy tracking event Namhyung Kim
2015-10-02  5:18 ` [RFC/PATCH 11/38] perf tools: Introduce thread__comm(_str)_by_time() helpers Namhyung Kim
2015-10-02  5:18 ` [RFC/PATCH 12/38] perf tools: Add a test case for thread comm handling Namhyung Kim
2015-10-02  5:18 ` [RFC/PATCH 13/38] perf tools: Use thread__comm_by_time() when adding hist entries Namhyung Kim
2015-10-02  5:18 ` [RFC/PATCH 14/38] perf tools: Convert dead thread list into rbtree Namhyung Kim
2015-10-02  5:18 ` [RFC/PATCH 15/38] perf tools: Introduce machine__find*_thread_by_time() Namhyung Kim
2015-10-08 12:20   ` Jiri Olsa
2015-10-09  6:04     ` Namhyung Kim
2015-10-02  5:18 ` [RFC/PATCH 16/38] perf tools: Add a test case for timed thread handling Namhyung Kim
2015-10-02  5:18 ` [RFC/PATCH 17/38] perf tools: Maintain map groups list in a leader thread Namhyung Kim
2015-10-08 12:51   ` Jiri Olsa
2015-10-09  6:24     ` Namhyung Kim
2015-10-08 12:58   ` Jiri Olsa
2015-10-09  6:58     ` Namhyung Kim
2015-10-12 12:43       ` Jiri Olsa
2015-10-02  5:18 ` [RFC/PATCH 18/38] perf tools: Introduce thread__find_addr_location_by_time() and friends Namhyung Kim
2015-10-12 13:35   ` Jiri Olsa
2015-10-02  5:19 ` [RFC/PATCH 19/38] perf callchain: Use " Namhyung Kim
2015-10-02  5:19 ` [RFC/PATCH 20/38] perf tools: Add a test case for timed map groups handling Namhyung Kim
2015-10-02  5:19 ` [RFC/PATCH 21/38] perf tools: Save timestamp of a map creation Namhyung Kim
2015-10-02  5:19 ` [RFC/PATCH 22/38] perf tools: Introduce map_groups__{insert,find}_by_time() Namhyung Kim
2015-10-02  5:19 ` [RFC/PATCH 23/38] perf tools: Use map_groups__find_addr_by_time() Namhyung Kim
2015-10-02  5:19 ` [RFC/PATCH 24/38] perf tools: Add testcase for managing maps with time Namhyung Kim
2015-10-02  5:19 ` [RFC/PATCH 25/38] perf callchain: Maintain libunwind's address space in map_groups Namhyung Kim
2015-10-02  5:19 ` [RFC/PATCH 26/38] perf session: Pass struct events stats to event processing functions Namhyung Kim
2015-10-02  5:19 ` [RFC/PATCH 27/38] perf hists: Pass hists struct to hist_entry_iter struct Namhyung Kim
2015-10-02  5:19 ` [RFC/PATCH 28/38] perf tools: Move BUILD_ID_SIZE definition to perf.h Namhyung Kim
2015-10-02  5:22 ` Namhyung Kim
2015-10-02  6:58 ` Namhyung Kim
2015-10-12 14:32   ` Jiri Olsa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151002185855.GD4736@kernel.org \
    --to=acme@kernel.org \
    --cc=a.p.zijlstra@chello.nl \
    --cc=andi@firstfloor.org \
    --cc=dsahern@gmail.com \
    --cc=eranian@google.com \
    --cc=fweisbec@gmail.com \
    --cc=jolsa@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=namhyung@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.