* [PATCH v7 0/2]: perf: reduce data loss when profiling highly parallel CPU bound workloads
@ 2018-09-05  7:16 Alexey Budankov
  2018-09-05  7:19 ` [PATCH v7 1/2]: perf util: map data buffer for preserving collected data Alexey Budankov
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Alexey Budankov @ 2018-09-05  7:16 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Alexander Shishkin, Jiri Olsa, Namhyung Kim, Andi Kleen, linux-kernel


Currently, in record mode the tool implements trace writing serially. 
The algorithm loops over the mapped per-cpu data buffers and stores 
ready data chunks into a trace file using the write() system call.

Under some circumstances the kernel may run out of free space in a 
buffer because its other half has not yet been written to disk while 
the tool is busy writing out data from some other buffer.

Thus the serial trace writing implementation may cause the kernel 
to lose profiling data, and that is what is observed when profiling 
highly parallel CPU bound workloads on machines with a large number 
of cores.

An experiment profiling matrix multiplication code executing 128 
threads on an Intel Xeon Phi (KNM) with 272 cores, as below, 
demonstrates a data loss metric value of 98%:

/usr/bin/time perf record -o /tmp/perf-ser.data -a -N -B -T -R -g \
    --call-graph dwarf,1024 --user-regs=IP,SP,BP \
    --switch-events -e cycles,instructions,ref-cycles,software/period=1,name=cs,config=0x3/Duk -- \
    matrix.gcc

The data loss metric is the ratio lost_time/elapsed_time, where 
lost_time is the sum of the time intervals containing PERF_RECORD_LOST 
records and elapsed_time is the elapsed application run time 
under profiling.
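
For example, if the intervals containing PERF_RECORD_LOST records sum 
to 4.9s of lost_time over a 5.0s profiled run, the metric evaluates to 
4.9/5.0 = 98%, matching the loss observed in the experiment above.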

Applying asynchronous trace streaming through the POSIX AIO API
(http://man7.org/linux/man-pages/man7/aio.7.html) 
lowers the data loss metric value, providing a 2x improvement and 
lowering the 98% loss to almost 0%.
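
For readers unfamiliar with the API, the series builds on the standard 
aio_write()/aio_suspend()/aio_error()/aio_return() sequence. Below is a 
minimal synchronous sketch, for illustration only: the helper name is 
made up, and the actual patches keep several requests in flight per 
mapped cpu buffer instead of waiting right away (link with -lrt on glibc):

#include <aio.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/* Queue one write and wait for it; returns bytes written or -1. */
static ssize_t write_chunk(int fd, void *buf, size_t size, off_t off)
{
	struct aiocb cb;
	const struct aiocb *list[1] = { &cb };

	memset(&cb, 0, sizeof(cb));
	cb.aio_fildes = fd;
	cb.aio_buf    = buf;
	cb.aio_nbytes = size;
	cb.aio_offset = off;
	cb.aio_sigevent.sigev_notify = SIGEV_NONE;

	if (aio_write(&cb))			/* enqueue the request */
		return -1;
	while (aio_error(&cb) == EINPROGRESS)	/* still in flight? */
		aio_suspend(list, 1, NULL);	/* block until completion */
	return aio_return(&cb);			/* reap the final result */
}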

---
 Alexey Budankov (2):
        perf util: map data buffer for preserving collected data
        perf record: enable asynchronous trace writing
 
 tools/perf/builtin-record.c | 197 +++++++++++++++++++++++++++++++++++++++++++-
 tools/perf/perf.h           |   1 +
 tools/perf/util/evlist.c    |   7 +-
 tools/perf/util/evlist.h    |   3 +-
 tools/perf/util/mmap.c      | 110 +++++++++++++++++++++----
 tools/perf/util/mmap.h      |  10 ++-
 6 files changed, 302 insertions(+), 26 deletions(-)

---
 Changes in v7:
 - implemented handling of the record.aio setting from the perfconfig file (see the example after this list)
 Changes in v6:
 - adjusted setting of priorities for cblocks;
 - handled errno == EAGAIN case from aio_write() return;
 Changes in v5:
 - resolved livelock on perf record -e intel_pt// -- dd if=/dev/zero of=/dev/null count=100000
 - data loss metric decreased from 25% to 2x in the trialed configuration;
 - reshaped layout of data structures;
 - implemented --aio option;
 - avoided nanosleep() prior to calling aio_suspend();
 - switched record__aio_sync() to multiple per-cpu aio buffers;
 - record__mmap_read_sync() now does a global sync just before 
   switching the trace file or stopping collection;
 Changes in v4:
 - converted mmap()/munmap() to malloc()/free() for mmap->data buffer management
 - converted void *bf to struct perf_mmap *md in signatures
 - added a comment in perf_mmap__push() just before perf_mmap__get();
 - added a comment in record__mmap_read_sync() on possible restarting 
   of the aio_write() operation and releasing the perf_mmap object afterwards;
 - added perf_mmap__put() for the cases of failed aio_write();
 Changes in v3:
 - added comments about the nanosleep(0.5ms) call prior to aio_suspend()
   to cope with the intrusiveness of its implementation in glibc;
 - added comments about the rationale behind copying profiling data 
   into the mmap->data buffer;
 Changes in v2:
 - converted zalloc() to calloc() for allocation of mmap_aio array,
 - fixed a typo and adjusted the fallback branch code;
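
As an illustration, the aio depth added by this series can be set either 
on the command line or, as of v7, from the config file; values outside 
the 1..32 range fall back to the default of 2, and the [record] section 
form assumes perf's usual section.key convention for record.aio:

# command line: 4 in-flight write requests per mapped cpu buffer
perf record --aio 4 -- matrix.gcc

# equivalent setting in ~/.perfconfig
[record]
	aio = 4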


* [PATCH v7 1/2]: perf util: map data buffer for preserving collected data
  2018-09-05  7:16 [PATCH v7 0/2]: perf: reduce data loss when profiling highly parallel CPU bound workloads Alexey Budankov
@ 2018-09-05  7:19 ` Alexey Budankov
  2018-09-06 11:04   ` Jiri Olsa
  2018-09-06 11:04   ` Jiri Olsa
  2018-09-05  7:39 ` [PATCH v7 2/2]: perf record: enable asynchronous trace writing Alexey Budankov
  2018-09-05 11:28 ` [PATCH v7 0/2]: perf: reduce data loss when profiling highly parallel CPU bound workloads Jiri Olsa
  2 siblings, 2 replies; 23+ messages in thread
From: Alexey Budankov @ 2018-09-05  7:19 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Alexander Shishkin, Jiri Olsa, Namhyung Kim, Andi Kleen, linux-kernel


The map->data buffers are used to preserve map->base profiling data 
for writing to disk. The AIO map->cblocks are used to queue the 
corresponding map->data buffers for asynchronous writing. The 
map->cblocks objects are located in the last page of every map->data 
buffer.
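
A sketch of the resulting layout of a single map->data[i] allocation, 
as implied by the hunks below (this assumes perf_mmap__mmap_len() is 
the data size map->mask + 1 plus one extra page, which is what leaves 
room for the control block in the trailing page):

/*
 *  0                     map->mask + 1     perf_mmap__mmap_len(map)
 *  +---------------------+-------------------------------------+
 *  | copy of map->base   | struct aiocb for this buffer,       |
 *  | profiling data      | i.e. map->cblocks[i]                |
 *  +---------------------+-------------------------------------+
 */
unsigned char *data = map->data[i];
map->cblocks[i] = (struct aiocb *)&data[map->mask + 1];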

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 Changes in v7:
  - implemented handling of the record.aio setting from the perfconfig file
 Changes in v6:
  - adjusted setting of priorities for cblocks;
 Changes in v5:
  - reshaped layout of data structures;
  - implemented --aio option;
 Changes in v4:
  - converted mmap()/munmap() to malloc()/free() for mmap->data buffer management 
 Changes in v2:
  - converted zalloc() to calloc() for allocation of mmap_aio array,
  - fixed a typo and adjusted the fallback branch code;
---
 tools/perf/builtin-record.c | 15 ++++++++++++-
 tools/perf/perf.h           |  1 +
 tools/perf/util/evlist.c    |  7 +++---
 tools/perf/util/evlist.h    |  3 ++-
 tools/perf/util/mmap.c      | 53 +++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/mmap.h      |  6 ++++-
 6 files changed, 79 insertions(+), 6 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 22ebeb92ac51..f17a6f9cb1ba 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -326,7 +326,8 @@ static int record__mmap_evlist(struct record *rec,
 
 	if (perf_evlist__mmap_ex(evlist, opts->mmap_pages,
 				 opts->auxtrace_mmap_pages,
-				 opts->auxtrace_snapshot_mode) < 0) {
+				 opts->auxtrace_snapshot_mode,
+				 opts->nr_cblocks) < 0) {
 		if (errno == EPERM) {
 			pr_err("Permission error mapping pages.\n"
 			       "Consider increasing "
@@ -1287,6 +1288,8 @@ static int perf_record_config(const char *var, const char *value, void *cb)
 		var = "call-graph.record-mode";
 		return perf_default_config(var, value, cb);
 	}
+	if (!strcmp(var, "record.aio"))
+		rec->opts.nr_cblocks = strtol(value, NULL, 0);
 
 	return 0;
 }
@@ -1519,6 +1522,7 @@ static struct record record = {
 			.default_per_cpu = true,
 		},
 		.proc_map_timeout     = 500,
+		.nr_cblocks	      = 2
 	},
 	.tool = {
 		.sample		= process_sample_event,
@@ -1678,6 +1682,8 @@ static struct option __record_options[] = {
 			  "signal"),
 	OPT_BOOLEAN(0, "dry-run", &dry_run,
 		    "Parse options then exit"),
+	OPT_INTEGER(0, "aio", &record.opts.nr_cblocks,
+		    "asynchronous trace write operations (min: 1, max: 32, default: 2)"),
 	OPT_END()
 };
 
@@ -1870,6 +1876,13 @@ int cmd_record(int argc, const char **argv)
 		goto out;
 	}
 
+	if (!(1 <= rec->opts.nr_cblocks && rec->opts.nr_cblocks <= 32))
+		rec->opts.nr_cblocks = 2;
+
+	if (verbose > 0)
+		pr_info("AIO trace writes: %d\n", rec->opts.nr_cblocks);
+
+
 	err = __cmd_record(&record, argc, argv);
 out:
 	perf_evlist__delete(rec->evlist);
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 21bf7f5a3cf5..0a1ae2ae567a 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -82,6 +82,7 @@ struct record_opts {
 	bool         use_clockid;
 	clockid_t    clockid;
 	unsigned int proc_map_timeout;
+	int	     nr_cblocks;
 };
 
 struct option;
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index e7a4b31a84fb..08be79650a85 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1018,7 +1018,8 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
  */
 int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 			 unsigned int auxtrace_pages,
-			 bool auxtrace_overwrite)
+			 bool auxtrace_overwrite,
+			 int nr_cblocks)
 {
 	struct perf_evsel *evsel;
 	const struct cpu_map *cpus = evlist->cpus;
@@ -1028,7 +1029,7 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 	 * Its value is decided by evsel's write_backward.
 	 * So &mp should not be passed through const pointer.
 	 */
-	struct mmap_params mp;
+	struct mmap_params mp = { .nr_cblocks = nr_cblocks };
 
 	if (!evlist->mmap)
 		evlist->mmap = perf_evlist__alloc_mmap(evlist, false);
@@ -1060,7 +1061,7 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 
 int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages)
 {
-	return perf_evlist__mmap_ex(evlist, pages, 0, false);
+	return perf_evlist__mmap_ex(evlist, pages, 0, false, 2);
 }
 
 int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target)
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index dc66436add98..a94d3c613254 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -162,7 +162,8 @@ unsigned long perf_event_mlock_kb_in_pages(void);
 
 int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 			 unsigned int auxtrace_pages,
-			 bool auxtrace_overwrite);
+			 bool auxtrace_overwrite,
+			 int nr_cblocks);
 int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages);
 void perf_evlist__munmap(struct perf_evlist *evlist);
 
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index fc832676a798..384d17cd1379 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -155,6 +155,14 @@ void __weak auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp __mayb
 
 void perf_mmap__munmap(struct perf_mmap *map)
 {
+	int i;
+	if (map->data) {
+		for (i = 0; i < map->nr_cblocks; ++i)
+			zfree(&(map->data[i]));
+		zfree(&(map->data));
+	}
+	if (map->cblocks)
+		zfree(&(map->cblocks));
 	if (map->base != NULL) {
 		munmap(map->base, perf_mmap__mmap_len(map));
 		map->base = NULL;
@@ -166,6 +174,7 @@ void perf_mmap__munmap(struct perf_mmap *map)
 
 int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd)
 {
+	int i;
 	/*
 	 * The last one will be done at perf_mmap__consume(), so that we
 	 * make sure we don't prevent tools from consuming every last event in
@@ -190,6 +199,50 @@ int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd)
 		map->base = NULL;
 		return -1;
 	}
+	map->nr_cblocks = mp->nr_cblocks;
+	map->cblocks = calloc(map->nr_cblocks, sizeof(struct aiocb*));
+	if (!map->cblocks) {
+		pr_debug2("failed to allocate perf event data buffers, error %d\n",
+				errno);
+		return -1;
+	}
+	map->data = calloc(map->nr_cblocks, sizeof(void*));
+	if (map->data) {
+		int delta_max = sysconf(_SC_AIO_PRIO_DELTA_MAX);
+		for (i = 0; i < map->nr_cblocks; ++i) {
+			map->data[i] = malloc(perf_mmap__mmap_len(map));
+			if (map->data[i]) {
+				int prio;
+				unsigned char *data = map->data[i];
+				map->cblocks[i] = (struct aiocb *)&data[map->mask + 1];
+				memset(map->cblocks[i], 0, sizeof(struct aiocb));
+				/* Use a cblock.aio_fildes value different from -1
+				 * to denote a started aio write operation on the
+				 * cblock, so an explicit record__aio_sync() call
+				 * is required before the cblock can be reused.
+				 */
+				map->cblocks[i]->aio_fildes = -1;
+				/* Allocate cblocks with decreasing priority to
+				 * get faster aio_write() calls: queued requests
+				 * are kept in separate per-prio queues, so adding
+				 * a new request iterates through a shorter
+				 * per-prio list.
+				 */
+				prio = delta_max - i;
+				if (prio < 0)
+					prio = 0;
+				map->cblocks[i]->aio_reqprio = prio;
+			} else {
+				pr_debug2("failed to allocate perf event data buffer, error %d\n",
+						errno);
+				return -1;
+			}
+		}
+	} else {
+		pr_debug2("failed to alloc perf event data buffers, error %d\n",
+				errno);
+		return -1;
+	}
 	map->fd = fd;
 
 	if (auxtrace_mmap__mmap(&map->auxtrace_mmap,
diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index d82294db1295..4a9bb0ecae4f 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -6,6 +6,7 @@
 #include <linux/types.h>
 #include <asm/barrier.h>
 #include <stdbool.h>
+#include <aio.h>
 #include "auxtrace.h"
 #include "event.h"
 
@@ -25,6 +26,9 @@ struct perf_mmap {
 	bool		 overwrite;
 	struct auxtrace_mmap auxtrace_mmap;
 	char		 event_copy[PERF_SAMPLE_MAX_SIZE] __aligned(8);
+	void 		 **data;
+	struct aiocb	 **cblocks;
+	int 		 nr_cblocks;
 };
 
 /*
@@ -56,7 +60,7 @@ enum bkw_mmap_state {
 };
 
 struct mmap_params {
-	int			    prot, mask;
+	int			    prot, mask, nr_cblocks;
 	struct auxtrace_mmap_params auxtrace_mp;
 };
 



* [PATCH v7 2/2]: perf record: enable asynchronous trace writing
  2018-09-05  7:16 [PATCH v7 0/2]: perf: reduce data loss when profiling highly parallel CPU bound workloads Alexey Budankov
  2018-09-05  7:19 ` [PATCH v7 1/2]: perf util: map data buffer for preserving collected data Alexey Budankov
@ 2018-09-05  7:39 ` Alexey Budankov
  2018-09-06 11:04   ` Jiri Olsa
                     ` (3 more replies)
  2018-09-05 11:28 ` [PATCH v7 0/2]: perf: reduce data loss when profiling highly parallel CPU bound workloads Jiri Olsa
  2 siblings, 4 replies; 23+ messages in thread
From: Alexey Budankov @ 2018-09-05  7:39 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Alexander Shishkin, Jiri Olsa, Namhyung Kim, Andi Kleen, linux-kernel


record__aio_sync() returns the index of a free map->data buffer for 
a cpu buffer, or blocks until some previously started operation 
completes, and then proceeds.

The trace file offset is calculated and updated linearly prior to 
enqueuing an aio write in record__pushfn().
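
Schematically, record__pushfn() reserves a disjoint region of the trace 
file before queueing each write, and rolls the reservation back if the 
write fails to start (condensed from the hunk below):

	off = lseek(trace_fd, 0, SEEK_CUR);    /* next free file offset */
	lseek(trace_fd, off + size, SEEK_SET); /* reserve [off, off + size) */
	ret = record__aio_write(cblock, trace_fd, data, size, off);
	if (ret)
		lseek(trace_fd, off, SEEK_SET); /* undo the reservation */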

record__mmap_read_sync() implements a barrier for all incomplete
aio write requests, per cpu buffer and per aio data buffer.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 Changes in v6:
 - handled errno == EAGAIN case from aio_write();
 Changes in v5:
 - data loss metric decreased from 25% to 2x in the trialed configuration;
 - avoided nanosleep() prior to calling aio_suspend();
 - switched record__aio_sync() to multiple per-cpu aio buffers;
 - record__mmap_read_sync() now does a global barrier just before 
   switching the trace file or stopping collection;
 - resolved livelock on perf record -e intel_pt// -- dd if=/dev/zero of=/dev/null count=100000
 Changes in v4:
 - converted void *bf to struct perf_mmap *md in signatures
 - added a comment in perf_mmap__push() just before perf_mmap__get();
 - added a comment in record__mmap_read_sync() on possible restarting 
   of the aio_write() operation and releasing the perf_mmap object afterwards;
 - added perf_mmap__put() for the cases of failed aio_write();
 Changes in v3:
 - added comments about the nanosleep(0.5ms) call prior to aio_suspend()
   to cope with the intrusiveness of its implementation in glibc;
 - added comments about the rationale behind copying profiling data 
   into the mmap->data buffer;
---
 tools/perf/builtin-record.c | 182 +++++++++++++++++++++++++++++++++++++++++++-
 tools/perf/util/mmap.c      |  57 ++++++++++----
 tools/perf/util/mmap.h      |   4 +-
 3 files changed, 223 insertions(+), 20 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index f17a6f9cb1ba..dbafc60c6bd7 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -121,6 +121,91 @@ static int record__write(struct record *rec, void *bf, size_t size)
 	return 0;
 }
 
+static int record__aio_write(struct aiocb *cblock, int trace_fd,
+		void *buf, size_t size, off_t off)
+{
+	int rc;
+
+	cblock->aio_fildes = trace_fd;
+	cblock->aio_buf    = buf;
+	cblock->aio_nbytes = size;
+	cblock->aio_offset = off;
+	cblock->aio_sigevent.sigev_notify = SIGEV_NONE;
+
+	do {
+		rc = aio_write(cblock);
+		if (rc == 0) {
+			break;
+		} else if (errno != EAGAIN) {
+			cblock->aio_fildes = -1;
+			pr_err("failed to queue perf data, error: %m\n");
+			break;
+		}
+	} while (1);
+
+	return rc;
+}
+
+static int record__aio_sync(struct perf_mmap *md)
+{
+	void *rem_buf;
+	off_t rem_off;
+	size_t rem_size;
+	ssize_t aio_ret, written;
+	int i, aio_errno, do_suspend, idx = -1;
+	struct aiocb **cblocks = md->cblocks;
+	struct timespec timeout = { 0, 1000 * 1000  * 1 }; // 1ms
+
+	for (i = 0; i < md->nr_cblocks; ++i)
+		if (cblocks[i]->aio_fildes == -1)
+			return i;
+
+	do {
+		do_suspend = 0;
+		if (aio_suspend((const struct aiocb**)cblocks, md->nr_cblocks, &timeout)) {
+			if (!(errno == EAGAIN || errno == EINTR))
+				pr_err("failed to sync perf data, error: %m\n");
+			do_suspend = 1;
+			continue;
+		}
+		for (i = 0; i < md->nr_cblocks; ++i) {
+			aio_errno = aio_error(cblocks[i]);
+			if (aio_errno == EINPROGRESS)
+				continue;
+			written = aio_ret = aio_return(cblocks[i]);
+			if (aio_ret < 0) {
+				if (aio_errno != EINTR)
+					pr_err("failed to write perf data, error: %m\n");
+				written = 0;
+			}
+			rem_size = cblocks[i]->aio_nbytes - written;
+			if (rem_size == 0) {
+				cblocks[i]->aio_fildes = -1;
+				/* md->refcount is incremented in perf_mmap__push() for
+				 * every enqueued aio write request so decrement it because
+				 * the request is now complete.
+				 */
+				perf_mmap__put(md);
+				if (idx == -1)
+					idx = i;
+			} else {
+				/* The aio write request may require a restart with
+				 * the remainder if the kernel didn't write the
+				 * whole chunk at once.
+				 */
+				rem_off = cblocks[i]->aio_offset + written;
+				rem_buf = (void*)(cblocks[i]->aio_buf + written);
+				record__aio_write(cblocks[i], cblocks[i]->aio_fildes,
+						rem_buf, rem_size, rem_off);
+			}
+		}
+		if (idx == -1)
+			do_suspend = 1;
+	} while (do_suspend);
+
+	return idx;
+}
+
 static int process_synthesized_event(struct perf_tool *tool,
 				     union perf_event *event,
 				     struct perf_sample *sample __maybe_unused,
@@ -130,12 +215,28 @@ static int process_synthesized_event(struct perf_tool *tool,
 	return record__write(rec, event, event->header.size);
 }
 
-static int record__pushfn(void *to, void *bf, size_t size)
+static int record__pushfn(void *to, struct aiocb *cblock, void *data, size_t size)
 {
+	off_t off;
 	struct record *rec = to;
+	int ret, trace_fd = rec->session->data->file.fd;
 
 	rec->samples++;
-	return record__write(rec, bf, size);
+
+	off = lseek(trace_fd, 0, SEEK_CUR);
+	lseek(trace_fd, off + size, SEEK_SET);
+
+	ret = record__aio_write(cblock, trace_fd, data, size, off);
+	if (!ret) {
+		rec->bytes_written += size;
+		if (switch_output_size(rec))
+			trigger_hit(&switch_output_trigger);
+	} else {
+		lseek(trace_fd, off, SEEK_SET);
+	}
+
+	return ret;
 }
 
 static volatile int done;
@@ -511,6 +612,70 @@ static struct perf_event_header finished_round_event = {
 	.type = PERF_RECORD_FINISHED_ROUND,
 };
 
+static void record__aio_complete(struct perf_mmap *md, int i)
+{
+	int aio_errno;
+	void *rem_buf;
+	off_t rem_off;
+	size_t rem_size;
+	ssize_t aio_ret, written;
+	struct aiocb *cblock = md->cblocks[i];
+	struct timespec timeout = { 0, 1000 * 1000  * 1 }; // 1ms
+
+	if (cblock->aio_fildes == -1)
+		return;
+
+	do {
+		if (aio_suspend((const struct aiocb**)&cblock, 1, &timeout)) {
+			if (!(errno == EAGAIN || errno == EINTR))
+				pr_err("failed to sync perf data, error: %m\n");
+			continue;
+		}
+		aio_errno = aio_error(cblock);
+		if (aio_errno == EINPROGRESS)
+			continue;
+		written = aio_ret = aio_return(cblock);
+		if (aio_ret < 0) {
+			if (!(aio_errno == EINTR))
+				pr_err("failed to write perf data, error: %m\n");
+			written = 0;
+		}
+		rem_size = cblock->aio_nbytes - written;
+		if (rem_size == 0) {
+			cblock->aio_fildes = -1;
+			/* md->refcount is incremented in perf_mmap__push() for
+			 * every enqueued aio write request so decrement it because
+			 * the request is now complete.
+			 */
+			perf_mmap__put(md);
+			return;
+		} else {
+			/* The aio write request may require a restart with
+			 * the remainder if the kernel didn't write the
+			 * whole chunk at once.
+			 */
+			rem_off = cblock->aio_offset + written;
+			rem_buf = (void*)(cblock->aio_buf + written);
+			record__aio_write(cblock, cblock->aio_fildes,
+					rem_buf, rem_size, rem_off);
+		}
+	} while (1);
+}
+
+static void record__mmap_read_sync(struct record *rec)
+{
+	int i, j;
+	struct perf_evlist *evlist = rec->evlist;
+	struct perf_mmap *maps = evlist->mmap;
+
+	for (i = 0; i < evlist->nr_mmaps; i++) {
+		struct perf_mmap *map = &maps[i];
+		if (map->base)
+			for (j = 0; j < map->nr_cblocks; ++j)
+				record__aio_complete(map, j);
+	}
+}
+
 static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evlist,
 				    bool overwrite)
 {
@@ -533,7 +698,13 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
 		struct auxtrace_mmap *mm = &maps[i].auxtrace_mmap;
 
 		if (maps[i].base) {
-			if (perf_mmap__push(&maps[i], rec, record__pushfn) != 0) {
+			/* Call record__aio_sync() to get a free
+			 * map->data buffer, or wait until one of the
+			 * buffers becomes available after a previous
+			 * aio write request completes.
+			 */
+			if (perf_mmap__push(&maps[i], rec, record__aio_sync(&maps[i]),
+				record__pushfn) != 0) {
 				rc = -1;
 				goto out;
 			}
@@ -1055,6 +1226,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 			perf_evlist__toggle_bkw_mmap(rec->evlist, BKW_MMAP_DATA_PENDING);
 
 		if (record__mmap_read_all(rec) < 0) {
+			record__mmap_read_sync(rec);
 			trigger_error(&auxtrace_snapshot_trigger);
 			trigger_error(&switch_output_trigger);
 			err = -1;
@@ -1066,6 +1238,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 			if (!trigger_is_error(&auxtrace_snapshot_trigger))
 				record__read_auxtrace_snapshot(rec);
 			if (trigger_is_error(&auxtrace_snapshot_trigger)) {
+				record__mmap_read_sync(rec);
 				pr_err("AUX area tracing snapshot failed\n");
 				err = -1;
 				goto out_child;
@@ -1084,6 +1257,8 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 			 */
 			if (rec->evlist->bkw_mmap_state == BKW_MMAP_RUNNING)
 				continue;
+
+			record__mmap_read_sync(rec);
 			trigger_ready(&switch_output_trigger);
 
 			/*
@@ -1137,6 +1312,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 			disabled = true;
 		}
 	}
+	record__mmap_read_sync(rec);
 	trigger_off(&auxtrace_snapshot_trigger);
 	trigger_off(&switch_output_trigger);
 
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index 384d17cd1379..1d57d8387caf 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -332,12 +332,12 @@ int perf_mmap__read_init(struct perf_mmap *map)
 	return __perf_mmap__read_init(map);
 }
 
-int perf_mmap__push(struct perf_mmap *md, void *to,
-		    int push(void *to, void *buf, size_t size))
+int perf_mmap__push(struct perf_mmap *md, void *to, int idx,
+		    int push(void *to, struct aiocb *cblock, void *data, size_t size))
 {
 	u64 head = perf_mmap__read_head(md);
 	unsigned char *data = md->base + page_size;
-	unsigned long size;
+	unsigned long size, size0 = 0;
 	void *buf;
 	int rc = 0;
 
@@ -345,31 +345,58 @@ int perf_mmap__push(struct perf_mmap *md, void *to,
 	if (rc < 0)
 		return (rc == -EAGAIN) ? 0 : -1;
 
+	/* md->base data is copied into the md->data[idx] buffer to
+	 * release space in the kernel buffer as fast as possible,
+	 * through perf_mmap__consume() below.
+	 *
+	 * That lets the kernel proceed with storing more
+	 * profiling data into the kernel buffer earlier than other
+	 * per-cpu kernel buffers are handled.
+	 *
+	 * Copying can be done in two steps in case the chunk of
+	 * profiling data crosses the upper bound of the kernel buffer.
+	 * In this case we first move the part of the data from md->start
+	 * up to the upper bound and then the remainder from the
+	 * beginning of the kernel buffer up to the end of
+	 * the data chunk.
+	 */
+
 	size = md->end - md->start;
 
 	if ((md->start & md->mask) + size != (md->end & md->mask)) {
 		buf = &data[md->start & md->mask];
-		size = md->mask + 1 - (md->start & md->mask);
-		md->start += size;
-
-		if (push(to, buf, size) < 0) {
-			rc = -1;
-			goto out;
-		}
+		size0 = md->mask + 1 - (md->start & md->mask);
+		md->start += size0;
+		memcpy(md->data[idx], buf, size0);
 	}
 
 	buf = &data[md->start & md->mask];
 	size = md->end - md->start;
 	md->start += size;
+	memcpy(md->data[idx] + size0, buf, size);
 
-	if (push(to, buf, size) < 0) {
-		rc = -1;
-		goto out;
-	}
+	/* Increment md->refcount to guard the md->data[idx] buffer
+	 * from premature deallocation, because the md object can be
+	 * released before the aio write request started
+	 * on mmap->data[idx] is complete.
+	 *
+	 * perf_mmap__put() is done at record__aio_sync() or
+	 * record__aio_complete() after the started request completes.
+	 */
+
+	perf_mmap__get(md);
 
 	md->prev = head;
 	perf_mmap__consume(md);
-out:
+
+	rc = push(to, md->cblocks[idx], md->data[idx], size0 + size);
+	if (rc) {
+		/* Decrement md->refcount back if aio write
+		 * operation failed to start.
+		 */
+		perf_mmap__put(md);
+	}
+
 	return rc;
 }
 
diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index 4a9bb0ecae4f..9a106c075172 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -95,8 +95,8 @@ union perf_event *perf_mmap__read_forward(struct perf_mmap *map);
 
 union perf_event *perf_mmap__read_event(struct perf_mmap *map);
 
-int perf_mmap__push(struct perf_mmap *md, void *to,
-		    int push(void *to, void *buf, size_t size));
+int perf_mmap__push(struct perf_mmap *md, void *to, int aio_idx,
+		    int push(void *to, struct aiocb *cblock, void *data, size_t size));
 
 size_t perf_mmap__mmap_len(struct perf_mmap *map);



* Re: [PATCH v7 0/2]: perf: reduce data loss when profiling highly parallel CPU bound workloads
  2018-09-05  7:16 [PATCH v7 0/2]: perf: reduce data loss when profiling highly parallel CPU bound workloads Alexey Budankov
  2018-09-05  7:19 ` [PATCH v7 1/2]: perf util: map data buffer for preserving collected data Alexey Budankov
  2018-09-05  7:39 ` [PATCH v7 2/2]: perf record: enable asynchronous trace writing Alexey Budankov
@ 2018-09-05 11:28 ` Jiri Olsa
  2018-09-05 17:37   ` Alexey Budankov
  2018-09-06  6:57   ` Alexey Budankov
  2 siblings, 2 replies; 23+ messages in thread
From: Jiri Olsa @ 2018-09-05 11:28 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

On Wed, Sep 05, 2018 at 10:16:42AM +0300, Alexey Budankov wrote:
> 
> Currently, in record mode the tool implements trace writing serially. 
> The algorithm loops over the mapped per-cpu data buffers and stores 
> ready data chunks into a trace file using the write() system call.
> 
> Under some circumstances the kernel may run out of free space in a 
> buffer because its other half has not yet been written to disk while 
> the tool is busy writing out data from some other buffer.
> 
> Thus the serial trace writing implementation may cause the kernel 
> to lose profiling data, and that is what is observed when profiling 
> highly parallel CPU bound workloads on machines with a large number 
> of cores.
> 
> An experiment profiling matrix multiplication code executing 128 
> threads on an Intel Xeon Phi (KNM) with 272 cores, as below, 
> demonstrates a data loss metric value of 98%:
> 
> /usr/bin/time perf record -o /tmp/perf-ser.data -a -N -B -T -R -g \
>     --call-graph dwarf,1024 --user-regs=IP,SP,BP \
>     --switch-events -e cycles,instructions,ref-cycles,software/period=1,name=cs,config=0x3/Duk -- \
>     matrix.gcc
> 
> The data loss metric is the ratio lost_time/elapsed_time, where 
> lost_time is the sum of the time intervals containing PERF_RECORD_LOST 
> records and elapsed_time is the elapsed application run time 
> under profiling.
> 
> Applying asynchronous trace streaming through the POSIX AIO API
> (http://man7.org/linux/man-pages/man7/aio.7.html) 
> lowers the data loss metric value, providing a 2x improvement and 
> lowering the 98% loss to almost 0%.
> 
> ---
>  Alexey Budankov (2):
>         perf util: map data buffer for preserving collected data
> 	perf record: enable asynchronous trace writing
>  
>  tools/perf/builtin-record.c | 197 +++++++++++++++++++++++++++++++++++++++++++-
>  tools/perf/perf.h           |   1 +
>  tools/perf/util/evlist.c    |   7 +-
>  tools/perf/util/evlist.h    |   3 +-
>  tools/perf/util/mmap.c      | 110 +++++++++++++++++++++----
>  tools/perf/util/mmap.h      |  10 ++-
>  6 files changed, 302 insertions(+), 26 deletions(-)
> 
> ---
>  Changes in v7:
>  - implemented handling of the record.aio setting from the perfconfig file

can't apply this version on Arnaldo's perf/core...

[jolsa@krava linux-perf]$ git am /tmp/ab/
Applying: perf util: map data buffer for preserving collected data
error: patch failed: tools/perf/util/mmap.c:166
error: tools/perf/util/mmap.c: patch does not apply

thanks,
jirka


* Re: [PATCH v7 0/2]: perf: reduce data loss when profiling highly parallel CPU bound workloads
  2018-09-05 11:28 ` [PATCH v7 0/2]: perf: reduce data loss when profiling highly parallel CPU bound workloads Jiri Olsa
@ 2018-09-05 17:37   ` Alexey Budankov
  2018-09-05 18:51     ` Arnaldo Carvalho de Melo
  2018-09-06  6:57   ` Alexey Budankov
  1 sibling, 1 reply; 23+ messages in thread
From: Alexey Budankov @ 2018-09-05 17:37 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

Hi,

On 05.09.2018 14:28, Jiri Olsa wrote:
> On Wed, Sep 05, 2018 at 10:16:42AM +0300, Alexey Budankov wrote:
>>
>> Currently, in record mode the tool implements trace writing serially. 
>> The algorithm loops over the mapped per-cpu data buffers and stores 
>> ready data chunks into a trace file using the write() system call.
>>
>> Under some circumstances the kernel may run out of free space in a 
>> buffer because its other half has not yet been written to disk while 
>> the tool is busy writing out data from some other buffer.
>>
>> Thus the serial trace writing implementation may cause the kernel 
>> to lose profiling data, and that is what is observed when profiling 
>> highly parallel CPU bound workloads on machines with a large number 
>> of cores.
>>
>> An experiment profiling matrix multiplication code executing 128 
>> threads on an Intel Xeon Phi (KNM) with 272 cores, as below, 
>> demonstrates a data loss metric value of 98%:
>>
>> /usr/bin/time perf record -o /tmp/perf-ser.data -a -N -B -T -R -g \
>>     --call-graph dwarf,1024 --user-regs=IP,SP,BP \
>>     --switch-events -e cycles,instructions,ref-cycles,software/period=1,name=cs,config=0x3/Duk -- \
>>     matrix.gcc
>>
>> The data loss metric is the ratio lost_time/elapsed_time, where 
>> lost_time is the sum of the time intervals containing PERF_RECORD_LOST 
>> records and elapsed_time is the elapsed application run time 
>> under profiling.
>>
>> Applying asynchronous trace streaming through the POSIX AIO API
>> (http://man7.org/linux/man-pages/man7/aio.7.html) 
>> lowers the data loss metric value, providing a 2x improvement and 
>> lowering the 98% loss to almost 0%.
>>
>> ---
>>  Alexey Budankov (2):
>>         perf util: map data buffer for preserving collected data
>> 	perf record: enable asynchronous trace writing
>>  
>>  tools/perf/builtin-record.c | 197 +++++++++++++++++++++++++++++++++++++++++++-
>>  tools/perf/perf.h           |   1 +
>>  tools/perf/util/evlist.c    |   7 +-
>>  tools/perf/util/evlist.h    |   3 +-
>>  tools/perf/util/mmap.c      | 110 +++++++++++++++++++++----
>>  tools/perf/util/mmap.h      |  10 ++-
>>  6 files changed, 302 insertions(+), 26 deletions(-)
>>
>> ---
>>  Changes in v7:
>>  - implemented handling of the record.aio setting from the perfconfig file
> 
> can't apply this version on Arnaldo's perf/core...

my git remote -v

origin	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip (fetch)
origin	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip (push)

branch is perf/core, according to MAINTAINERS content.

What is Arnaldo's perf/core? This one?

git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

and branch is perf/core?

> 
> [jolsa@krava linux-perf]$ git am /tmp/ab/
> Applying: perf util: map data buffer for preserving collected data
> error: patch failed: tools/perf/util/mmap.c:166
> error: tools/perf/util/mmap.c: patch does not apply
> 
> thanks,
> jirka
> 


* Re: [PATCH v7 0/2]: perf: reduce data loss when profiling highly parallel CPU bound workloads
  2018-09-05 17:37   ` Alexey Budankov
@ 2018-09-05 18:51     ` Arnaldo Carvalho de Melo
  2018-09-06  6:03       ` Alexey Budankov
  2018-09-06  6:59       ` Alexey Budankov
  0 siblings, 2 replies; 23+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-09-05 18:51 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Jiri Olsa, Ingo Molnar, Peter Zijlstra, Alexander Shishkin,
	Namhyung Kim, Andi Kleen, linux-kernel

Em Wed, Sep 05, 2018 at 08:37:32PM +0300, Alexey Budankov escreveu:
> On 05.09.2018 14:28, Jiri Olsa wrote:
> > can't apply this version on Arnaldo's perf/core...
 
> my git remote -v
 
> origin	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip (fetch)
> origin	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip (push)
 
> branch is perf/core, according to MAINTAINERS content.
 
> What is Arnaldo's perf/core? This one?
 
> git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
 
> and branch is perf/core?

yes.

- Arnaldo


* Re: [PATCH v7 0/2]: perf: reduce data loss when profiling highly parallel CPU bound workloads
  2018-09-05 18:51     ` Arnaldo Carvalho de Melo
@ 2018-09-06  6:03       ` Alexey Budankov
  2018-09-06  8:14         ` Jiri Olsa
  2018-09-06  6:59       ` Alexey Budankov
  1 sibling, 1 reply; 23+ messages in thread
From: Alexey Budankov @ 2018-09-06  6:03 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Ingo Molnar, Peter Zijlstra, Alexander Shishkin,
	Namhyung Kim, Andi Kleen, linux-kernel

Hi,

On 05.09.2018 21:51, Arnaldo Carvalho de Melo wrote:
> Em Wed, Sep 05, 2018 at 08:37:32PM +0300, Alexey Budankov escreveu:
>> On 05.09.2018 14:28, Jiri Olsa wrote:
>>> can't apply this version on Arnaldo's perf/core...
>  
>> my git remote -v
>  
>> origin	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip (fetch)
>> origin	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip (push)
>  
>> branch is perf/core, according to MAINTAINERS content.
>  
>> What is Arnaldo's perf/core? This one?
>  
>> git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
>  
>> and branch is perf/core?
> 
> yes.

Is this the current one for tool development? Am I missing something?

Thanks,
Alexey

> 
> - Arnaldo
> 


* Re: [PATCH v7 0/2]: perf: reduce data loss when profiling highly parallel CPU bound workloads
  2018-09-05 11:28 ` [PATCH v7 0/2]: perf: reduce data loss when profiling highly parallel CPU bound workloads Jiri Olsa
  2018-09-05 17:37   ` Alexey Budankov
@ 2018-09-06  6:57   ` Alexey Budankov
  1 sibling, 0 replies; 23+ messages in thread
From: Alexey Budankov @ 2018-09-06  6:57 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

Hi,

On 05.09.2018 14:28, Jiri Olsa wrote:
> On Wed, Sep 05, 2018 at 10:16:42AM +0300, Alexey Budankov wrote:
>>
>> Currently, in record mode the tool implements trace writing serially. 
>> The algorithm loops over the mapped per-cpu data buffers and stores 
>> ready data chunks into a trace file using the write() system call.
>>
>> Under some circumstances the kernel may run out of free space in a 
>> buffer because its other half has not yet been written to disk while 
>> the tool is busy writing out data from some other buffer.
>>
>> Thus the serial trace writing implementation may cause the kernel 
>> to lose profiling data, and that is what is observed when profiling 
>> highly parallel CPU bound workloads on machines with a large number 
>> of cores.
>>
>> An experiment profiling matrix multiplication code executing 128 
>> threads on an Intel Xeon Phi (KNM) with 272 cores, as below, 
>> demonstrates a data loss metric value of 98%:
>>
>> /usr/bin/time perf record -o /tmp/perf-ser.data -a -N -B -T -R -g \
>>     --call-graph dwarf,1024 --user-regs=IP,SP,BP \
>>     --switch-events -e cycles,instructions,ref-cycles,software/period=1,name=cs,config=0x3/Duk -- \
>>     matrix.gcc
>>
>> The data loss metric is the ratio lost_time/elapsed_time, where 
>> lost_time is the sum of the time intervals containing PERF_RECORD_LOST 
>> records and elapsed_time is the elapsed application run time 
>> under profiling.
>>
>> Applying asynchronous trace streaming through the POSIX AIO API
>> (http://man7.org/linux/man-pages/man7/aio.7.html) 
>> lowers the data loss metric value, providing a 2x improvement and 
>> lowering the 98% loss to almost 0%.
>>
>> ---
>>  Alexey Budankov (2):
>>         perf util: map data buffer for preserving collected data
>> 	perf record: enable asynchronous trace writing
>>  
>>  tools/perf/builtin-record.c | 197 +++++++++++++++++++++++++++++++++++++++++++-
>>  tools/perf/perf.h           |   1 +
>>  tools/perf/util/evlist.c    |   7 +-
>>  tools/perf/util/evlist.h    |   3 +-
>>  tools/perf/util/mmap.c      | 110 +++++++++++++++++++++----
>>  tools/perf/util/mmap.h      |  10 ++-
>>  6 files changed, 302 insertions(+), 26 deletions(-)
>>
>> ---
>>  Changes in v7:
>>  - implemented handling of the record.aio setting from the perfconfig file
> 
> can't apply this version on Arnaldo's perf/core...
> 
> [jolsa@krava linux-perf]$ git am /tmp/ab/
> Applying: perf util: map data buffer for preserving collected data
> error: patch failed: tools/perf/util/mmap.c:166
> error: tools/perf/util/mmap.c: patch does not apply

Here it is:

 tools/perf/builtin-record.c | 197 +++++++++++++++++++++++++++++++++++++++++++-
 tools/perf/perf.h           |   1 +
 tools/perf/util/evlist.c    |   7 +-
 tools/perf/util/evlist.h    |   3 +-
 tools/perf/util/mmap.c      | 110 +++++++++++++++++++++----
 tools/perf/util/mmap.h      |  10 ++-
 6 files changed, 302 insertions(+), 26 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 9853552bcf16..0f1324549e35 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -121,6 +121,91 @@ static int record__write(struct record *rec, void *bf, size_t size)
 	return 0;
 }
 
+static int record__aio_write(struct aiocb *cblock, int trace_fd,
+		void *buf, size_t size, off_t off)
+{
+	int rc;
+
+	cblock->aio_fildes = trace_fd;
+	cblock->aio_buf    = buf;
+	cblock->aio_nbytes = size;
+	cblock->aio_offset = off;
+	cblock->aio_sigevent.sigev_notify = SIGEV_NONE;
+
+	do {
+		rc = aio_write(cblock);
+		if (rc == 0) {
+			break;
+		} else if (errno != EAGAIN) {
+			cblock->aio_fildes = -1;
+			pr_err("failed to queue perf data, error: %m\n");
+			break;
+		}
+	} while (1);
+
+	return rc;
+}
+
+static int record__aio_sync(struct perf_mmap *md)
+{
+	void *rem_buf;
+	off_t rem_off;
+	size_t rem_size;
+	ssize_t aio_ret, written;
+	int i, aio_errno, do_suspend, idx = -1;
+	struct aiocb **cblocks = md->cblocks;
+	struct timespec timeout = { 0, 1000 * 1000  * 1 }; // 1ms
+
+	for (i = 0; i < md->nr_cblocks; ++i)
+		if (cblocks[i]->aio_fildes == -1)
+			return i;
+
+	do {
+		do_suspend = 0;
+		if (aio_suspend((const struct aiocb**)cblocks, md->nr_cblocks, &timeout)) {
+			if (!(errno == EAGAIN || errno == EINTR))
+				pr_err("failed to sync perf data, error: %m\n");
+			do_suspend = 1;
+			continue;
+		}
+		for (i = 0; i < md->nr_cblocks; ++i) {
+			aio_errno = aio_error(cblocks[i]);
+			if (aio_errno == EINPROGRESS)
+				continue;
+			written = aio_ret = aio_return(cblocks[i]);
+			if (aio_ret < 0) {
+				if (aio_errno != EINTR)
+					pr_err("failed to write perf data, error: %m\n");
+				written = 0;
+			}
+			rem_size = cblocks[i]->aio_nbytes - written;
+			if (rem_size == 0) {
+				cblocks[i]->aio_fildes = -1;
+				/* md->refcount is incremented in perf_mmap__push() for
+				 * every enqueued aio write request so decrement it because
+				 * the request is now complete.
+				 */
+				perf_mmap__put(md);
+				if (idx == -1)
+					idx = i;
+			} else {
+				/* The aio write request may require a restart with
+				 * the remainder if the kernel didn't write the
+				 * whole chunk at once.
+				 */
+				rem_off = cblocks[i]->aio_offset + written;
+				rem_buf = (void*)(cblocks[i]->aio_buf + written);
+				record__aio_write(cblocks[i], cblocks[i]->aio_fildes,
+						rem_buf, rem_size, rem_off);
+			}
+		}
+		if (idx == -1)
+			do_suspend = 1;
+	} while (do_suspend);
+
+	return idx;
+}
+
 static int process_synthesized_event(struct perf_tool *tool,
 				     union perf_event *event,
 				     struct perf_sample *sample __maybe_unused,
@@ -130,12 +215,28 @@ static int process_synthesized_event(struct perf_tool *tool,
 	return record__write(rec, event, event->header.size);
 }
 
-static int record__pushfn(void *to, void *bf, size_t size)
+static int record__pushfn(void *to, struct aiocb *cblock, void *data, size_t size)
 {
+	off_t off;
 	struct record *rec = to;
+	int ret, trace_fd = rec->session->data->file.fd;
 
 	rec->samples++;
-	return record__write(rec, bf, size);
+
+	off = lseek(trace_fd, 0, SEEK_CUR);
+	lseek(trace_fd, off + size, SEEK_SET);
+
+	ret = record__aio_write(cblock, trace_fd, data, size, off);
+	if (!ret) {
+		rec->bytes_written += size;
+		if (switch_output_size(rec))
+			trigger_hit(&switch_output_trigger);
+	} else {
+		lseek(trace_fd, off, SEEK_SET);
+	}
+
+	return ret;
 }
 
 static volatile int done;
@@ -326,7 +427,8 @@ static int record__mmap_evlist(struct record *rec,
 
 	if (perf_evlist__mmap_ex(evlist, opts->mmap_pages,
 				 opts->auxtrace_mmap_pages,
-				 opts->auxtrace_snapshot_mode) < 0) {
+				 opts->auxtrace_snapshot_mode,
+				 opts->nr_cblocks) < 0) {
 		if (errno == EPERM) {
 			pr_err("Permission error mapping pages.\n"
 			       "Consider increasing "
@@ -510,6 +612,70 @@ static struct perf_event_header finished_round_event = {
 	.type = PERF_RECORD_FINISHED_ROUND,
 };
 
+static void record__aio_complete(struct perf_mmap *md, int i)
+{
+	int aio_errno;
+	void *rem_buf;
+	off_t rem_off;
+	size_t rem_size;
+	ssize_t aio_ret, written;
+	struct aiocb *cblock = md->cblocks[i];
+	struct timespec timeout = { 0, 1000 * 1000  * 1 }; // 1ms
+
+	if (cblock->aio_fildes == -1)
+		return;
+
+	do {
+		if (aio_suspend((const struct aiocb**)&cblock, 1, &timeout)) {
+			if (!(errno == EAGAIN || errno == EINTR))
+				pr_err("failed to sync perf data, error: %m\n");
+			continue;
+		}
+		aio_errno = aio_error(cblock);
+		if (aio_errno == EINPROGRESS)
+			continue;
+		written = aio_ret = aio_return(cblock);
+		if (aio_ret < 0) {
+			if (!(aio_errno == EINTR))
+				pr_err("failed to write perf data, error: %m\n");
+			written = 0;
+		}
+		rem_size = cblock->aio_nbytes - written;
+		if (rem_size == 0) {
+			cblock->aio_fildes = -1;
+			/* md->refcount is incremented in perf_mmap__push() for
+			 * every enqueued aio write request so decrement it because
+			 * the request is now complete.
+			 */
+			perf_mmap__put(md);
+			return;
+		} else {
+			/* The aio write request may require a restart with
+			 * the remainder if the kernel didn't write the
+			 * whole chunk at once.
+			 */
+			rem_off = cblock->aio_offset + written;
+			rem_buf = (void*)(cblock->aio_buf + written);
+			record__aio_write(cblock, cblock->aio_fildes,
+					rem_buf, rem_size, rem_off);
+		}
+	} while (1);
+}
+
+static void record__mmap_read_sync(struct record *rec)
+{
+	int i, j;
+	struct perf_evlist *evlist = rec->evlist;
+	struct perf_mmap *maps = evlist->mmap;
+
+	for (i = 0; i < evlist->nr_mmaps; i++) {
+		struct perf_mmap *map = &maps[i];
+		if (map->base)
+			for (j = 0; j < map->nr_cblocks; ++j)
+				record__aio_complete(map, j);
+	}
+}
+
 static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evlist,
 				    bool overwrite)
 {
@@ -532,7 +698,13 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
 		struct auxtrace_mmap *mm = &maps[i].auxtrace_mmap;
 
 		if (maps[i].base) {
-			if (perf_mmap__push(&maps[i], rec, record__pushfn) != 0) {
+			/* Call record__aio_sync() to get a free
+			 * map->data buffer, or wait until one of the
+			 * buffers becomes available after a previous
+			 * aio write request completes.
+			 */
+			if (perf_mmap__push(&maps[i], rec, record__aio_sync(&maps[i]),
+				record__pushfn) != 0) {
 				rc = -1;
 				goto out;
 			}
@@ -1054,6 +1226,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 			perf_evlist__toggle_bkw_mmap(rec->evlist, BKW_MMAP_DATA_PENDING);
 
 		if (record__mmap_read_all(rec) < 0) {
+			record__mmap_read_sync(rec);
 			trigger_error(&auxtrace_snapshot_trigger);
 			trigger_error(&switch_output_trigger);
 			err = -1;
@@ -1065,6 +1238,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 			if (!trigger_is_error(&auxtrace_snapshot_trigger))
 				record__read_auxtrace_snapshot(rec);
 			if (trigger_is_error(&auxtrace_snapshot_trigger)) {
+				record__mmap_read_sync(rec);
 				pr_err("AUX area tracing snapshot failed\n");
 				err = -1;
 				goto out_child;
@@ -1083,6 +1257,8 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 			 */
 			if (rec->evlist->bkw_mmap_state == BKW_MMAP_RUNNING)
 				continue;
+
+			record__mmap_read_sync(rec);
 			trigger_ready(&switch_output_trigger);
 
 			/*
@@ -1136,6 +1312,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 			disabled = true;
 		}
 	}
+	record__mmap_read_sync(rec);
 	trigger_off(&auxtrace_snapshot_trigger);
 	trigger_off(&switch_output_trigger);
 
@@ -1287,6 +1464,8 @@ static int perf_record_config(const char *var, const char *value, void *cb)
 		var = "call-graph.record-mode";
 		return perf_default_config(var, value, cb);
 	}
+	if (!strcmp(var, "record.aio"))
+		rec->opts.nr_cblocks = strtol(value, NULL, 0);
 
 	return 0;
 }
@@ -1519,6 +1698,7 @@ static struct record record = {
 			.default_per_cpu = true,
 		},
 		.proc_map_timeout     = 500,
+		.nr_cblocks	      = 2
 	},
 	.tool = {
 		.sample		= process_sample_event,
@@ -1678,6 +1858,8 @@ static struct option __record_options[] = {
 			  "signal"),
 	OPT_BOOLEAN(0, "dry-run", &dry_run,
 		    "Parse options then exit"),
+	OPT_INTEGER(0, "aio", &record.opts.nr_cblocks,
+		    "asynchronous trace write operations (min: 1, max: 32, default: 2)"),
 	OPT_END()
 };
 
@@ -1870,6 +2052,13 @@ int cmd_record(int argc, const char **argv)
 		goto out;
 	}
 
+	if (!(1 <= rec->opts.nr_cblocks && rec->opts.nr_cblocks <= 32))
+		rec->opts.nr_cblocks = 2;
+
+	if (verbose > 0)
+		pr_info("AIO trace writes: %d\n", rec->opts.nr_cblocks);
+
+
 	err = __cmd_record(&record, argc, argv);
 out:
 	perf_evlist__delete(rec->evlist);
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 21bf7f5a3cf5..0a1ae2ae567a 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -82,6 +82,7 @@ struct record_opts {
 	bool         use_clockid;
 	clockid_t    clockid;
 	unsigned int proc_map_timeout;
+	int	     nr_cblocks;
 };
 
 struct option;
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index be440df29615..2c8c0270886b 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1018,7 +1018,8 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
  */
 int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 			 unsigned int auxtrace_pages,
-			 bool auxtrace_overwrite)
+			 bool auxtrace_overwrite,
+			 int nr_cblocks)
 {
 	struct perf_evsel *evsel;
 	const struct cpu_map *cpus = evlist->cpus;
@@ -1028,7 +1029,7 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 	 * Its value is decided by evsel's write_backward.
 	 * So &mp should not be passed through const pointer.
 	 */
-	struct mmap_params mp;
+	struct mmap_params mp = { .nr_cblocks = nr_cblocks };
 
 	if (!evlist->mmap)
 		evlist->mmap = perf_evlist__alloc_mmap(evlist, false);
@@ -1060,7 +1061,7 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 
 int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages)
 {
-	return perf_evlist__mmap_ex(evlist, pages, 0, false);
+	return perf_evlist__mmap_ex(evlist, pages, 0, false, 2);
 }
 
 int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target)
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index dc66436add98..a94d3c613254 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -162,7 +162,8 @@ unsigned long perf_event_mlock_kb_in_pages(void);
 
 int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 			 unsigned int auxtrace_pages,
-			 bool auxtrace_overwrite);
+			 bool auxtrace_overwrite,
+			 int nr_cblocks);
 int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages);
 void perf_evlist__munmap(struct perf_evlist *evlist);
 
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index 215f69f41672..a2c8ead9344c 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -155,6 +155,14 @@ void __weak auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp __mayb
 
 void perf_mmap__munmap(struct perf_mmap *map)
 {
+	int i;
+	if (map->data) {
+		for (i = 0; i < map->nr_cblocks; ++i)
+			zfree(&(map->data[i]));
+		zfree(&(map->data));
+	}
+	if (map->cblocks)
+		zfree(&(map->cblocks));
 	if (map->base != NULL) {
 		munmap(map->base, perf_mmap__mmap_len(map));
 		map->base = NULL;
@@ -166,6 +174,7 @@ void perf_mmap__munmap(struct perf_mmap *map)
 
 int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd, int cpu)
 {
+	int i;
 	/*
 	 * The last one will be done at perf_mmap__consume(), so that we
 	 * make sure we don't prevent tools from consuming every last event in
@@ -190,6 +199,50 @@ int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd, int c
 		map->base = NULL;
 		return -1;
 	}
+	map->nr_cblocks = mp->nr_cblocks;
+	map->cblocks = calloc(map->nr_cblocks, sizeof(struct aiocb*));
+	if (!map->cblocks) {
+		pr_debug2("failed to allocate perf event data buffers, error %d\n",
+				errno);
+		return -1;
+	}
+	map->data = calloc(map->nr_cblocks, sizeof(void*));
+	if (map->data) {
+		int delta_max = sysconf(_SC_AIO_PRIO_DELTA_MAX);
+		for (i = 0; i < map->nr_cblocks; ++i) {
+			map->data[i] = malloc(perf_mmap__mmap_len(map));
+			if (map->data[i]) {
+				int prio;
+				unsigned char *data = map->data[i];
+				map->cblocks[i] = (struct aiocb *)&data[map->mask + 1];
+				memset(map->cblocks[i], 0, sizeof(struct aiocb));
+				/* Use a cblock.aio_fildes value different from -1
+				 * to denote a started aio write operation on the
+				 * cblock, so an explicit record__aio_sync() call
+				 * is required before the cblock can be reused.
+				 */
+				map->cblocks[i]->aio_fildes = -1;
+				/* Allocate cblocks with decreasing priority to
+				 * get faster aio_write() calls: queued requests
+				 * are kept in separate per-prio queues, so adding
+				 * a new request iterates through a shorter
+				 * per-prio list.
+				 */
+				prio = delta_max - i;
+				if (prio < 0)
+					prio = 0;
+				map->cblocks[i]->aio_reqprio = prio;
+			} else {
+				pr_debug2("failed to allocate perf event data buffer, error %d\n",
+						errno);
+				return -1;
+			}
+		}
+	} else {
+		pr_debug2("failed to alloc perf event data buffers, error %d\n",
+				errno);
+		return -1;
+	}
 	map->fd = fd;
 	map->cpu = cpu;
 
@@ -280,12 +333,12 @@ int perf_mmap__read_init(struct perf_mmap *map)
 	return __perf_mmap__read_init(map);
 }
 
-int perf_mmap__push(struct perf_mmap *md, void *to,
-		    int push(void *to, void *buf, size_t size))
+int perf_mmap__push(struct perf_mmap *md, void *to, int idx,
+		    int push(void *to, struct aiocb *cblock, void *data, size_t size))
 {
 	u64 head = perf_mmap__read_head(md);
 	unsigned char *data = md->base + page_size;
-	unsigned long size;
+	unsigned long size, size0 = 0;
 	void *buf;
 	int rc = 0;
 
@@ -293,31 +346,58 @@ int perf_mmap__push(struct perf_mmap *md, void *to,
 	if (rc < 0)
 		return (rc == -EAGAIN) ? 0 : -1;
 
+	/* md->base data is copied into the md->data[idx] buffer to
+	 * release space in the kernel buffer as fast as possible,
+	 * through perf_mmap__consume() below.
+	 *
+	 * That lets the kernel proceed with storing more
+	 * profiling data into the kernel buffer earlier than other
+	 * per-cpu kernel buffers are handled.
+	 *
+	 * Copying can be done in two steps in case the chunk of
+	 * profiling data crosses the upper bound of the kernel buffer.
+	 * In this case we first move the part of the data from md->start
+	 * up to the upper bound and then the remainder from the
+	 * beginning of the kernel buffer up to the end of
+	 * the data chunk.
+	 */
+
 	size = md->end - md->start;
 
 	if ((md->start & md->mask) + size != (md->end & md->mask)) {
 		buf = &data[md->start & md->mask];
-		size = md->mask + 1 - (md->start & md->mask);
-		md->start += size;
-
-		if (push(to, buf, size) < 0) {
-			rc = -1;
-			goto out;
-		}
+		size0 = md->mask + 1 - (md->start & md->mask);
+		md->start += size0;
+		memcpy(md->data[idx], buf, size0);
 	}
 
 	buf = &data[md->start & md->mask];
 	size = md->end - md->start;
 	md->start += size;
+	memcpy(md->data[idx] + size0, buf, size);
 
-	if (push(to, buf, size) < 0) {
-		rc = -1;
-		goto out;
-	}
+	/* Increment md->refcount to guard the md->data[idx] buffer
+	 * from premature deallocation, because the md object can be
+	 * released before the aio write request started
+	 * on mmap->data[idx] is complete.
+	 *
+	 * perf_mmap__put() is done at record__aio_sync() or
+	 * record__aio_complete() after the started request completes.
+	 */
+
+	perf_mmap__get(md);
 
 	md->prev = head;
 	perf_mmap__consume(md);
-out:
+
+	rc = push(to, md->cblocks[idx], md->data[idx], size0 + size);
+	if (rc) {
+		/* Decrement md->refcount back if aio write
+		 * operation failed to start.
+		 */
+		perf_mmap__put(md);
+	}
+
 	return rc;
 }
 
diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index 05a6d47c7956..8fbe5cb6d95b 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -6,6 +6,7 @@
 #include <linux/types.h>
 #include <asm/barrier.h>
 #include <stdbool.h>
+#include <aio.h>
 #include "auxtrace.h"
 #include "event.h"
 
@@ -26,6 +27,9 @@ struct perf_mmap {
 	bool		 overwrite;
 	struct auxtrace_mmap auxtrace_mmap;
 	char		 event_copy[PERF_SAMPLE_MAX_SIZE] __aligned(8);
+	void 		 **data;
+	struct aiocb	 **cblocks;
+	int 		 nr_cblocks;
 };
 
 /*
@@ -57,7 +61,7 @@ enum bkw_mmap_state {
 };
 
 struct mmap_params {
-	int			    prot, mask;
+	int			    prot, mask, nr_cblocks;
 	struct auxtrace_mmap_params auxtrace_mp;
 };
 
@@ -92,8 +96,8 @@ union perf_event *perf_mmap__read_forward(struct perf_mmap *map);
 
 union perf_event *perf_mmap__read_event(struct perf_mmap *map);
 
-int perf_mmap__push(struct perf_mmap *md, void *to,
-		    int push(void *to, void *buf, size_t size));
+int perf_mmap__push(struct perf_mmap *md, void *to, int aio_idx,
+		    int push(void *to, struct aiocb *cblock, void *data, size_t size));
 
 size_t perf_mmap__mmap_len(struct perf_mmap *map);
 

> 
> thanks,
> jirka
> 


* Re: [PATCH v7 0/2]: perf: reduce data loss when profiling highly parallel CPU bound workloads
  2018-09-05 18:51     ` Arnaldo Carvalho de Melo
  2018-09-06  6:03       ` Alexey Budankov
@ 2018-09-06  6:59       ` Alexey Budankov
  1 sibling, 0 replies; 23+ messages in thread
From: Alexey Budankov @ 2018-09-06  6:59 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Ingo Molnar, Peter Zijlstra, Alexander Shishkin,
	Namhyung Kim, Andi Kleen, linux-kernel

Hi,

On 05.09.2018 21:51, Arnaldo Carvalho de Melo wrote:
> Em Wed, Sep 05, 2018 at 08:37:32PM +0300, Alexey Budankov escreveu:
>> On 05.09.2018 14:28, Jiri Olsa wrote:
>>> can't apply this version on Arnaldo's perf/core...
>  
>> my git remote -v
>  
>> origin	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip (fetch)
>> origin	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip (push)
>  
>> branch is perf/core, according to MAINTAINERS content.
>  
>> What is Arnaldo's perf/core? This one?
>  
>> git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
>  
>> and branch is perf/core?
> 
> yes.

See the patch on top of this branch in my answer to Jiri.

> 
> - Arnaldo
> 

* Re: [PATCH v7 0/2]: perf: reduce data loss when profiling highly parallel CPU bound workloads
  2018-09-06  6:03       ` Alexey Budankov
@ 2018-09-06  8:14         ` Jiri Olsa
  2018-09-06  8:20           ` Alexey Budankov
  0 siblings, 1 reply; 23+ messages in thread
From: Jiri Olsa @ 2018-09-06  8:14 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

On Thu, Sep 06, 2018 at 09:03:17AM +0300, Alexey Budankov wrote:
> Hi,
> 
> On 05.09.2018 21:51, Arnaldo Carvalho de Melo wrote:
> > Em Wed, Sep 05, 2018 at 08:37:32PM +0300, Alexey Budankov escreveu:
> >> On 05.09.2018 14:28, Jiri Olsa wrote:
> >>> can't apply this version on Arnaldo's perf/core...
> >  
> >> my git remote -v
> >  
> >> origin	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip (fetch)
> >> origin	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip (push)
> >  
> >> branch is perf/core, according to MAINTAINERS content.
> >  
> >> What is Arnaldo's perf/core? This one?
> >  
> >> git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
> >  
> >> and branch is perf/core?
> > 
> > yes.
> 
> Is this the actual branch for tool development? Am I missing something?

yep, Arnaldo takes it all and pushes his perf/core to Ingo,
so it's better to always rebase to it

thanks,
jirka

* Re: [PATCH v7 0/2]: perf: reduce data loss when profiling highly parallel CPU bound workloads
  2018-09-06  8:14         ` Jiri Olsa
@ 2018-09-06  8:20           ` Alexey Budankov
  0 siblings, 0 replies; 23+ messages in thread
From: Alexey Budankov @ 2018-09-06  8:20 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

Hi,

On 06.09.2018 11:14, Jiri Olsa wrote:
> On Thu, Sep 06, 2018 at 09:03:17AM +0300, Alexey Budankov wrote:
>> Hi,
>>
>> On 05.09.2018 21:51, Arnaldo Carvalho de Melo wrote:
>>> Em Wed, Sep 05, 2018 at 08:37:32PM +0300, Alexey Budankov escreveu:
>>>> On 05.09.2018 14:28, Jiri Olsa wrote:
>>>>> can't apply this version on Arnaldo's perf/core...
>>>  
>>>> my git remote -v
>>>  
>>>> origin	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip (fetch)
>>>> origin	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip (push)
>>>  
>>>> branch is perf/core, according to MAINTAINERS content.
>>>  
>>>> What is Arnaldo's perf/core? This one?
>>>  
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
>>>  
>>>> and branch is perf/core?
>>>
>>> yes.
>>
>> Is this the actual branch for tool development? Am I missing something?
> 
> yep, Arnaldo takes it all and pushes his perf/core to Ingo,
> so it's better to always rebase to it

Good to know. Thanks!

> 
> thanks,
> jirka
> 

* Re: [PATCH v7 2/2]: perf record: enable asynchronous trace writing
  2018-09-05  7:39 ` [PATCH v7 2/2]: perf record: enable asynchronous trace writing Alexey Budankov
@ 2018-09-06 11:04   ` Jiri Olsa
  2018-09-06 11:57     ` Alexey Budankov
  2018-09-06 11:04   ` Jiri Olsa
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 23+ messages in thread
From: Jiri Olsa @ 2018-09-06 11:04 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

On Wed, Sep 05, 2018 at 10:39:25AM +0300, Alexey Budankov wrote:

SNIP

> -static int record__pushfn(void *to, void *bf, size_t size)
> +static int record__pushfn(void *to, struct aiocb *cblock, void *data, size_t size)
>  {
> +	off_t off;
>  	struct record *rec = to;
> +	int ret, trace_fd = rec->session->data->file.fd;
>  
>  	rec->samples++;
> -	return record__write(rec, bf, size);
> +
> +	off =
> +	lseek(trace_fd, 0, SEEK_CUR);
> +	lseek(trace_fd, off + size, SEEK_SET);
> +
> +	ret = record__aio_write(cblock, trace_fd, data, size, off);
> +	if (!ret) {
> +		rec->bytes_written += size;
> +		if (switch_output_size(rec))
> +			trigger_hit(&switch_output_trigger);

why do you need to call switch output from here?

jirka

* Re: [PATCH v7 1/2]: perf util: map data buffer for preserving collected data
  2018-09-05  7:19 ` [PATCH v7 1/2]: perf util: map data buffer for preserving collected data Alexey Budankov
@ 2018-09-06 11:04   ` Jiri Olsa
  2018-09-06 11:50     ` Alexey Budankov
  2018-09-06 11:04   ` Jiri Olsa
  1 sibling, 1 reply; 23+ messages in thread
From: Jiri Olsa @ 2018-09-06 11:04 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

On Wed, Sep 05, 2018 at 10:19:56AM +0300, Alexey Budankov wrote:
> 
> The map->data buffers are used to preserve map->base profiling data 
> for writing to disk. AIO map->cblocks are used to queue corresponding 
> map->data buffers for asynchronous writing. map->cblocks objects are 
> located in the last page of every map->data buffer.
> 
> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
> ---
>  Changes in v7:
>   - implemented handling record.aio setting from perfconfig file
>  Changes in v6:
>   - adjusted setting of priorities for cblocks;
>  Changes in v5:
>   - reshaped layout of data structures;
>   - implemented --aio option;
>  Changes in v4:
>   - converted mmap()/munmap() to malloc()/free() for mmap->data buffer management 
>  Changes in v2:
>   - converted zalloc() to calloc() for allocation of mmap_aio array,
>   - cleared typo and adjusted fallback branch code;
> ---
>  tools/perf/builtin-record.c | 15 ++++++++++++-
>  tools/perf/perf.h           |  1 +
>  tools/perf/util/evlist.c    |  7 +++---
>  tools/perf/util/evlist.h    |  3 ++-
>  tools/perf/util/mmap.c      | 53 +++++++++++++++++++++++++++++++++++++++++++++
>  tools/perf/util/mmap.h      |  6 ++++-
>  6 files changed, 79 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 22ebeb92ac51..f17a6f9cb1ba 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -326,7 +326,8 @@ static int record__mmap_evlist(struct record *rec,
>  
>  	if (perf_evlist__mmap_ex(evlist, opts->mmap_pages,
>  				 opts->auxtrace_mmap_pages,
> -				 opts->auxtrace_snapshot_mode) < 0) {
> +				 opts->auxtrace_snapshot_mode,
> +				 opts->nr_cblocks) < 0) {
>  		if (errno == EPERM) {
>  			pr_err("Permission error mapping pages.\n"
>  			       "Consider increasing "
> @@ -1287,6 +1288,8 @@ static int perf_record_config(const char *var, const char *value, void *cb)
>  		var = "call-graph.record-mode";
>  		return perf_default_config(var, value, cb);
>  	}
> +	if (!strcmp(var, "record.aio"))
> +		rec->opts.nr_cblocks = strtol(value, NULL, 0);
>  
>  	return 0;
>  }
> @@ -1519,6 +1522,7 @@ static struct record record = {
>  			.default_per_cpu = true,
>  		},
>  		.proc_map_timeout     = 500,
> +		.nr_cblocks	      = 2
>  	},
>  	.tool = {
>  		.sample		= process_sample_event,
> @@ -1678,6 +1682,8 @@ static struct option __record_options[] = {
>  			  "signal"),
>  	OPT_BOOLEAN(0, "dry-run", &dry_run,
>  		    "Parse options then exit"),
> +	OPT_INTEGER(0, "aio", &record.opts.nr_cblocks,
> +		    "asynchronous trace write operations (min: 1, max: 32, default: 2)"),

ok, so this got silently added in recent versions and I couldn't
find any justification for it.. why do we use more aio blocks for
a single map now? also why is the default 2?

the option should be more specific like 'aio-blocks'

the change is difficult enough.. we should start simple and add
these additions with proper justification in separate patches
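
For reference, the interface as posted would be exercised like this
(a sketch; the values are illustrative):

  $ perf record --aio 4 -e cycles -- ./workload

  $ cat ~/.perfconfig        # or persistently, via the record.aio key
  [record]
	aio = 4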

thanks,
jirka

* Re: [PATCH v7 2/2]: perf record: enable asynchronous trace writing
  2018-09-05  7:39 ` [PATCH v7 2/2]: perf record: enable asynchronous trace writing Alexey Budankov
  2018-09-06 11:04   ` Jiri Olsa
@ 2018-09-06 11:04   ` Jiri Olsa
  2018-09-06 11:58     ` Alexey Budankov
  2018-09-06 11:04   ` Jiri Olsa
  2018-09-06 11:04   ` Jiri Olsa
  3 siblings, 1 reply; 23+ messages in thread
From: Jiri Olsa @ 2018-09-06 11:04 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

On Wed, Sep 05, 2018 at 10:39:25AM +0300, Alexey Budankov wrote:

SNIP

> +
>  static int process_synthesized_event(struct perf_tool *tool,
>  				     union perf_event *event,
>  				     struct perf_sample *sample __maybe_unused,
> @@ -130,12 +215,28 @@ static int process_synthesized_event(struct perf_tool *tool,
>  	return record__write(rec, event, event->header.size);
>  }
>  
> -static int record__pushfn(void *to, void *bf, size_t size)
> +static int record__pushfn(void *to, struct aiocb *cblock, void *data, size_t size)
>  {
> +	off_t off;
>  	struct record *rec = to;
> +	int ret, trace_fd = rec->session->data->file.fd;
>  
>  	rec->samples++;
> -	return record__write(rec, bf, size);
> +
> +	off =
> +	lseek(trace_fd, 0, SEEK_CUR);

please keep that on the same line.. also, please run this
patchset through scripts/checkpatch.pl

don't worry about the 'line over 80 characters' errors,
but the rest will be nice to have ;-)

jirka

* Re: [PATCH v7 2/2]: perf record: enable asynchronous trace writing
  2018-09-05  7:39 ` [PATCH v7 2/2]: perf record: enable asynchronous trace writing Alexey Budankov
  2018-09-06 11:04   ` Jiri Olsa
  2018-09-06 11:04   ` Jiri Olsa
@ 2018-09-06 11:04   ` Jiri Olsa
  2018-09-06 11:59     ` Alexey Budankov
  2018-09-06 11:04   ` Jiri Olsa
  3 siblings, 1 reply; 23+ messages in thread
From: Jiri Olsa @ 2018-09-06 11:04 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

On Wed, Sep 05, 2018 at 10:39:25AM +0300, Alexey Budankov wrote:

SNIP

> diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
> index 384d17cd1379..1d57d8387caf 100644
> --- a/tools/perf/util/mmap.c
> +++ b/tools/perf/util/mmap.c
> @@ -332,12 +332,12 @@ int perf_mmap__read_init(struct perf_mmap *map)
>  	return __perf_mmap__read_init(map);
>  }
>  
> -int perf_mmap__push(struct perf_mmap *md, void *to,
> -		    int push(void *to, void *buf, size_t size))
> +int perf_mmap__push(struct perf_mmap *md, void *to, int idx,
> +		    int push(void *to, struct aiocb *cblock, void *data, size_t size))
>  {
>  	u64 head = perf_mmap__read_head(md);
>  	unsigned char *data = md->base + page_size;
> -	unsigned long size;
> +	unsigned long size, size0 = 0;
>  	void *buf;
>  	int rc = 0;
>  
> @@ -345,31 +345,58 @@ int perf_mmap__push(struct perf_mmap *md, void *to,
>  	if (rc < 0)
>  		return (rc == -EAGAIN) ? 0 : -1;
>  
> +	/* md->base data is copied into md->data[idx] buffer to
> +	 * release space in the kernel buffer as fast as possible,

please start the comment with '/*' on its own line, like:

	/*
	 * md->base data is copied into md->data[idx] buffer to
	 * release space in the kernel buffer as fast as possible,
	...

jirka

* Re: [PATCH v7 2/2]: perf record: enable asynchronous trace writing
  2018-09-05  7:39 ` [PATCH v7 2/2]: perf record: enable asynchronous trace writing Alexey Budankov
                     ` (2 preceding siblings ...)
  2018-09-06 11:04   ` Jiri Olsa
@ 2018-09-06 11:04   ` Jiri Olsa
  2018-09-06 12:09     ` Alexey Budankov
  3 siblings, 1 reply; 23+ messages in thread
From: Jiri Olsa @ 2018-09-06 11:04 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

On Wed, Sep 05, 2018 at 10:39:25AM +0300, Alexey Budankov wrote:

SNIP

> +		} else if (errno != EAGAIN) {
> +			cblock->aio_fildes = -1;
> +			pr_err("failed to queue perf data, error: %m\n");
> +			break;
> +		}
> +	} while (1);
> +
> +	return rc;
> +}
> +
> +static int record__aio_sync(struct perf_mmap *md)
> +{

this is almost identical to the record__aio_complete function,
it looks like we should be able to do the sync with a
single function.. for both the in-between syncs and
the final one

jirka

* Re: [PATCH v7 1/2]: perf util: map data buffer for preserving collected data
  2018-09-05  7:19 ` [PATCH v7 1/2]: perf util: map data buffer for preserving collected data Alexey Budankov
  2018-09-06 11:04   ` Jiri Olsa
@ 2018-09-06 11:04   ` Jiri Olsa
  2018-09-06 11:54     ` Alexey Budankov
  1 sibling, 1 reply; 23+ messages in thread
From: Jiri Olsa @ 2018-09-06 11:04 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

On Wed, Sep 05, 2018 at 10:19:56AM +0300, Alexey Budankov wrote:

SNIP

> @@ -166,6 +174,7 @@ void perf_mmap__munmap(struct perf_mmap *map)
>  
>  int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd)
>  {
> +	int i;
>  	/*
>  	 * The last one will be done at perf_mmap__consume(), so that we
>  	 * make sure we don't prevent tools from consuming every last event in
> @@ -190,6 +199,50 @@ int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd)
>  		map->base = NULL;
>  		return -1;
>  	}
> +	map->nr_cblocks = mp->nr_cblocks;
> +	map->cblocks = calloc(map->nr_cblocks, sizeof(struct aiocb*));
> +	if (!map->cblocks) {
> +		pr_debug2("failed to allocate perf event data buffers, error %d\n",
> +				errno);
> +		return -1;
> +	}
> +	map->data = calloc(map->nr_cblocks, sizeof(void*));
> +	if (map->data) {
> +		int delta_max = sysconf(_SC_AIO_PRIO_DELTA_MAX);
> +		for (i = 0; i < map->nr_cblocks; ++i) {
> +			map->data[i] = malloc(perf_mmap__mmap_len(map));
> +			if (map->data[i]) {
> +				int prio;
> +				unsigned char *data = map->data[i];
> +				map->cblocks[i] = (struct aiocb *)&data[map->mask + 1];

the 'struct aiocb' is allocated at the last page of the buffer?
is that enough space? could we please make this more transparent
and allocate that space separately?

thanks,
jirka

* Re: [PATCH v7 1/2]: perf util: map data buffer for preserving collected data
  2018-09-06 11:04   ` Jiri Olsa
@ 2018-09-06 11:50     ` Alexey Budankov
  0 siblings, 0 replies; 23+ messages in thread
From: Alexey Budankov @ 2018-09-06 11:50 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel



On 06.09.2018 14:04, Jiri Olsa wrote:
> On Wed, Sep 05, 2018 at 10:19:56AM +0300, Alexey Budankov wrote:
>>
>> The map->data buffers are used to preserve map->base profiling data 
>> for writing to disk. AIO map->cblocks are used to queue corresponding 
>> map->data buffers for asynchronous writing. map->cblocks objects are 
>> located in the last page of every map->data buffer.
>>
>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
>> ---
>>  Changes in v7:
>>   - implemented handling record.aio setting from perfconfig file
>>  Changes in v6:
>>   - adjusted setting of priorities for cblocks;
>>  Changes in v5:
>>   - reshaped layout of data structures;
>>   - implemented --aio option;
>>  Changes in v4:
>>   - converted mmap()/munmap() to malloc()/free() for mmap->data buffer management 
>>  Changes in v2:
>>   - converted zalloc() to calloc() for allocation of mmap_aio array,
>>   - cleared typo and adjusted fallback branch code;
>> ---
>>  tools/perf/builtin-record.c | 15 ++++++++++++-
>>  tools/perf/perf.h           |  1 +
>>  tools/perf/util/evlist.c    |  7 +++---
>>  tools/perf/util/evlist.h    |  3 ++-
>>  tools/perf/util/mmap.c      | 53 +++++++++++++++++++++++++++++++++++++++++++++
>>  tools/perf/util/mmap.h      |  6 ++++-
>>  6 files changed, 79 insertions(+), 6 deletions(-)
>>
>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>> index 22ebeb92ac51..f17a6f9cb1ba 100644
>> --- a/tools/perf/builtin-record.c
>> +++ b/tools/perf/builtin-record.c
>> @@ -326,7 +326,8 @@ static int record__mmap_evlist(struct record *rec,
>>  
>>  	if (perf_evlist__mmap_ex(evlist, opts->mmap_pages,
>>  				 opts->auxtrace_mmap_pages,
>> -				 opts->auxtrace_snapshot_mode) < 0) {
>> +				 opts->auxtrace_snapshot_mode,
>> +				 opts->nr_cblocks) < 0) {
>>  		if (errno == EPERM) {
>>  			pr_err("Permission error mapping pages.\n"
>>  			       "Consider increasing "
>> @@ -1287,6 +1288,8 @@ static int perf_record_config(const char *var, const char *value, void *cb)
>>  		var = "call-graph.record-mode";
>>  		return perf_default_config(var, value, cb);
>>  	}
>> +	if (!strcmp(var, "record.aio"))
>> +		rec->opts.nr_cblocks = strtol(value, NULL, 0);
>>  
>>  	return 0;
>>  }
>> @@ -1519,6 +1522,7 @@ static struct record record = {
>>  			.default_per_cpu = true,
>>  		},
>>  		.proc_map_timeout     = 500,
>> +		.nr_cblocks	      = 2
>>  	},
>>  	.tool = {
>>  		.sample		= process_sample_event,
>> @@ -1678,6 +1682,8 @@ static struct option __record_options[] = {
>>  			  "signal"),
>>  	OPT_BOOLEAN(0, "dry-run", &dry_run,
>>  		    "Parse options then exit"),
>> +	OPT_INTEGER(0, "aio", &record.opts.nr_cblocks,
>> +		    "asynchronous trace write operations (min: 1, max: 32, default: 2)"),
> 
> ok, so this got silently added in recent versions and I couldn't
> find any justification for it.. why do we use more aio blocks for
> a single map now? also why is the default 2?

Having more blocks may improve throughput from kernel to userspace in
cases where new data arrives at map->base while the previously started
AIO write is not yet finished. That can easily happen between calls of
record__mmap_read_evlist().
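
To illustrate the point (a sketch with a hypothetical helper, not code
from the patch): with nr_cblocks > 1 the tool can reuse whichever block
has already completed instead of waiting on a single in-flight write:

#include <aio.h>
#include <errno.h>

/* Return the index of a reusable control block, or -1 if all
 * nr_cblocks writes are still in flight and the caller must wait.
 */
static int pick_free_cblock(struct aiocb **cblocks, int nr_cblocks)
{
	int i;

	for (i = 0; i < nr_cblocks; i++) {
		struct aiocb *cb = cblocks[i];

		if (cb->aio_fildes == -1)	/* never queued or reaped */
			return i;
		if (aio_error(cb) != EINPROGRESS) {
			aio_return(cb);		/* collect completed write */
			cb->aio_fildes = -1;
			return i;
		}
	}
	return -1;
}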

> 
> the option should be more specific like 'aio-blocks'

ok.

> 
> the change is difficult enough.. we should start simple and add
> these additions with proper justification in separate patches

Setting the default to 1 gives the simplest solution. I could provide
justification for larger values in the cases where spinning at
record__aio_sync() becomes the hotspot.

> 
> thanks,
> jirka
> 

* Re: [PATCH v7 1/2]: perf util: map data buffer for preserving collected data
  2018-09-06 11:04   ` Jiri Olsa
@ 2018-09-06 11:54     ` Alexey Budankov
  0 siblings, 0 replies; 23+ messages in thread
From: Alexey Budankov @ 2018-09-06 11:54 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel



On 06.09.2018 14:04, Jiri Olsa wrote:
> On Wed, Sep 05, 2018 at 10:19:56AM +0300, Alexey Budankov wrote:
> 
> SNIP
> 
>> @@ -166,6 +174,7 @@ void perf_mmap__munmap(struct perf_mmap *map)
>>  
>>  int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd)
>>  {
>> +	int i;
>>  	/*
>>  	 * The last one will be done at perf_mmap__consume(), so that we
>>  	 * make sure we don't prevent tools from consuming every last event in
>> @@ -190,6 +199,50 @@ int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd)
>>  		map->base = NULL;
>>  		return -1;
>>  	}
>> +	map->nr_cblocks = mp->nr_cblocks;
>> +	map->cblocks = calloc(map->nr_cblocks, sizeof(struct aiocb*));
>> +	if (!map->cblocks) {
>> +		pr_debug2("failed to allocate perf event data buffers, error %d\n",
>> +				errno);
>> +		return -1;
>> +	}
>> +	map->data = calloc(map->nr_cblocks, sizeof(void*));
>> +	if (map->data) {
>> +		int delta_max = sysconf(_SC_AIO_PRIO_DELTA_MAX);
>> +		for (i = 0; i < map->nr_cblocks; ++i) {
>> +			map->data[i] = malloc(perf_mmap__mmap_len(map));
>> +			if (map->data[i]) {
>> +				int prio;
>> +				unsigned char *data = map->data[i];
>> +				map->cblocks[i] = (struct aiocb *)&data[map->mask + 1];
> 
> the 'struct aiocb' is allocated at the last page of the buffer?
> is that enough space? could we please make this more transparent
> and allocate that space separately?

Would the array of pointers followed by the array of referenced objects fit?
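
I.e. something like this single allocation (a sketch; it assumes struct
aiocb needs no stricter alignment than what follows the pointer array):

#include <aio.h>
#include <stdlib.h>

static struct aiocb **cblocks_alloc(int nr)
{
	/* nr pointers immediately followed by the nr aiocb objects */
	struct aiocb **cblocks = calloc(1, nr * sizeof(struct aiocb *) +
					   nr * sizeof(struct aiocb));
	struct aiocb *cb;
	int i;

	if (!cblocks)
		return NULL;
	cb = (struct aiocb *)&cblocks[nr];
	for (i = 0; i < nr; i++)
		cblocks[i] = &cb[i];
	return cblocks;
}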

> 
> thanks,
> jirka
> 

* Re: [PATCH v7 2/2]: perf record: enable asynchronous trace writing
  2018-09-06 11:04   ` Jiri Olsa
@ 2018-09-06 11:57     ` Alexey Budankov
  0 siblings, 0 replies; 23+ messages in thread
From: Alexey Budankov @ 2018-09-06 11:57 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel



On 06.09.2018 14:04, Jiri Olsa wrote:
> On Wed, Sep 05, 2018 at 10:39:25AM +0300, Alexey Budankov wrote:
> 
> SNIP
> 
>> -static int record__pushfn(void *to, void *bf, size_t size)
>> +static int record__pushfn(void *to, struct aiocb *cblock, void *data, size_t size)
>>  {
>> +	off_t off;
>>  	struct record *rec = to;
>> +	int ret, trace_fd = rec->session->data->file.fd;
>>  
>>  	rec->samples++;
>> -	return record__write(rec, bf, size);
>> +
>> +	off =
>> +	lseek(trace_fd, 0, SEEK_CUR);
>> +	lseek(trace_fd, off + size, SEEK_SET);
>> +
>> +	ret = record__aio_write(cblock, trace_fd, data, size, off);
>> +	if (!ret) {
>> +		rec->bytes_written += size;
>> +		if (switch_output_size(rec))
>> +			trigger_hit(&switch_output_trigger);
> 
> why do you need to call switch output from here?

Just preserved this logic from the serial implementation.

> 
> jirka
> 

* Re: [PATCH v7 2/2]: perf record: enable asynchronous trace writing
  2018-09-06 11:04   ` Jiri Olsa
@ 2018-09-06 11:58     ` Alexey Budankov
  0 siblings, 0 replies; 23+ messages in thread
From: Alexey Budankov @ 2018-09-06 11:58 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel



On 06.09.2018 14:04, Jiri Olsa wrote:
> On Wed, Sep 05, 2018 at 10:39:25AM +0300, Alexey Budankov wrote:
> 
> SNIP
> 
>> +
>>  static int process_synthesized_event(struct perf_tool *tool,
>>  				     union perf_event *event,
>>  				     struct perf_sample *sample __maybe_unused,
>> @@ -130,12 +215,28 @@ static int process_synthesized_event(struct perf_tool *tool,
>>  	return record__write(rec, event, event->header.size);
>>  }
>>  
>> -static int record__pushfn(void *to, void *bf, size_t size)
>> +static int record__pushfn(void *to, struct aiocb *cblock, void *data, size_t size)
>>  {
>> +	off_t off;
>>  	struct record *rec = to;
>> +	int ret, trace_fd = rec->session->data->file.fd;
>>  
>>  	rec->samples++;
>> -	return record__write(rec, bf, size);
>> +
>> +	off =
>> +	lseek(trace_fd, 0, SEEK_CUR);
> 
> please keep that on the same line.. also, please run this
> patchset through scripts/checkpatch.pl
> 
> don't worry about the 'line over 80 characters' errors,
> but the rest will be nice to have ;-)

ok! :)
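
The reservation would then read (sketch, using the names from the
quoted hunk):

	/* reserve [off, off + size) for this aio_write() and move the
	 * file position past it so the next chunk gets its own region
	 */
	off_t off = lseek(trace_fd, 0, SEEK_CUR);

	lseek(trace_fd, off + size, SEEK_SET);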

> 
> jirka
> 

* Re: [PATCH v7 2/2]: perf record: enable asynchronous trace writing
  2018-09-06 11:04   ` Jiri Olsa
@ 2018-09-06 11:59     ` Alexey Budankov
  0 siblings, 0 replies; 23+ messages in thread
From: Alexey Budankov @ 2018-09-06 11:59 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel



On 06.09.2018 14:04, Jiri Olsa wrote:
> On Wed, Sep 05, 2018 at 10:39:25AM +0300, Alexey Budankov wrote:
> 
> SNIP
> 
>> diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
>> index 384d17cd1379..1d57d8387caf 100644
>> --- a/tools/perf/util/mmap.c
>> +++ b/tools/perf/util/mmap.c
>> @@ -332,12 +332,12 @@ int perf_mmap__read_init(struct perf_mmap *map)
>>  	return __perf_mmap__read_init(map);
>>  }
>>  
>> -int perf_mmap__push(struct perf_mmap *md, void *to,
>> -		    int push(void *to, void *buf, size_t size))
>> +int perf_mmap__push(struct perf_mmap *md, void *to, int idx,
>> +		    int push(void *to, struct aiocb *cblock, void *data, size_t size))
>>  {
>>  	u64 head = perf_mmap__read_head(md);
>>  	unsigned char *data = md->base + page_size;
>> -	unsigned long size;
>> +	unsigned long size, size0 = 0;
>>  	void *buf;
>>  	int rc = 0;
>>  
>> @@ -345,31 +345,58 @@ int perf_mmap__push(struct perf_mmap *md, void *to,
>>  	if (rc < 0)
>>  		return (rc == -EAGAIN) ? 0 : -1;
>>  
>> +	/* md->base data is copied into md->data[idx] buffer to
>> +	 * release space in the kernel buffer as fast as possible,
> 
> please start the comment with '/*' on its own line, like:
> 
> 	/*
> 	 * md->base data is copied into md->data[idx] buffer to
> 	 * release space in the kernel buffer as fast as possible,
> 	...

Accepted!

> 
> jirka
> 

* Re: [PATCH v7 2/2]: perf record: enable asynchronous trace writing
  2018-09-06 11:04   ` Jiri Olsa
@ 2018-09-06 12:09     ` Alexey Budankov
  0 siblings, 0 replies; 23+ messages in thread
From: Alexey Budankov @ 2018-09-06 12:09 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel



On 06.09.2018 14:04, Jiri Olsa wrote:
> On Wed, Sep 05, 2018 at 10:39:25AM +0300, Alexey Budankov wrote:
> 
> SNIP
> 
>> +		} else if (errno != EAGAIN) {
>> +			cblock->aio_fildes = -1;
>> +			pr_err("failed to queue perf data, error: %m\n");
>> +			break;
>> +		}
>> +	} while (1);
>> +
>> +	return rc;
>> +}
>> +
>> +static int record__aio_sync(struct perf_mmap *md)
>> +{
> 
> this is almost identical to the record__aio_complete function,
> it looks like we should be able to do the sync with a
> single function.. for both the in-between syncs and
> the final one

There is some code duplication here. The for() loop body in
record__aio_complete() could probably be factored out into a
function and then reused by record__aio_sync().
The rest is not obvious at the moment.
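
Something like this helper could be shared by both paths (a sketch, the
name is hypothetical; the EAGAIN retry from record__aio_write() and the
error reporting are omitted):

#include <aio.h>
#include <errno.h>

/* Reap one control block if its write has finished; both the
 * in-between sync and the final sync could loop over this.
 */
static int record__aio_reap(struct perf_mmap *md, struct aiocb *cblock)
{
	if (cblock->aio_fildes == -1)		/* free or already reaped */
		return 1;
	if (aio_error(cblock) == EINPROGRESS)	/* still in flight */
		return 0;

	aio_return(cblock);			/* collect the result */
	cblock->aio_fildes = -1;		/* mark the block reusable */
	perf_mmap__put(md);			/* drop the ref taken at push */
	return 1;
}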

> 
> jirka
> 
