linux-kernel.vger.kernel.org archive mirror
* [PATCH v11 0/3]: perf: reduce data loss when profiling highly parallel CPU bound workloads
@ 2018-10-08  5:55 Alexey Budankov
  2018-10-08  6:14 ` [PATCH v11 1/3]: perf util: map data buffer for preserving collected data Alexey Budankov
                   ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Alexey Budankov @ 2018-10-08  5:55 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo
  Cc: Alexander Shishkin, Jiri Olsa, Namhyung Kim, Andi Kleen, linux-kernel


Currently, in record mode the tool implements trace writing serially.
The algorithm loops over the mapped per-cpu data buffers and stores
ready data chunks into a trace file using the write() system call.

Under some circumstances the kernel may lack free space in a buffer
because its other half has not yet been written to disk while the
tool is busy writing some other buffer's data at the moment.

Thus the serial trace writing implementation may cause the kernel
to lose profiling data, and that is what is observed when profiling
highly parallel CPU bound workloads on machines with a large number
of cores.

An experiment profiling matrix multiplication code executing 128
threads on an Intel Xeon Phi (KNM) with 272 cores, like the one below,
demonstrates a data loss metric value of 98%:

/usr/bin/time perf record -o /tmp/perf-ser.data -a -N -B -T -R -g \
    --call-graph dwarf,1024 --user-regs=IP,SP,BP --switch-events \
    -e cycles,instructions,ref-cycles,software/period=1,name=cs,config=0x3/Duk -- \
    matrix.gcc

The data loss metric is the ratio lost_time/elapsed_time, where
lost_time is the sum of the time intervals containing PERF_RECORD_LOST
records and elapsed_time is the elapsed application run time
under profiling.

Applying asynchronous trace streaming through the Posix AIO API [1]
dramatically lowers the data loss metric value (from 98% to ~1%).

Asynchronous trace streaming is currently limited to glibc linkage.
musl libc [5] also provides a Posix AIO API implementation; however,
the patch kit has not been tested with it. There may be other libc
libraries linked with the Perf tool that currently lack Posix AIO API
support [2], [3], [4], so the NO_AIO define may be used to limit the
Perf tool binary to serial streaming only.

---
 Alexey Budankov (3):
        perf util: map data buffer for preserving collected data
        perf record: enable asynchronous trace writing
        perf record: extend trace writing to multi AIO

 tools/perf/Documentation/perf-record.txt |   6 +
 tools/perf/Makefile.config               |   5 +
 tools/perf/Makefile.perf                 |   7 +-
 tools/perf/builtin-record.c              | 210 +++++++++++++++++++++++++++++--
 tools/perf/perf.h                        |   3 +
 tools/perf/util/evlist.c                 |   8 +-
 tools/perf/util/evlist.h                 |   2 +-
 tools/perf/util/mmap.c                   | 154 ++++++++++++++++++++++-
 tools/perf/util/mmap.h                   |  17 +++
 9 files changed, 398 insertions(+), 14 deletions(-)

---
 Changes in v11:
 - replaced both lseek() syscalls in every loop iteration with only
   two syscalls, just before and after the loop, at record__mmap_read_evlist(),
   advancing the *in-flight* off file position value at perf_mmap__aio_push()
 Changes in v10:
 - moved specific code to perf_mmap__aio_mmap(), perf_mmap__aio_munmap();
 - adjusted error reporting by using %m
 - avoided lseek() setting file pos back in case of record__aio_write() failure
 - compacted code selecting between serial and AIO streaming
 - optimized call places of record__mmap_read_sync()
 - added description of aio-cblocks option into perf-record.txt
 Changes in v9:
 - enable AIO streaming only when --aio-cblocks option is specified explicitly
 - enable AIO based implementation when linking with glibc only
 - define NO_AIO to limit Perf binary to serial implementation
 Changes in v8:
 - ran the whole thing through checkpatch.pl and corrected the found issues,
   except lines longer than 80 characters
 - corrected comments alignment and formatting
 - moved multi AIO implementation into 3rd patch in the series
 - implemented explicit cblocks array allocation
 - split AIO completion check into separate record__aio_complete()
 - set nr_cblocks default to 1 and max allowed value to 4
 Changes in v7:
 - implemented handling record.aio setting from perfconfig file
 Changes in v6:
 - adjusted setting of priorities for cblocks;
 - handled errno == EAGAIN case from aio_write() return;
 Changes in v5:
 - resolved livelock on perf record -e intel_pt// -- dd if=/dev/zero of=/dev/null count=100000
 - data loss metrics decreased from 25% to 2x in trialed configuration;
 - reshaped layout of data structures;
 - implemented --aio option;
 - avoided nanosleep() prior to calling aio_suspend();
 - switched to per-cpu aio multi buffer record__aio_sync();
 - record_mmap_read_sync() now does global sync just before 
   switching trace file or collection stop;
 Changes in v4:
 - converted mmap()/munmap() to malloc()/free() for mmap->data buffer management
 - converted void *bf to struct perf_mmap *md in signatures
 - written comment in perf_mmap__push() just before perf_mmap__get();
 - written comment in record__mmap_read_sync() on possible restarting 
   of aio_write() operation and releasing perf_mmap object after all;
 - added perf_mmap__put() for the cases of failed aio_write();
 Changes in v3:
 - written comments about the nanosleep(0.5ms) call prior to aio_suspend()
   to cope with intrusiveness of its implementation in glibc;
 - written comments about the rationale behind copying profiling data
   into mmap->data buffer;
 Changes in v2:
 - converted zalloc() to calloc() for allocation of mmap_aio array,
 - cleared typo and adjusted fallback branch code;

---

[1] http://man7.org/linux/man-pages/man7/aio.7.html
[2] https://android.googlesource.com/platform/bionic/+/master/docs/status.md
[3] https://www.uclibc.org/
[4] https://uclibc-ng.org/
[5] https://www.musl-libc.org/

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v11 1/3]: perf util: map data buffer for preserving collected data
  2018-10-08  5:55 [PATCH v11 0/3]: perf: reduce data loss when profiling highly parallel CPU bound workloads Alexey Budankov
@ 2018-10-08  6:14 ` Alexey Budankov
  2018-10-08 10:50   ` Jiri Olsa
                     ` (2 more replies)
  2018-10-08  6:17 ` [PATCH v11 2/3]: perf record: enable asynchronous trace writing Alexey Budankov
  2018-10-08  6:19 ` [PATCH v11 3/3]: perf record: extend trace writing to multi AIO Alexey Budankov
  2 siblings, 3 replies; 33+ messages in thread
From: Alexey Budankov @ 2018-10-08  6:14 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo
  Cc: Alexander Shishkin, Jiri Olsa, Namhyung Kim, Andi Kleen, linux-kernel


The map->data buffer is used to preserve map->base profiling data 
for writing to disk. AIO map->cblock is used to queue corresponding 
map->data buffer for asynchronous writing.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
Changes in v10:
 - moved specific code to perf_mmap__aio_mmap(), perf_mmap__aio_munmap()
 - adjusted error reporting by using %m
Changes in v9:
  - implemented NO_AIO and HAVE_AIO_SUPPORT defines to cover cases of 
    libc implementations without Posix AIO API support
Changes in v7:
  - implemented handling record.aio setting from perfconfig file
 Changes in v6:
  - adjusted setting of priorities for cblocks;
 Changes in v5:
  - reshaped layout of data structures;
  - implemented --aio option;
 Changes in v4:
  - converted mmap()/munmap() to malloc()/free() for mmap->data buffer management 
 Changes in v2:
  - converted zalloc() to calloc() for allocation of mmap_aio array,
  - cleared typo and adjusted fallback branch code;
---
 tools/perf/Makefile.config |  5 +++++
 tools/perf/Makefile.perf   |  7 ++++++-
 tools/perf/util/evlist.c   |  4 +++-
 tools/perf/util/mmap.c     | 48 +++++++++++++++++++++++++++++++++++++++++++++-
 tools/perf/util/mmap.h     | 11 +++++++++++
 5 files changed, 72 insertions(+), 3 deletions(-)

diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index f6d1a03c7523..2e90f4ce9214 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -355,6 +355,11 @@ endif # NO_LIBELF
 
 ifeq ($(feature-glibc), 1)
   CFLAGS += -DHAVE_GLIBC_SUPPORT
+  ifndef NO_AIO
+    ifndef BIONIC
+        CFLAGS += -DHAVE_AIO_SUPPORT
+    endif
+  endif
 endif
 
 ifdef NO_DWARF
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 92514fb3689f..7becc6a72cf2 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -97,8 +97,13 @@ include ../scripts/utilities.mak
 # Define LIBCLANGLLVM if you DO want builtin clang and llvm support.
 # When selected, pass LLVM_CONFIG=/path/to/llvm-config to `make' if
 # llvm-config is not in $PATH.
-
+#
 # Define NO_CORESIGHT if you do not want support for CoreSight trace decoding.
+#
+# Define NO_AIO if you do not want support of Posix AIO based trace
+# streaming for record mode. Currently Posix AIO trace streaming is
+# supported only when linking with glibc.
+#
 
 # As per kernel Makefile, avoid funny character set dependencies
 unexport LC_ALL
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index be440df29615..af2f8c965d7a 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1029,7 +1029,9 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 	 * So &mp should not be passed through const pointer.
 	 */
 	struct mmap_params mp;
-
+#ifdef HAVE_AIO_SUPPORT
+	mp.nr_cblocks = 0;
+#endif
 	if (!evlist->mmap)
 		evlist->mmap = perf_evlist__alloc_mmap(evlist, false);
 	if (!evlist->mmap)
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index cdb95b3a1213..db8f16f8a363 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -153,8 +153,19 @@ void __weak auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp __mayb
 {
 }
 
+#ifdef HAVE_AIO_SUPPORT
+static void perf_mmap__aio_munmap(struct perf_mmap *map)
+{
+	if (map->data)
+		zfree(&map->data);
+}
+#endif
+
 void perf_mmap__munmap(struct perf_mmap *map)
 {
+#ifdef HAVE_AIO_SUPPORT
+	perf_mmap__aio_munmap(map);
+#endif
 	if (map->base != NULL) {
 		munmap(map->base, perf_mmap__mmap_len(map));
 		map->base = NULL;
@@ -164,8 +175,40 @@ void perf_mmap__munmap(struct perf_mmap *map)
 	auxtrace_mmap__munmap(&map->auxtrace_mmap);
 }
 
+#ifdef HAVE_AIO_SUPPORT
+static int perf_mmap__aio_mmap(struct perf_mmap *map, struct mmap_params *mp)
+{
+	int delta_max;
+
+	map->nr_cblocks = mp->nr_cblocks;
+	if (map->nr_cblocks) {
+		map->data = malloc(perf_mmap__mmap_len(map));
+		if (!map->data) {
+			pr_debug2("failed to allocate data buffer, error %m\n");
+			return -1;
+		}
+		/*
+		 * Use a cblock.aio_fildes value different from -1
+		 * to denote a started aio write operation on the
+		 * cblock, so it requires an explicit record__aio_sync()
+		 * call before the cblock may be reused again.
+		 */
+		map->cblock.aio_fildes = -1;
+		/*
+		 * Allocate cblock with max priority delta to
+		 * have faster aio write system calls.
+		 */
+		delta_max = sysconf(_SC_AIO_PRIO_DELTA_MAX);
+		map->cblock.aio_reqprio = delta_max;
+	}
+
+	return 0;
+}
+#endif
+
 int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd, int cpu)
 {
+	int rc = 0;
 	/*
 	 * The last one will be done at perf_mmap__consume(), so that we
 	 * make sure we don't prevent tools from consuming every last event in
@@ -197,7 +240,10 @@ int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd, int c
 				&mp->auxtrace_mp, map->base, fd))
 		return -1;
 
-	return 0;
+#ifdef HAVE_AIO_SUPPORT
+	rc = perf_mmap__aio_mmap(map, mp);
+#endif
+	return rc;
 }
 
 static int overwrite_rb_find_range(void *buf, int mask, u64 *start, u64 *end)
diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index e603314dc792..1b63b6cc7cf9 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -6,6 +6,9 @@
 #include <linux/types.h>
 #include <asm/barrier.h>
 #include <stdbool.h>
+#ifdef HAVE_AIO_SUPPORT
+#include <aio.h>
+#endif
 #include "auxtrace.h"
 #include "event.h"
 
@@ -26,6 +29,11 @@ struct perf_mmap {
 	bool		 overwrite;
 	struct auxtrace_mmap auxtrace_mmap;
 	char		 event_copy[PERF_SAMPLE_MAX_SIZE] __aligned(8);
+#ifdef HAVE_AIO_SUPPORT
+	void 		 *data;
+	struct aiocb	 cblock;
+	int		 nr_cblocks;
+#endif
 };
 
 /*
@@ -59,6 +67,9 @@ enum bkw_mmap_state {
 struct mmap_params {
 	int			    prot, mask;
 	struct auxtrace_mmap_params auxtrace_mp;
+#ifdef HAVE_AIO_SUPPORT
+	int			    nr_cblocks;
+#endif
 };
 
 int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd, int cpu);


* [PATCH v11 2/3]: perf record: enable asynchronous trace writing
  2018-10-08  5:55 [PATCH v11 0/3]: perf: reduce data loss when profiling highly parallel CPU bound workloads Alexey Budankov
  2018-10-08  6:14 ` [PATCH v11 1/3]: perf util: map data buffer for preserving collected data Alexey Budankov
@ 2018-10-08  6:17 ` Alexey Budankov
  2018-10-08 10:50   ` Jiri Olsa
                     ` (4 more replies)
  2018-10-08  6:19 ` [PATCH v11 3/3]: perf record: extend trace writing to multi AIO Alexey Budankov
  2 siblings, 5 replies; 33+ messages in thread
From: Alexey Budankov @ 2018-10-08  6:17 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo
  Cc: Alexander Shishkin, Jiri Olsa, Namhyung Kim, Andi Kleen, linux-kernel


The trace file offset is read once before the mmaps iterating loop and
written back after all performance data has been enqueued for aio
writing. The trace file offset is incremented linearly after every
successful aio write operation.

record__aio_sync() blocks until completion of the started AIO operation
and then proceeds.

record__mmap_read_sync() implements a barrier for all incomplete
aio write requests.
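
The wait loop behind record__aio_sync() follows a common Posix AIO completion pattern; a minimal standalone sketch of it (illustrative names, not the patch code itself) is:

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/*
 * Block until the write queued on cblock has completed, then mark the
 * control block free by resetting aio_fildes to -1, mirroring the
 * convention used in the patch. Illustrative sketch only.
 */
static void wait_for_cblock(struct aiocb *cblock)
{
	struct timespec timeout = { 0, 1000 * 1000 }; /* 1ms */
	const struct aiocb *list[1] = { cblock };

	if (cblock->aio_fildes == -1)
		return;                  /* nothing in flight */

	while (aio_error(cblock) == EINPROGRESS) {
		if (aio_suspend(list, 1, &timeout) &&
		    errno != EAGAIN && errno != EINTR)
			perror("aio_suspend");
	}
	(void)aio_return(cblock);        /* reap the result */
	cblock->aio_fildes = -1;         /* free for reuse */
}
```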

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 Changes in v11:
 - replaced both lseek() syscalls in every loop iteration with only
   two syscalls, just before and after the loop, at record__mmap_read_evlist(),
   advancing the *in-flight* off file position value at perf_mmap__aio_push()
 Changes in v10:
 - avoided lseek() setting file pos back in case of record__aio_write() failure
 - compacted code selecting between serial and AIO streaming
 - optimized call places of record__mmap_read_sync()
 Changes in v9:
 - enable AIO streaming only when --aio-cblocks option is specified explicitly
 Changes in v8:
 - split AIO completion check into separate record__aio_complete()
 Changes in v6:
 - handled errno == EAGAIN case from aio_write();
 Changes in v5:
 - data loss metrics decreased from 25% to 2x in trialed configuration;
 - avoided nanosleep() prior to calling aio_suspend();
 - switched to per cpu multi record__aio_sync() aio
 - record_mmap_read_sync() now does global barrier just before 
   switching trace file or collection stop;
 - resolved livelock on perf record -e intel_pt// -- dd if=/dev/zero of=/dev/null count=100000
 Changes in v4:
 - converted void *bf to struct perf_mmap *md in signatures
 - written comment in perf_mmap__push() just before perf_mmap__get();
 - written comment in record__mmap_read_sync() on possible restarting 
   of aio_write() operation and releasing perf_mmap object after all;
 - added perf_mmap__put() for the cases of failed aio_write();
 Changes in v3:
 - written comments about the nanosleep(0.5ms) call prior to aio_suspend()
   to cope with intrusiveness of its implementation in glibc;
 - written comments about the rationale behind copying profiling data
   into mmap->data buffer;
---
 tools/perf/builtin-record.c | 168 ++++++++++++++++++++++++++++++++++++++++++--
 tools/perf/perf.h           |   3 +
 tools/perf/util/mmap.c      |  76 ++++++++++++++++++++
 tools/perf/util/mmap.h      |   5 ++
 4 files changed, 246 insertions(+), 6 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 0980dfe3396b..c66ae23e3939 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -124,6 +124,112 @@ static int record__write(struct record *rec, struct perf_mmap *map __maybe_unuse
 	return 0;
 }
 
+#ifdef HAVE_AIO_SUPPORT
+static int record__aio_write(struct aiocb *cblock, int trace_fd,
+		void *buf, size_t size, off_t off)
+{
+	int rc;
+
+	cblock->aio_fildes = trace_fd;
+	cblock->aio_buf    = buf;
+	cblock->aio_nbytes = size;
+	cblock->aio_offset = off;
+	cblock->aio_sigevent.sigev_notify = SIGEV_NONE;
+
+	do {
+		rc = aio_write(cblock);
+		if (rc == 0) {
+			break;
+		} else if (errno != EAGAIN) {
+			cblock->aio_fildes = -1;
+			pr_err("failed to queue perf data, error: %m\n");
+			break;
+		}
+	} while (1);
+
+	return rc;
+}
+
+static int record__aio_complete(struct perf_mmap *md, struct aiocb *cblock)
+{
+	void *rem_buf;
+	off_t rem_off;
+	size_t rem_size;
+	int rc, aio_errno;
+	ssize_t aio_ret, written;
+
+	aio_errno = aio_error(cblock);
+	if (aio_errno == EINPROGRESS)
+		return 0;
+
+	written = aio_ret = aio_return(cblock);
+	if (aio_ret < 0) {
+		if (aio_errno != EINTR)
+			pr_err("failed to write perf data, error: %m\n");
+		written = 0;
+	}
+
+	rem_size = cblock->aio_nbytes - written;
+
+	if (rem_size == 0) {
+		cblock->aio_fildes = -1;
+		/*
+		 * md->refcount is incremented in perf_mmap__push() for
+		 * every enqueued aio write request so decrement it because
+		 * the request is now complete.
+		 */
+		perf_mmap__put(md);
+		rc = 1;
+	} else {
+		/*
+		 * The aio write request may require a restart with the
+		 * remainder if the kernel didn't write the whole
+		 * chunk at once.
+		 */
+		rem_off = cblock->aio_offset + written;
+		rem_buf = (void *)(cblock->aio_buf + written);
+		record__aio_write(cblock, cblock->aio_fildes,
+				rem_buf, rem_size, rem_off);
+		rc = 0;
+	}
+
+	return rc;
+}
+
+static void record__aio_sync(struct perf_mmap *md)
+{
+	struct aiocb *cblock = &md->cblock;
+	struct timespec timeout = { 0, 1000 * 1000  * 1 }; /* 1ms */
+
+	do {
+		if (cblock->aio_fildes == -1 || record__aio_complete(md, cblock))
+			return;
+
+		while (aio_suspend((const struct aiocb**)&cblock, 1, &timeout)) {
+			if (!(errno == EAGAIN || errno == EINTR))
+				pr_err("failed to sync perf data, error: %m\n");
+		}
+	} while (1);
+}
+
+static int record__aio_pushfn(void *to, struct aiocb *cblock, void *bf, size_t size, off_t off)
+{
+	struct record *rec = to;
+	int ret, trace_fd = rec->session->data->file.fd;
+
+	rec->samples++;
+
+	ret = record__aio_write(cblock, trace_fd, bf, size, off);
+	if (!ret) {
+		rec->bytes_written += size;
+		if (switch_output_size(rec))
+			trigger_hit(&switch_output_trigger);
+	}
+
+	return ret;
+}
+#endif
+
 static int process_synthesized_event(struct perf_tool *tool,
 				     union perf_event *event,
 				     struct perf_sample *sample __maybe_unused,
@@ -136,7 +242,6 @@ static int process_synthesized_event(struct perf_tool *tool,
 static int record__pushfn(struct perf_mmap *map, void *to, void *bf, size_t size)
 {
 	struct record *rec = to;
-
 	rec->samples++;
 	return record__write(rec, map, bf, size);
 }
@@ -513,6 +618,25 @@ static struct perf_event_header finished_round_event = {
 	.type = PERF_RECORD_FINISHED_ROUND,
 };
 
+#ifdef HAVE_AIO_SUPPORT
+static void record__mmap_read_sync(struct record *rec)
+{
+	int i;
+	struct perf_evlist *evlist = rec->evlist;
+	struct perf_mmap *maps = evlist->mmap;
+
+	if (!rec->opts.nr_cblocks)
+		return;
+
+	for (i = 0; i < evlist->nr_mmaps; i++) {
+		struct perf_mmap *map = &maps[i];
+
+		if (map->base)
+			record__aio_sync(map);
+	}
+}
+#endif
+
 static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evlist,
 				    bool overwrite)
 {
@@ -520,7 +644,10 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
 	int i;
 	int rc = 0;
 	struct perf_mmap *maps;
-
+#ifdef HAVE_AIO_SUPPORT
+	int trace_fd = rec->data.file.fd;
+	off_t off;
+#endif
 	if (!evlist)
 		return 0;
 
@@ -531,14 +658,34 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
 	if (overwrite && evlist->bkw_mmap_state != BKW_MMAP_DATA_PENDING)
 		return 0;
 
+#ifdef HAVE_AIO_SUPPORT
+	off = lseek(trace_fd, 0, SEEK_CUR);
+#endif
 	for (i = 0; i < evlist->nr_mmaps; i++) {
 		struct perf_mmap *map = &maps[i];
 
 		if (map->base) {
-			if (perf_mmap__push(map, rec, record__pushfn) != 0) {
-				rc = -1;
-				goto out;
+#ifdef HAVE_AIO_SUPPORT
+			if (!rec->opts.nr_cblocks) {
+#endif
+				if (perf_mmap__push(map, rec, record__pushfn) != 0) {
+					rc = -1;
+					goto out;
+				}
+#ifdef HAVE_AIO_SUPPORT
+			} else {
+				/*
+				 * Call record__aio_sync() to wait till map->data buffer
+				 * becomes available after previous aio write request.
+				 */
+				record__aio_sync(map);
+				if (perf_mmap__aio_push(map, rec, record__aio_pushfn, &off) != 0) {
+					lseek(trace_fd, off, SEEK_SET);
+					rc = -1;
+					goto out;
+				}
 			}
+#endif
 		}
 
 		if (map->auxtrace_mmap.base && !rec->opts.auxtrace_snapshot_mode &&
@@ -547,7 +694,9 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
 			goto out;
 		}
 	}
-
+#ifdef HAVE_AIO_SUPPORT
+	lseek(trace_fd, off, SEEK_SET);
+#endif
 	/*
 	 * Mark the round finished in case we wrote
 	 * at least one event.
@@ -650,6 +799,9 @@ record__switch_output(struct record *rec, bool at_exit)
 	/* Same Size:      "2015122520103046"*/
 	char timestamp[] = "InvalidTimestamp";
 
+#ifdef HAVE_AIO_SUPPORT
+	record__mmap_read_sync(rec);
+#endif
 	record__synthesize(rec, true);
 	if (target__none(&rec->opts.target))
 		record__synthesize_workload(rec, true);
@@ -1157,6 +1309,10 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		record__synthesize_workload(rec, true);
 
 out_child:
+
+#ifdef HAVE_AIO_SUPPORT
+	record__mmap_read_sync(rec);
+#endif
 	if (forks) {
 		int exit_status;
 
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 21bf7f5a3cf5..ef700d8bb610 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -82,6 +82,9 @@ struct record_opts {
 	bool         use_clockid;
 	clockid_t    clockid;
 	unsigned int proc_map_timeout;
+#ifdef HAVE_AIO_SUPPORT
+	int	     nr_cblocks;
+#endif
 };
 
 struct option;
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index db8f16f8a363..ecaa5b5eb3ed 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -367,6 +367,82 @@ int perf_mmap__push(struct perf_mmap *md, void *to,
 	return rc;
 }
 
+#ifdef HAVE_AIO_SUPPORT
+int perf_mmap__aio_push(struct perf_mmap *md, void *to,
+			int push(void *to, struct aiocb *cblock, void *buf, size_t size, off_t off),
+			off_t *off)
+{
+	u64 head = perf_mmap__read_head(md);
+	unsigned char *data = md->base + page_size;
+	unsigned long size, size0 = 0;
+	void *buf;
+	int rc = 0;
+
+	rc = perf_mmap__read_init(md);
+	if (rc < 0)
+		return (rc == -EAGAIN) ? 0 : -1;
+
+	/*
+	 * The md->base data is copied into the md->data buffer to
+	 * release space in the kernel buffer as fast as possible,
+	 * through perf_mmap__consume() below.
+	 *
+	 * That lets the kernel proceed with storing more
+	 * profiling data into the kernel buffer earlier than other
+	 * per-cpu kernel buffers are handled.
+	 *
+	 * Copying can be done in two steps in case the chunk of
+	 * profiling data crosses the upper bound of the kernel buffer.
+	 * In this case we first move part of the data from md->start
+	 * till the upper bound and then the remainder from the
+	 * beginning of the kernel buffer till the end of
+	 * the data chunk.
+	 */
+
+	size = md->end - md->start;
+
+	if ((md->start & md->mask) + size != (md->end & md->mask)) {
+		buf = &data[md->start & md->mask];
+		size = md->mask + 1 - (md->start & md->mask);
+		md->start += size;
+		memcpy(md->data, buf, size);
+		size0 = size;
+	}
+
+	buf = &data[md->start & md->mask];
+	size = md->end - md->start;
+	md->start += size;
+	memcpy(md->data + size0, buf, size);
+
+	/*
+	 * Increment md->refcount to guard the md->data buffer
+	 * from premature deallocation, because the md object can be
+	 * released earlier than the aio write request started
+	 * on mmap->data is complete.
+	 *
+	 * perf_mmap__put() is done at record__aio_complete()
+	 * after the started request completes.
+	 */
+	perf_mmap__get(md);
+
+	md->prev = head;
+	perf_mmap__consume(md);
+
+	rc = push(to, &md->cblock, md->data, size0 + size, *off);
+	if (!rc) {
+		*off += size0 + size;
+	} else {
+		/*
+		 * Decrement md->refcount back if aio write
+		 * operation failed to start.
+		 */
+		perf_mmap__put(md);
+	}
+
+	return rc;
+}
+#endif
+
 /*
  * Mandatory for overwrite mode
  * The direction of overwrite mode is backward.
diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index 1b63b6cc7cf9..8618223ae675 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -105,6 +105,11 @@ union perf_event *perf_mmap__read_event(struct perf_mmap *map);
 
 int perf_mmap__push(struct perf_mmap *md, void *to,
 		    int push(struct perf_mmap *map, void *to, void *buf, size_t size));
+#ifdef HAVE_AIO_SUPPORT
+int perf_mmap__aio_push(struct perf_mmap *md, void *to,
+			int push(void *to, struct aiocb *cblock, void *buf, size_t size, off_t off),
+			off_t *off);
+#endif
 
 size_t perf_mmap__mmap_len(struct perf_mmap *map);
 


* [PATCH v11 3/3]: perf record: extend trace writing to multi AIO
  2018-10-08  5:55 [PATCH v11 0/3]: perf: reduce data loss when profiling highly parallel CPU bound workloads Alexey Budankov
  2018-10-08  6:14 ` [PATCH v11 1/3]: perf util: map data buffer for preserving collected data Alexey Budankov
  2018-10-08  6:17 ` [PATCH v11 2/3]: perf record: enable asynchronous trace writing Alexey Budankov
@ 2018-10-08  6:19 ` Alexey Budankov
  2018-10-08 10:55   ` Jiri Olsa
  2 siblings, 1 reply; 33+ messages in thread
From: Alexey Budankov @ 2018-10-08  6:19 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo
  Cc: Alexander Shishkin, Jiri Olsa, Namhyung Kim, Andi Kleen, linux-kernel


Multi AIO trace writing allows caching more kernel data into userspace
memory, postponing trace writing for the sake of an overall profiling
data throughput increase. It can be seen as an extension of the kernel
data buffer into userspace memory.

With an aio-cblocks option value different from 0, the current default
value, the tool has the capability to cache more and more data into
user space while delegating the spill to AIO.

That allows avoiding the suspend at record__aio_sync() between calls of
record__mmap_read_evlist() and increases profiling data throughput at
the cost of userspace memory.
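
The multi-cblock allocation scheme described above (find a control block whose previous write has completed, or wait if all are still in flight) can be sketched as below; the helper name and structure are illustrative, while the real selection logic lives in record__aio_sync() in the patch:

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Return the index of a control block free for reuse: one whose write
 * never started (aio_fildes == -1, the convention from patch 1/3) or
 * has already completed. Returns -1 when all nr blocks are still in
 * flight; the caller would then aio_suspend() on the array and retry.
 * Illustrative sketch with hypothetical names.
 */
static int find_free_cblock(struct aiocb *cblocks, int nr)
{
	int i;

	for (i = 0; i < nr; i++) {
		if (cblocks[i].aio_fildes == -1)
			return i;               /* unused or reclaimed */
		if (aio_error(&cblocks[i]) != EINPROGRESS) {
			(void)aio_return(&cblocks[i]);  /* reap result */
			cblocks[i].aio_fildes = -1;
			return i;
		}
	}
	return -1; /* all busy */
}
```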

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
Changes in v10:
- added description of aio-cblocks option into perf-record.txt
---
 tools/perf/Documentation/perf-record.txt |  6 +++
 tools/perf/builtin-record.c              | 58 +++++++++++++++++++-----
 tools/perf/util/evlist.c                 |  6 +--
 tools/perf/util/evlist.h                 |  2 +-
 tools/perf/util/mmap.c                   | 76 ++++++++++++++++++++++----------
 tools/perf/util/mmap.h                   |  7 +--
 6 files changed, 115 insertions(+), 40 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 246dee081efd..7f43a05908ed 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -435,6 +435,12 @@ Specify vmlinux path which has debuginfo.
 --buildid-all::
 Record build-id of all DSOs regardless whether it's actually hit or not.
 
+--aio-cblocks <n>::
+Use <n> control blocks for asynchronous (Posix AIO) trace writing (max: 4).
+The default value is 0, which enables serial (non-AIO) trace writing mode.
+AIO mode is supported only when linking the Perf tool binary with a libc
+library providing an implementation of the Posix AIO API.
+
 --all-kernel::
 Configure all used events to run in kernel space.
 
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index c66ae23e3939..016f521fcaf0 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -196,16 +196,35 @@ static int record__aio_complete(struct perf_mmap *md, struct aiocb *cblock)
 	return rc;
 }
 
-static void record__aio_sync(struct perf_mmap *md)
+static int record__aio_sync(struct perf_mmap *md, bool sync_all)
 {
-	struct aiocb *cblock = &md->cblock;
+	struct aiocb **aiocb = md->aiocb;
+	struct aiocb *cblocks = md->cblocks;
 	struct timespec timeout = { 0, 1000 * 1000  * 1 }; /* 1ms */
+	int i, do_suspend;
 
 	do {
-		if (cblock->aio_fildes == -1 || record__aio_complete(md, cblock))
-			return;
+		do_suspend = 0;
+		for (i = 0; i < md->nr_cblocks; ++i) {
+			if (cblocks[i].aio_fildes == -1 || record__aio_complete(md, &cblocks[i])) {
+				if (sync_all)
+					aiocb[i] = NULL;
+				else
+					return i;
+			} else {
+				/*
+				 * The started aio write is not complete yet,
+				 * so it has to be waited for before the
+				 * next allocation.
+				 */
+				aiocb[i] = &cblocks[i];
+				do_suspend = 1;
+			}
+		}
+		if (!do_suspend)
+			return -1;
 
-		while (aio_suspend((const struct aiocb**)&cblock, 1, &timeout)) {
+		while (aio_suspend((const struct aiocb **)aiocb, md->nr_cblocks, &timeout)) {
 			if (!(errno == EAGAIN || errno == EINTR))
 				pr_err("failed to sync perf data, error: %m\n");
 		}
@@ -430,11 +449,14 @@ static int record__mmap_evlist(struct record *rec,
 			       struct perf_evlist *evlist)
 {
 	struct record_opts *opts = &rec->opts;
+	int nr_cblocks = 0;
 	char msg[512];
-
+#ifdef HAVE_AIO_SUPPORT
+	nr_cblocks = opts->nr_cblocks;
+#endif
 	if (perf_evlist__mmap_ex(evlist, opts->mmap_pages,
 				 opts->auxtrace_mmap_pages,
-				 opts->auxtrace_snapshot_mode) < 0) {
+				 opts->auxtrace_snapshot_mode, nr_cblocks) < 0) {
 		if (errno == EPERM) {
 			pr_err("Permission error mapping pages.\n"
 			       "Consider increasing "
@@ -632,7 +654,7 @@ static void record__mmap_read_sync(struct record *rec)
 		struct perf_mmap *map = &maps[i];
 
 		if (map->base)
-			record__aio_sync(map);
+			record__aio_sync(map, true);
 	}
 }
 #endif
@@ -674,12 +696,13 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
 				}
 #ifdef HAVE_AIO_SUPPORT
 			} else {
+				int idx;
 				/*
 				 * Call record__aio_sync() to wait till map->data buffer
 				 * becomes available after previous aio write request.
 				 */
-				record__aio_sync(map);
-				if (perf_mmap__aio_push(map, rec, record__aio_pushfn, &off) != 0) {
+				idx = record__aio_sync(map, false);
+				if (perf_mmap__aio_push(map, rec, idx, record__aio_pushfn, &off) != 0) {
 					lseek(trace_fd, off, SEEK_SET);
 					rc = -1;
 					goto out;
@@ -1446,6 +1469,10 @@ static int perf_record_config(const char *var, const char *value, void *cb)
 		var = "call-graph.record-mode";
 		return perf_default_config(var, value, cb);
 	}
+#ifdef HAVE_AIO_SUPPORT
+	if (!strcmp(var, "record.aio-cblocks"))
+		rec->opts.nr_cblocks = strtol(value, NULL, 0);
+#endif
 
 	return 0;
 }
@@ -1837,6 +1864,10 @@ static struct option __record_options[] = {
 			  "signal"),
 	OPT_BOOLEAN(0, "dry-run", &dry_run,
 		    "Parse options then exit"),
+#ifdef HAVE_AIO_SUPPORT
+	OPT_INTEGER(0, "aio-cblocks", &record.opts.nr_cblocks,
+		    "Max number of simultaneous per-mmap trace writes (default: 0 - serial, max: 4)"),
+#endif
 	OPT_END()
 };
 
@@ -2029,6 +2060,13 @@ int cmd_record(int argc, const char **argv)
 		goto out;
 	}
 
+#ifdef HAVE_AIO_SUPPORT
+	if (rec->opts.nr_cblocks > 4)
+		rec->opts.nr_cblocks = 4;
+	if (verbose > 0)
+		pr_info("nr_cblocks: %d\n", rec->opts.nr_cblocks);
+#endif
+
 	err = __cmd_record(&record, argc, argv);
 out:
 	perf_evlist__delete(rec->evlist);
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index af2f8c965d7a..2dad3a4e50e0 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1018,7 +1018,7 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
  */
 int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 			 unsigned int auxtrace_pages,
-			 bool auxtrace_overwrite)
+			 bool auxtrace_overwrite, int nr_cblocks __maybe_unused)
 {
 	struct perf_evsel *evsel;
 	const struct cpu_map *cpus = evlist->cpus;
@@ -1030,7 +1030,7 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 	 */
 	struct mmap_params mp;
 #ifdef HAVE_AIO_SUPPORT
-	mp.nr_cblocks = 0;
+	mp.nr_cblocks = nr_cblocks;
 #endif
 	if (!evlist->mmap)
 		evlist->mmap = perf_evlist__alloc_mmap(evlist, false);
@@ -1062,7 +1062,7 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 
 int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages)
 {
-	return perf_evlist__mmap_ex(evlist, pages, 0, false);
+	return perf_evlist__mmap_ex(evlist, pages, 0, false, 0);
 }
 
 int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target)
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index dc66436add98..2464463879b4 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -162,7 +162,7 @@ unsigned long perf_event_mlock_kb_in_pages(void);
 
 int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 			 unsigned int auxtrace_pages,
-			 bool auxtrace_overwrite);
+			 bool auxtrace_overwrite, int nr_cblocks);
 int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages);
 void perf_evlist__munmap(struct perf_evlist *evlist);
 
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index ecaa5b5eb3ed..d33511490632 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -156,8 +156,16 @@ void __weak auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp __mayb
 #ifdef HAVE_AIO_SUPPORT
 static void perf_mmap__aio_munmap(struct perf_mmap *map)
 {
-	if (map->data)
-		zfree(&map->data);
+	int i;
+
+	if (map->aiocb)
+		zfree(&map->aiocb);
+	if (map->cblocks)
+		zfree(&map->cblocks);
+	for (i = 0; i < map->nr_cblocks; ++i) {
+		if (map->data[i])
+			zfree(&map->data[i]);
+	}
 }
 #endif
 
@@ -178,28 +186,50 @@ void perf_mmap__munmap(struct perf_mmap *map)
 #ifdef HAVE_AIO_SUPPORT
 static int perf_mmap__aio_mmap(struct perf_mmap *map, struct mmap_params *mp)
 {
-	int delta_max;
+	int delta_max, i, prio;
 
 	map->nr_cblocks = mp->nr_cblocks;
 	if (map->nr_cblocks) {
-		map->data = malloc(perf_mmap__mmap_len(map));
+		map->aiocb = calloc(map->nr_cblocks, sizeof(struct aiocb *));
+		if (!map->aiocb) {
+			pr_debug2("failed to allocate aiocb for data buffer, error %m\n");
+			return -1;
+		}
+		map->cblocks = calloc(map->nr_cblocks, sizeof(struct aiocb));
+		if (!map->cblocks) {
+			pr_debug2("failed to allocate cblocks for data buffer, error %m\n");
+			return -1;
+		}
+		map->data = calloc(map->nr_cblocks, sizeof(void *));
 		if (!map->data) {
 			pr_debug2("failed to allocate data buffer, error %m\n");
 			return -1;
 		}
-		/*
-		 * Use cblock.aio_fildes value different from -1
-		 * to denote started aio write operation on the
-		 * cblock so it requires explicit record__aio_sync()
-		 * call prior the cblock may be reused again.
-		 */
-		map->cblock.aio_fildes = -1;
-		/*
-		 * Allocate cblock with max priority delta to
-		 * have faster aio write system calls.
-		 */
 		delta_max = sysconf(_SC_AIO_PRIO_DELTA_MAX);
-		map->cblock.aio_reqprio = delta_max;
+		for (i = 0; i < map->nr_cblocks; ++i) {
+			map->data[i] = malloc(perf_mmap__mmap_len(map));
+			if (!map->data[i]) {
+				pr_debug2("failed to allocate data buffer area, error %m");
+				return -1;
+			}
+			/*
+			 * Use cblock.aio_fildes value different from -1
+			 * to denote started aio write operation on the
+			 * cblock so it requires explicit record__aio_sync()
+			 * call prior the cblock may be reused again.
+			 */
+			map->cblocks[i].aio_fildes = -1;
+			/*
+			 * Allocate cblocks with priority delta to have
+			 * faster aio write system calls because queued requests
+			 * are kept in separate per-prio queues and adding
+			 * a new request will iterate thru shorter per-prio
+			 * list. Blocks with numbers higher than
+			 *  _SC_AIO_PRIO_DELTA_MAX go with priority 0.
+			 */
+			prio = delta_max - i;
+			map->cblocks[i].aio_reqprio = prio >= 0 ? prio : 0;
+		}
 	}
 
 	return 0;
@@ -368,7 +398,7 @@ int perf_mmap__push(struct perf_mmap *md, void *to,
 }
 
 #ifdef HAVE_AIO_SUPPORT
-int perf_mmap__aio_push(struct perf_mmap *md, void *to,
+int perf_mmap__aio_push(struct perf_mmap *md, void *to, int idx,
 			int push(void *to, struct aiocb *cblock, void *buf, size_t size, off_t off),
 			off_t *off)
 {
@@ -383,7 +413,7 @@ int perf_mmap__aio_push(struct perf_mmap *md, void *to,
 		return (rc == -EAGAIN) ? 0 : -1;
 
 	/*
-	 * md->base data is copied into md->data buffer to
+	 * md->base data is copied into md->data[idx] buffer to
 	 * release space in the kernel buffer as fast as possible,
 	 * thru perf_mmap__consume() below.
 	 *
@@ -405,20 +435,20 @@ int perf_mmap__aio_push(struct perf_mmap *md, void *to,
 		buf = &data[md->start & md->mask];
 		size = md->mask + 1 - (md->start & md->mask);
 		md->start += size;
-		memcpy(md->data, buf, size);
+		memcpy(md->data[idx], buf, size);
 		size0 = size;
 	}
 
 	buf = &data[md->start & md->mask];
 	size = md->end - md->start;
 	md->start += size;
-	memcpy(md->data + size0, buf, size);
+	memcpy(md->data[idx] + size0, buf, size);
 
 	/*
-	 * Increment md->refcount to guard md->data buffer
+	 * Increment md->refcount to guard md->data[idx] buffer
 	 * from premature deallocation because md object can be
 	 * released earlier than aio write request started
-	 * on mmap->data is complete.
+	 * on mmap->data[idx] is complete.
 	 *
 	 * perf_mmap__put() is done at record__aio_complete()
 	 * after started request completion.
@@ -428,7 +458,7 @@ int perf_mmap__aio_push(struct perf_mmap *md, void *to,
 	md->prev = head;
 	perf_mmap__consume(md);
 
-	rc = push(to, &md->cblock, md->data, size0 + size, *off);
+	rc = push(to, &md->cblocks[idx], md->data[idx], size0 + size, *off);
 	if (!rc) {
 		*off += size0 + size;
 	} else {
diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index 8618223ae675..67af594b57eb 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -30,8 +30,9 @@ struct perf_mmap {
 	struct auxtrace_mmap auxtrace_mmap;
 	char		 event_copy[PERF_SAMPLE_MAX_SIZE] __aligned(8);
 #ifdef HAVE_AIO_SUPPORT
-	void 		 *data;
-	struct aiocb	 cblock;
+	void		 **data;
+	struct aiocb	 *cblocks;
+	struct aiocb	 **aiocb;
 	int		 nr_cblocks;
 #endif
 };
@@ -106,7 +107,7 @@ union perf_event *perf_mmap__read_event(struct perf_mmap *map);
 int perf_mmap__push(struct perf_mmap *md, void *to,
 		    int push(struct perf_mmap *map, void *to, void *buf, size_t size));
 #ifdef HAVE_AIO_SUPPORT
-int perf_mmap__aio_push(struct perf_mmap *md, void *to,
+int perf_mmap__aio_push(struct perf_mmap *md, void *to, int idx,
 			int push(void *to, struct aiocb *cblock, void *buf, size_t size, off_t off),
 			off_t *off);
 #endif

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v11 2/3]: perf record: enable asynchronous trace writing
  2018-10-08  6:17 ` [PATCH v11 2/3]: perf record: enable asynchronous trace writing Alexey Budankov
@ 2018-10-08 10:50   ` Jiri Olsa
  2018-10-08 12:17     ` Alexey Budankov
  2018-10-08 10:51   ` Jiri Olsa
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 33+ messages in thread
From: Jiri Olsa @ 2018-10-08 10:50 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

On Mon, Oct 08, 2018 at 09:17:11AM +0300, Alexey Budankov wrote:

SNIP

> -
> +#ifdef HAVE_AIO_SUPPORT
> +	lseek(trace_fd, off, SEEK_SET);
> +#endif
>  	/*
>  	 * Mark the round finished in case we wrote
>  	 * at least one event.
> @@ -650,6 +799,9 @@ record__switch_output(struct record *rec, bool at_exit)
>  	/* Same Size:      "2015122520103046"*/
>  	char timestamp[] = "InvalidTimestamp";
>  
> +#ifdef HAVE_AIO_SUPPORT
> +	record__mmap_read_sync(rec);
> +#endif

same, please define the record__mmap_read_sync dummy
so we can skip that #ifdef #endif

>  	record__synthesize(rec, true);
>  	if (target__none(&rec->opts.target))
>  		record__synthesize_workload(rec, true);
> @@ -1157,6 +1309,10 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>  		record__synthesize_workload(rec, true);
>  
>  out_child:
> +
> +#ifdef HAVE_AIO_SUPPORT
> +	record__mmap_read_sync(rec);
> +#endif

ditto

thanks,
jirka

SNIP


* Re: [PATCH v11 1/3]: perf util: map data buffer for preserving collected data
  2018-10-08  6:14 ` [PATCH v11 1/3]: perf util: map data buffer for preserving collected data Alexey Budankov
@ 2018-10-08 10:50   ` Jiri Olsa
  2018-10-08 11:55     ` Alexey Budankov
  2018-10-08 10:50   ` Jiri Olsa
  2018-10-08 10:51   ` Jiri Olsa
  2 siblings, 1 reply; 33+ messages in thread
From: Jiri Olsa @ 2018-10-08 10:50 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

On Mon, Oct 08, 2018 at 09:14:29AM +0300, Alexey Budankov wrote:
> 
> The map->data buffer is used to preserve map->base profiling data 
> for writing to disk. AIO map->cblock is used to queue corresponding 
> map->data buffer for asynchronous writing.
> 
> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
> ---
> Changes in v10:
>  - moved specific code to perf_mmap__aio_mmap(), perf_mmap__aio_munmap()
>  - adjusted error reporting by using %m
> Changes in v9:
>   - implemented NO_AIO and HAVE_AIO_SUPPORT defines to cover cases of 
>     libc implementations without Posix AIO API support
> Changes in v7:
>   - implemented handling record.aio setting from perfconfig file
>  Changes in v6:
>   - adjusted setting of priorities for cblocks;
>  Changes in v5:
>   - reshaped layout of data structures;
>   - implemented --aio option;
>  Changes in v4:
>   - converted mmap()/munmap() to malloc()/free() for mmap->data buffer management 
>  Changes in v2:
>   - converted zalloc() to calloc() for allocation of mmap_aio array,
>   - cleared typo and adjusted fallback branch code;
> ---
>  tools/perf/Makefile.config |  5 +++++
>  tools/perf/Makefile.perf   |  7 ++++++-
>  tools/perf/util/evlist.c   |  4 +++-
>  tools/perf/util/mmap.c     | 48 +++++++++++++++++++++++++++++++++++++++++++++-
>  tools/perf/util/mmap.h     | 11 +++++++++++
>  5 files changed, 72 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
> index f6d1a03c7523..2e90f4ce9214 100644
> --- a/tools/perf/Makefile.config
> +++ b/tools/perf/Makefile.config
> @@ -355,6 +355,11 @@ endif # NO_LIBELF
>  
>  ifeq ($(feature-glibc), 1)
>    CFLAGS += -DHAVE_GLIBC_SUPPORT
> +  ifndef NO_AIO

hum, do we need NO_AIO? we have the --aio option to enable that right?
I guess BIONIC does not support aio, but will it fail when it's
compiled in there?

jirka

> +    ifndef BIONIC
> +        CFLAGS += -DHAVE_AIO_SUPPORT
> +    endif
> +  endif
>  endif

SNIP


* Re: [PATCH v11 1/3]: perf util: map data buffer for preserving collected data
  2018-10-08  6:14 ` [PATCH v11 1/3]: perf util: map data buffer for preserving collected data Alexey Budankov
  2018-10-08 10:50   ` Jiri Olsa
@ 2018-10-08 10:50   ` Jiri Olsa
  2018-10-08 12:03     ` Alexey Budankov
  2018-10-08 10:51   ` Jiri Olsa
  2 siblings, 1 reply; 33+ messages in thread
From: Jiri Olsa @ 2018-10-08 10:50 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

On Mon, Oct 08, 2018 at 09:14:29AM +0300, Alexey Budankov wrote:

SNIP

> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index be440df29615..af2f8c965d7a 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -1029,7 +1029,9 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
>  	 * So &mp should not be passed through const pointer.
>  	 */
>  	struct mmap_params mp;
> -
> +#ifdef HAVE_AIO_SUPPORT
> +	mp.nr_cblocks = 0;
> +#endif
>  	if (!evlist->mmap)
>  		evlist->mmap = perf_evlist__alloc_mmap(evlist, false);
>  	if (!evlist->mmap)
> diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
> index cdb95b3a1213..db8f16f8a363 100644
> --- a/tools/perf/util/mmap.c
> +++ b/tools/perf/util/mmap.c
> @@ -153,8 +153,19 @@ void __weak auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp __mayb
>  {
>  }
>  
> +#ifdef HAVE_AIO_SUPPORT
> +static void perf_mmap__aio_munmap(struct perf_mmap *map)
> +{
> +	if (map->data)
> +		zfree(&map->data);
> +}

if we really need to keep this optional for compilation,
please make it as single block with dummy functions
for when it's not compiled in, like:

#ifdef 
static void perf_mmap__aio_munmap(struct perf_mmap *map)
{
	if (map->data)
		zfree(&map->data);
}
#else
static void perf_mmap__aio_munmap(struct perf_mmap *map) { }
#endif

thanks,
jirka


> +#endif
> +
>  void perf_mmap__munmap(struct perf_mmap *map)
>  {
> +#ifdef HAVE_AIO_SUPPORT
> +	perf_mmap__aio_munmap(map);
> +#endif
>  	if (map->base != NULL) {
>  		munmap(map->base, perf_mmap__mmap_len(map));
>  		map->base = NULL;
> @@ -164,8 +175,40 @@ void perf_mmap__munmap(struct perf_mmap *map)
>  	auxtrace_mmap__munmap(&map->auxtrace_mmap);
>  }

SNIP


* Re: [PATCH v11 1/3]: perf util: map data buffer for preserving collected data
  2018-10-08  6:14 ` [PATCH v11 1/3]: perf util: map data buffer for preserving collected data Alexey Budankov
  2018-10-08 10:50   ` Jiri Olsa
  2018-10-08 10:50   ` Jiri Olsa
@ 2018-10-08 10:51   ` Jiri Olsa
  2018-10-08 12:05     ` Alexey Budankov
  2 siblings, 1 reply; 33+ messages in thread
From: Jiri Olsa @ 2018-10-08 10:51 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

On Mon, Oct 08, 2018 at 09:14:29AM +0300, Alexey Budankov wrote:

SNIP

>  static int overwrite_rb_find_range(void *buf, int mask, u64 *start, u64 *end)
> diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
> index e603314dc792..1b63b6cc7cf9 100644
> --- a/tools/perf/util/mmap.h
> +++ b/tools/perf/util/mmap.h
> @@ -6,6 +6,9 @@
>  #include <linux/types.h>
>  #include <asm/barrier.h>
>  #include <stdbool.h>
> +#ifdef HAVE_AIO_SUPPORT
> +#include <aio.h>
> +#endif
>  #include "auxtrace.h"
>  #include "event.h"
>  
> @@ -26,6 +29,11 @@ struct perf_mmap {
>  	bool		 overwrite;
>  	struct auxtrace_mmap auxtrace_mmap;
>  	char		 event_copy[PERF_SAMPLE_MAX_SIZE] __aligned(8);
> +#ifdef HAVE_AIO_SUPPORT
> +	void 		 *data;
> +	struct aiocb	 cblock;
> +	int		 nr_cblocks;
> +#endif

could you please separate those in struct, maybe anonymous like:

	...
	struct {
		void 		 *data;
		struct aiocb	 cblock;
		int		 nr_cblocks;
	} aio;
	...


>  };
>  
>  /*
> @@ -59,6 +67,9 @@ enum bkw_mmap_state {
>  struct mmap_params {
>  	int			    prot, mask;
>  	struct auxtrace_mmap_params auxtrace_mp;
> +#ifdef HAVE_AIO_SUPPORT
> +	int			    nr_cblocks;
> +#endif

I dont think we need the ifdef here.. IMO one extra int is not
worth of the ifdef endif confusion

jirka


* Re: [PATCH v11 2/3]: perf record: enable asynchronous trace writing
  2018-10-08  6:17 ` [PATCH v11 2/3]: perf record: enable asynchronous trace writing Alexey Budankov
  2018-10-08 10:50   ` Jiri Olsa
@ 2018-10-08 10:51   ` Jiri Olsa
  2018-10-08 12:19     ` Alexey Budankov
  2018-10-08 10:52   ` Jiri Olsa
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 33+ messages in thread
From: Jiri Olsa @ 2018-10-08 10:51 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

On Mon, Oct 08, 2018 at 09:17:11AM +0300, Alexey Budankov wrote:

SNIP

>  struct option;
> diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
> index db8f16f8a363..ecaa5b5eb3ed 100644
> --- a/tools/perf/util/mmap.c
> +++ b/tools/perf/util/mmap.c
> @@ -367,6 +367,82 @@ int perf_mmap__push(struct perf_mmap *md, void *to,
>  	return rc;
>  }
>  
> +#ifdef HAVE_AIO_SUPPORT
> +int perf_mmap__aio_push(struct perf_mmap *md, void *to,
> +			int push(void *to, struct aiocb *cblock, void *buf, size_t size, off_t off),
> +			off_t *off)
> +{

seems like this could be defined static within builtin-record object,
is there a reason why it's in here?

thanks,
jirka


* Re: [PATCH v11 2/3]: perf record: enable asynchronous trace writing
  2018-10-08  6:17 ` [PATCH v11 2/3]: perf record: enable asynchronous trace writing Alexey Budankov
  2018-10-08 10:50   ` Jiri Olsa
  2018-10-08 10:51   ` Jiri Olsa
@ 2018-10-08 10:52   ` Jiri Olsa
  2018-10-08 12:24     ` Alexey Budankov
  2018-10-08 10:57   ` Jiri Olsa
  2018-10-08 10:58   ` Jiri Olsa
  4 siblings, 1 reply; 33+ messages in thread
From: Jiri Olsa @ 2018-10-08 10:52 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

On Mon, Oct 08, 2018 at 09:17:11AM +0300, Alexey Budankov wrote:

SNIP

>  static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evlist,
>  				    bool overwrite)
>  {
> @@ -520,7 +644,10 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
>  	int i;
>  	int rc = 0;
>  	struct perf_mmap *maps;
> -
> +#ifdef HAVE_AIO_SUPPORT
> +	int trace_fd = rec->data.file.fd;
> +	off_t off;
> +#endif
>  	if (!evlist)
>  		return 0;
>  
> @@ -531,14 +658,34 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
>  	if (overwrite && evlist->bkw_mmap_state != BKW_MMAP_DATA_PENDING)
>  		return 0;
>  
> +#ifdef HAVE_AIO_SUPPORT
> +	off = lseek(trace_fd, 0, SEEK_CUR);
> +#endif

I'm still a little puzzled why we need to do this,
when the aio write takes the offset value, but
please provide some aio_.. function for this


>  	for (i = 0; i < evlist->nr_mmaps; i++) {
>  		struct perf_mmap *map = &maps[i];
>  
>  		if (map->base) {
> -			if (perf_mmap__push(map, rec, record__pushfn) != 0) {
> -				rc = -1;
> -				goto out;
> +#ifdef HAVE_AIO_SUPPORT
> +			if (!rec->opts.nr_cblocks) {
> +#endif
> +				if (perf_mmap__push(map, rec, record__pushfn) != 0) {
> +					rc = -1;
> +					goto out;
> +				}
> +#ifdef HAVE_AIO_SUPPORT
> +			} else {
> +				/*
> +				 * Call record__aio_sync() to wait till map->data buffer
> +				 * becomes available after previous aio write request.
> +				 */
> +				record__aio_sync(map);
> +				if (perf_mmap__aio_push(map, rec, record__aio_pushfn, &off) != 0) {
> +					lseek(trace_fd, off, SEEK_SET);
> +					rc = -1;
> +					goto out;
> +				}
>  			}
> +#endif
>  		}
>  
>  		if (map->auxtrace_mmap.base && !rec->opts.auxtrace_snapshot_mode &&
> @@ -547,7 +694,9 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
>  			goto out;
>  		}
>  	}
> -
> +#ifdef HAVE_AIO_SUPPORT
> +	lseek(trace_fd, off, SEEK_SET);
> +#endif

ditto

thanks,
jirka


* Re: [PATCH v11 3/3]: perf record: extend trace writing to multi AIO
  2018-10-08  6:19 ` [PATCH v11 3/3]: perf record: extend trace writing to multi AIO Alexey Budankov
@ 2018-10-08 10:55   ` Jiri Olsa
  2018-10-08 11:47     ` Alexey Budankov
  0 siblings, 1 reply; 33+ messages in thread
From: Jiri Olsa @ 2018-10-08 10:55 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

On Mon, Oct 08, 2018 at 09:19:17AM +0300, Alexey Budankov wrote:

SNIP

>  #ifdef HAVE_AIO_SUPPORT
>  			} else {
> +				int idx;
>  				/*
>  				 * Call record__aio_sync() to wait till map->data buffer
>  				 * becomes available after previous aio write request.
>  				 */
> -				record__aio_sync(map);
> -				if (perf_mmap__aio_push(map, rec, record__aio_pushfn, &off) != 0) {
> +				idx = record__aio_sync(map, false);
> +				if (perf_mmap__aio_push(map, rec, idx, record__aio_pushfn, &off) != 0) {
>  					lseek(trace_fd, off, SEEK_SET);
>  					rc = -1;
>  					goto out;
> @@ -1446,6 +1469,10 @@ static int perf_record_config(const char *var, const char *value, void *cb)
>  		var = "call-graph.record-mode";
>  		return perf_default_config(var, value, cb);
>  	}
> +#ifdef HAVE_AIO_SUPPORT
> +	if (!strcmp(var, "record.aio-cblocks"))
> +		rec->opts.nr_cblocks = strtol(value, NULL, 0);
> +#endif
>  
>  	return 0;
>  }
> @@ -1837,6 +1864,10 @@ static struct option __record_options[] = {
>  			  "signal"),
>  	OPT_BOOLEAN(0, "dry-run", &dry_run,
>  		    "Parse options then exit"),
> +#ifdef HAVE_AIO_SUPPORT
> +	OPT_INTEGER(0, "aio-cblocks", &record.opts.nr_cblocks,
> +		    "Max number of simultaneous per-mmap trace writes (default: 0 - serial, max: 4)"),
> +#endif

could you please move the option to enable that to the previous patch?
so we could test the simple variant as well

also I think it'd be better if we have simple '--aio' option that would
enable this with some default values..  and add --aio-cblocks to configure
that further

thanks,
jirka


* Re: [PATCH v11 2/3]: perf record: enable asynchronous trace writing
  2018-10-08  6:17 ` [PATCH v11 2/3]: perf record: enable asynchronous trace writing Alexey Budankov
                     ` (2 preceding siblings ...)
  2018-10-08 10:52   ` Jiri Olsa
@ 2018-10-08 10:57   ` Jiri Olsa
  2018-10-08 12:15     ` Alexey Budankov
  2018-10-08 10:58   ` Jiri Olsa
  4 siblings, 1 reply; 33+ messages in thread
From: Jiri Olsa @ 2018-10-08 10:57 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

On Mon, Oct 08, 2018 at 09:17:11AM +0300, Alexey Budankov wrote:

SNIP

>  static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evlist,
>  				    bool overwrite)
>  {
> @@ -520,7 +644,10 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
>  	int i;
>  	int rc = 0;
>  	struct perf_mmap *maps;
> -
> +#ifdef HAVE_AIO_SUPPORT
> +	int trace_fd = rec->data.file.fd;
> +	off_t off;
> +#endif
>  	if (!evlist)
>  		return 0;
>  
> @@ -531,14 +658,34 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
>  	if (overwrite && evlist->bkw_mmap_state != BKW_MMAP_DATA_PENDING)
>  		return 0;
>  
> +#ifdef HAVE_AIO_SUPPORT
> +	off = lseek(trace_fd, 0, SEEK_CUR);
> +#endif
>  	for (i = 0; i < evlist->nr_mmaps; i++) {
>  		struct perf_mmap *map = &maps[i];
>  
>  		if (map->base) {
> -			if (perf_mmap__push(map, rec, record__pushfn) != 0) {
> -				rc = -1;
> -				goto out;
> +#ifdef HAVE_AIO_SUPPORT
> +			if (!rec->opts.nr_cblocks) {
> +#endif
> +				if (perf_mmap__push(map, rec, record__pushfn) != 0) {
> +					rc = -1;
> +					goto out;
> +				}
> +#ifdef HAVE_AIO_SUPPORT
> +			} else {
> +				/*
> +				 * Call record__aio_sync() to wait till map->data buffer
> +				 * becomes available after previous aio write request.
> +				 */
> +				record__aio_sync(map);
> +				if (perf_mmap__aio_push(map, rec, record__aio_pushfn, &off) != 0) {
> +					lseek(trace_fd, off, SEEK_SET);
> +					rc = -1;
> +					goto out;
> +				}
>  			}
> +#endif

I really don't like this.. let's have some aio_... function with its
dummy counterpart to be called in here.. and again.. if we really must
have this compiled optionally, which I'm not convinced of

jirka


* Re: [PATCH v11 2/3]: perf record: enable asynchronous trace writing
  2018-10-08  6:17 ` [PATCH v11 2/3]: perf record: enable asynchronous trace writing Alexey Budankov
                     ` (3 preceding siblings ...)
  2018-10-08 10:57   ` Jiri Olsa
@ 2018-10-08 10:58   ` Jiri Olsa
  2018-10-08 12:26     ` Alexey Budankov
  4 siblings, 1 reply; 33+ messages in thread
From: Jiri Olsa @ 2018-10-08 10:58 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

On Mon, Oct 08, 2018 at 09:17:11AM +0300, Alexey Budankov wrote:

SNIP

>  static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evlist,
>  				    bool overwrite)
>  {
> @@ -520,7 +644,10 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
>  	int i;
>  	int rc = 0;
>  	struct perf_mmap *maps;
> -
> +#ifdef HAVE_AIO_SUPPORT
> +	int trace_fd = rec->data.file.fd;
> +	off_t off;
> +#endif
>  	if (!evlist)
>  		return 0;
>  
> @@ -531,14 +658,34 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
>  	if (overwrite && evlist->bkw_mmap_state != BKW_MMAP_DATA_PENDING)
>  		return 0;
>  
> +#ifdef HAVE_AIO_SUPPORT
> +	off = lseek(trace_fd, 0, SEEK_CUR);
> +#endif
>  	for (i = 0; i < evlist->nr_mmaps; i++) {
>  		struct perf_mmap *map = &maps[i];
>  
>  		if (map->base) {
> -			if (perf_mmap__push(map, rec, record__pushfn) != 0) {
> -				rc = -1;
> -				goto out;
> +#ifdef HAVE_AIO_SUPPORT
> +			if (!rec->opts.nr_cblocks) {
> +#endif

maybe it'd be less confusing having something like 
  rec->opts.aio.enabled

or aio__is_enabled() global with the dummy counterpart

jirka


* Re: [PATCH v11 3/3]: perf record: extend trace writing to multi AIO
  2018-10-08 10:55   ` Jiri Olsa
@ 2018-10-08 11:47     ` Alexey Budankov
  2018-10-08 12:45       ` Jiri Olsa
  0 siblings, 1 reply; 33+ messages in thread
From: Alexey Budankov @ 2018-10-08 11:47 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

Hi,

On 08.10.2018 13:55, Jiri Olsa wrote:
> On Mon, Oct 08, 2018 at 09:19:17AM +0300, Alexey Budankov wrote:
<SNIP>
>> +#ifdef HAVE_AIO_SUPPORT
>> +	if (!strcmp(var, "record.aio-cblocks"))
>> +		rec->opts.nr_cblocks = strtol(value, NULL, 0);
>> +#endif
>>  
>>  	return 0;
>>  }
>> @@ -1837,6 +1864,10 @@ static struct option __record_options[] = {
>>  			  "signal"),
>>  	OPT_BOOLEAN(0, "dry-run", &dry_run,
>>  		    "Parse options then exit"),
>> +#ifdef HAVE_AIO_SUPPORT
>> +	OPT_INTEGER(0, "aio-cblocks", &record.opts.nr_cblocks,
>> +		    "Max number of simultaneous per-mmap trace writes (default: 0 - serial, max: 4)"),
>> +#endif
> 
> could you please move the option to enable that to the previou patch?
> so we could test the simple variant as well

Ok.

> 
> also I think it'd be better if we have simple '--aio' option that would
> enable this with some default values..  and add --aio-cblocks to configure
> that further

Well, absence of the --aio option (on the command line or in .perfconfig), or --aio=0,
means serial writing.

From this perspective, perf record --aio --aio-cblocks=N looks like a complication.

Thanks,
Alexey


* Re: [PATCH v11 1/3]: perf util: map data buffer for preserving collected data
  2018-10-08 10:50   ` Jiri Olsa
@ 2018-10-08 11:55     ` Alexey Budankov
  2018-10-08 12:37       ` Jiri Olsa
  0 siblings, 1 reply; 33+ messages in thread
From: Alexey Budankov @ 2018-10-08 11:55 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

Hi,

On 08.10.2018 13:50, Jiri Olsa wrote:
> On Mon, Oct 08, 2018 at 09:14:29AM +0300, Alexey Budankov wrote:
>>
<SNIP>
>> diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
>> index f6d1a03c7523..2e90f4ce9214 100644
>> --- a/tools/perf/Makefile.config
>> +++ b/tools/perf/Makefile.config
>> @@ -355,6 +355,11 @@ endif # NO_LIBELF
>>  
>>  ifeq ($(feature-glibc), 1)
>>    CFLAGS += -DHAVE_GLIBC_SUPPORT
>> +  ifndef NO_AIO
> 
> hum, do we need NO_AIO? we have the --aio option to enable that right?

Right. Enable that *at runtime*.

> I guess BIONIC does not support aio, but but will it fail when it's
> compiled in there?

Please see the updated section of the cover letter for more information
regarding this. Possible compilation issues are why we'd better
have this capability in advance:
  
  objdump -T tools/perf/perf | grep aio

  0000000000000000      DF *UND*	0000000000000000  GLIBC_2.2.5 aio_suspend64
  0000000000000000      DF *UND*	0000000000000000  GLIBC_2.2.5 aio_return64
  0000000000000000      DF *UND*	0000000000000000  GLIBC_2.2.5 aio_error64
  0000000000000000      DF *UND*	0000000000000000  GLIBC_2.2.5 aio_write64

IMHO, it is worth implementing the NO_AIO define.

Thanks,
Alexey

> 
> jirka
> 
>> +    ifndef BIONIC
>> +        CFLAGS += -DHAVE_AIO_SUPPORT
>> +    endif
>> +  endif
>>  endif
> 
> SNIP
> 


* Re: [PATCH v11 1/3]: perf util: map data buffer for preserving collected data
  2018-10-08 10:50   ` Jiri Olsa
@ 2018-10-08 12:03     ` Alexey Budankov
  2018-10-08 12:38       ` Jiri Olsa
  0 siblings, 1 reply; 33+ messages in thread
From: Alexey Budankov @ 2018-10-08 12:03 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

Hi,

On 08.10.2018 13:50, Jiri Olsa wrote:
> On Mon, Oct 08, 2018 at 09:14:29AM +0300, Alexey Budankov wrote:
<SNIP>
>>  
>> +#ifdef HAVE_AIO_SUPPORT
>> +static void perf_mmap__aio_munmap(struct perf_mmap *map)
>> +{
>> +	if (map->data)
>> +		zfree(&map->data);
>> +}
> 
> if we really need to keep this optional for compilation,
> please make it as single block with dummy functions
> for when it's not compiled in, like:
> 
> #ifdef 
> static void perf_mmap__aio_munmap(struct perf_mmap *map)
> {
> 	if (map->data)
> 		zfree(&map->data);
> }
> #else
> static void perf_mmap__aio_munmap(struct perf_mmap *map) { }
> #endif

Ok.

> 
> thanks,
> jirka
> 
> 
>> +#endif
>> +
>>  void perf_mmap__munmap(struct perf_mmap *map)
>>  {
>> +#ifdef HAVE_AIO_SUPPORT
>> +	perf_mmap__aio_munmap(map);
>> +#endif
>>  	if (map->base != NULL) {
>>  		munmap(map->base, perf_mmap__mmap_len(map));
>>  		map->base = NULL;
>> @@ -164,8 +175,40 @@ void perf_mmap__munmap(struct perf_mmap *map)
>>  	auxtrace_mmap__munmap(&map->auxtrace_mmap);
>>  }

Well, let's go with the stub-functions design as opposed to pure
conditional compilation. However, this may result in unintended
growth of the perf tool binary size, even when NO_AIO is
defined during compilation.

Thanks,
Alexey


* Re: [PATCH v11 1/3]: perf util: map data buffer for preserving collected data
  2018-10-08 10:51   ` Jiri Olsa
@ 2018-10-08 12:05     ` Alexey Budankov
  0 siblings, 0 replies; 33+ messages in thread
From: Alexey Budankov @ 2018-10-08 12:05 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

Hi,

On 08.10.2018 13:51, Jiri Olsa wrote:
> On Mon, Oct 08, 2018 at 09:14:29AM +0300, Alexey Budankov wrote:
<SNIP>
> 
> could you please separate those in struct, maybe anonymous like:
> 
> 	...
> 	struct {
> 		void 		 *data;
> 		struct aiocb	 cblock;
> 		int		 nr_cblocks;
> 	} aio;
> 	...
> 

Accepted.

> 
>>  };
>>  
>>  /*
>> @@ -59,6 +67,9 @@ enum bkw_mmap_state {
>>  struct mmap_params {
>>  	int			    prot, mask;
>>  	struct auxtrace_mmap_params auxtrace_mp;
>> +#ifdef HAVE_AIO_SUPPORT
>> +	int			    nr_cblocks;
>> +#endif
> 
> I dont think we need the ifdef here.. IMO one extra int is not
> worth of the ifdef endif confusion

Accepted.

Thanks,
Alexey

> 
> jirka
> 


* Re: [PATCH v11 2/3]: perf record: enable asynchronous trace writing
  2018-10-08 10:57   ` Jiri Olsa
@ 2018-10-08 12:15     ` Alexey Budankov
  0 siblings, 0 replies; 33+ messages in thread
From: Alexey Budankov @ 2018-10-08 12:15 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

Hi,

On 08.10.2018 13:57, Jiri Olsa wrote:
> On Mon, Oct 08, 2018 at 09:17:11AM +0300, Alexey Budankov wrote:
> 
> SNIP
> 
>>  static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evlist,
>>  				    bool overwrite)
>>  {
>> @@ -520,7 +644,10 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
>>  	int i;
>>  	int rc = 0;
>>  	struct perf_mmap *maps;
>> -
>> +#ifdef HAVE_AIO_SUPPORT
>> +	int trace_fd = rec->data.file.fd;
>> +	off_t off;
>> +#endif
>>  	if (!evlist)
>>  		return 0;
>>  
>> @@ -531,14 +658,34 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
>>  	if (overwrite && evlist->bkw_mmap_state != BKW_MMAP_DATA_PENDING)
>>  		return 0;
>>  
>> +#ifdef HAVE_AIO_SUPPORT
>> +	off = lseek(trace_fd, 0, SEEK_CUR);
>> +#endif
>>  	for (i = 0; i < evlist->nr_mmaps; i++) {
>>  		struct perf_mmap *map = &maps[i];
>>  
>>  		if (map->base) {
>> -			if (perf_mmap__push(map, rec, record__pushfn) != 0) {
>> -				rc = -1;
>> -				goto out;
>> +#ifdef HAVE_AIO_SUPPORT
>> +			if (!rec->opts.nr_cblocks) {
>> +#endif
>> +				if (perf_mmap__push(map, rec, record__pushfn) != 0) {
>> +					rc = -1;
>> +					goto out;
>> +				}
>> +#ifdef HAVE_AIO_SUPPORT
>> +			} else {
>> +				/*
>> +				 * Call record__aio_sync() to wait till map->data buffer
>> +				 * becomes available after previous aio write request.
>> +				 */
>> +				record__aio_sync(map);
>> +				if (perf_mmap__aio_push(map, rec, record__aio_pushfn, &off) != 0) {
>> +					lseek(trace_fd, off, SEEK_SET);
>> +					rc = -1;
>> +					goto out;
>> +				}
>>  			}
>> +#endif
> 
> I really don't like this.. let's have some aio_... function with its
> dummy counterpart to be called in here.. and again.. if we really must
> have this compiled optionally, which I'm not convinced of

Accepted.

Thanks,
Alexey

> 
> jirka
> 


* Re: [PATCH v11 2/3]: perf record: enable asynchronous trace writing
  2018-10-08 10:50   ` Jiri Olsa
@ 2018-10-08 12:17     ` Alexey Budankov
  0 siblings, 0 replies; 33+ messages in thread
From: Alexey Budankov @ 2018-10-08 12:17 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

Hi,

On 08.10.2018 13:50, Jiri Olsa wrote:
> On Mon, Oct 08, 2018 at 09:17:11AM +0300, Alexey Budankov wrote:
> 
> SNIP
> 
>> -
>> +#ifdef HAVE_AIO_SUPPORT
>> +	lseek(trace_fd, off, SEEK_SET);
>> +#endif
>>  	/*
>>  	 * Mark the round finished in case we wrote
>>  	 * at least one event.
>> @@ -650,6 +799,9 @@ record__switch_output(struct record *rec, bool at_exit)
>>  	/* Same Size:      "2015122520103046"*/
>>  	char timestamp[] = "InvalidTimestamp";
>>  
>> +#ifdef HAVE_AIO_SUPPORT
>> +	record__mmap_read_sync(rec);
>> +#endif
> 
> same, please define the record__mmap_read_sync dummy
> so we can skip that #ifdef #endif

Accepted.

> 
>>  	record__synthesize(rec, true);
>>  	if (target__none(&rec->opts.target))
>>  		record__synthesize_workload(rec, true);
>> @@ -1157,6 +1309,10 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>>  		record__synthesize_workload(rec, true);
>>  
>>  out_child:
>> +
>> +#ifdef HAVE_AIO_SUPPORT
>> +	record__mmap_read_sync(rec);
>> +#endif
> 
> ditto

Accepted.

Thanks,
Alexey

> 
> thanks,
> jirka
> 
> SNIP
> 


* Re: [PATCH v11 2/3]: perf record: enable asynchronous trace writing
  2018-10-08 10:51   ` Jiri Olsa
@ 2018-10-08 12:19     ` Alexey Budankov
  0 siblings, 0 replies; 33+ messages in thread
From: Alexey Budankov @ 2018-10-08 12:19 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

Hi,

On 08.10.2018 13:51, Jiri Olsa wrote:
> On Mon, Oct 08, 2018 at 09:17:11AM +0300, Alexey Budankov wrote:
> 
> SNIP
> 
>>  struct option;
>> diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
>> index db8f16f8a363..ecaa5b5eb3ed 100644
>> --- a/tools/perf/util/mmap.c
>> +++ b/tools/perf/util/mmap.c
>> @@ -367,6 +367,82 @@ int perf_mmap__push(struct perf_mmap *md, void *to,
>>  	return rc;
>>  }
>>  
>> +#ifdef HAVE_AIO_SUPPORT
>> +int perf_mmap__aio_push(struct perf_mmap *md, void *to,
>> +			int push(void *to, struct aiocb *cblock, void *buf, size_t size, off_t off),
>> +			off_t *off)
>> +{
> 
> seems like this could be defined static within builtin-record object,
> is there a reason why it's in here?

The reason is to keep it analogous to perf_mmap__push() above.

Thanks,
Alexey
 
> 
> thanks,
> jirka
> 


* Re: [PATCH v11 2/3]: perf record: enable asynchronous trace writing
  2018-10-08 10:52   ` Jiri Olsa
@ 2018-10-08 12:24     ` Alexey Budankov
  2018-10-08 12:53       ` Jiri Olsa
  0 siblings, 1 reply; 33+ messages in thread
From: Alexey Budankov @ 2018-10-08 12:24 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

Hi,

On 08.10.2018 13:52, Jiri Olsa wrote:
> On Mon, Oct 08, 2018 at 09:17:11AM +0300, Alexey Budankov wrote:
<SNIP>
>> +#ifdef HAVE_AIO_SUPPORT
>> +	off = lseek(trace_fd, 0, SEEK_CUR);
>> +#endif
> 
> I'm still a little puzzled why we need to do this,
> when the aio write takes the offset value, but

The pwrite() syscall [1], which aio_write() is based on, doesn't
advance the file position, so the position has to be calculated and
updated by the callers of the aio_write() API.

> please provide some aio_.. function for this

Accepted.

> 
> 
>>  	for (i = 0; i < evlist->nr_mmaps; i++) {
>>  		struct perf_mmap *map = &maps[i];
>>  
>>  		if (map->base) {
>> -			if (perf_mmap__push(map, rec, record__pushfn) != 0) {
>> -				rc = -1;
>> -				goto out;
>> +#ifdef HAVE_AIO_SUPPORT
>> +			if (!rec->opts.nr_cblocks) {
>> +#endif
>> +				if (perf_mmap__push(map, rec, record__pushfn) != 0) {
>> +					rc = -1;
>> +					goto out;
>> +				}
>> +#ifdef HAVE_AIO_SUPPORT
>> +			} else {
>> +				/*
>> +				 * Call record__aio_sync() to wait till map->data buffer
>> +				 * becomes available after previous aio write request.
>> +				 */
>> +				record__aio_sync(map);
>> +				if (perf_mmap__aio_push(map, rec, record__aio_pushfn, &off) != 0) {
>> +					lseek(trace_fd, off, SEEK_SET);
>> +					rc = -1;
>> +					goto out;
>> +				}
>>  			}
>> +#endif
>>  		}
>>  
>>  		if (map->auxtrace_mmap.base && !rec->opts.auxtrace_snapshot_mode &&
>> @@ -547,7 +694,9 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
>>  			goto out;
>>  		}
>>  	}
>> -
>> +#ifdef HAVE_AIO_SUPPORT
>> +	lseek(trace_fd, off, SEEK_SET);
>> +#endif
> 
> ditto

Accepted.

Thanks,
Alexey

> 
> thanks,
> jirka
> 

[1] http://man7.org/linux/man-pages/man2/pread.2.html




* Re: [PATCH v11 2/3]: perf record: enable asynchronous trace writing
  2018-10-08 10:58   ` Jiri Olsa
@ 2018-10-08 12:26     ` Alexey Budankov
  0 siblings, 0 replies; 33+ messages in thread
From: Alexey Budankov @ 2018-10-08 12:26 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

Hi,

On 08.10.2018 13:58, Jiri Olsa wrote:
> On Mon, Oct 08, 2018 at 09:17:11AM +0300, Alexey Budankov wrote:
>>  
<SNIP>
>> +#ifdef HAVE_AIO_SUPPORT
>> +	off = lseek(trace_fd, 0, SEEK_CUR);
>> +#endif
>>  	for (i = 0; i < evlist->nr_mmaps; i++) {
>>  		struct perf_mmap *map = &maps[i];
>>  
>>  		if (map->base) {
>> -			if (perf_mmap__push(map, rec, record__pushfn) != 0) {
>> -				rc = -1;
>> -				goto out;
>> +#ifdef HAVE_AIO_SUPPORT
>> +			if (!rec->opts.nr_cblocks) {
>> +#endif
> 
> maybe it'd be less confusing having something like 
>   rec->opts.aio.enabled
> 
> or aio__is_enabled() global with the dummy counterpart

Makes sense. Will implement the aio__is_enabled() function.

Thanks,
Alexey

> 
> jirka
> 


* Re: [PATCH v11 1/3]: perf util: map data buffer for preserving collected data
  2018-10-08 11:55     ` Alexey Budankov
@ 2018-10-08 12:37       ` Jiri Olsa
  0 siblings, 0 replies; 33+ messages in thread
From: Jiri Olsa @ 2018-10-08 12:37 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

On Mon, Oct 08, 2018 at 02:55:13PM +0300, Alexey Budankov wrote:
> Hi,
> 
> On 08.10.2018 13:50, Jiri Olsa wrote:
> > On Mon, Oct 08, 2018 at 09:14:29AM +0300, Alexey Budankov wrote:
> >>
> <SNIP>
> >> diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
> >> index f6d1a03c7523..2e90f4ce9214 100644
> >> --- a/tools/perf/Makefile.config
> >> +++ b/tools/perf/Makefile.config
> >> @@ -355,6 +355,11 @@ endif # NO_LIBELF
> >>  
> >>  ifeq ($(feature-glibc), 1)
> >>    CFLAGS += -DHAVE_GLIBC_SUPPORT
> >> +  ifndef NO_AIO
> > 
> > hum, do we need NO_AIO? we have the --aio option to enable that right?
> 
> Right. It enables that *at runtime*.
> 
> > I guess BIONIC does not support aio, but but will it fail when it's
> > compiled in there?
> 
> Please see the updated section of the cover letter for more information 
> regarding this. Possible compilation issues are the reason why we'd better 
> have this capability in advance:
>   
>   objdump -T tools/perf/perf | grep aio
> 
>   0000000000000000      DF *UND*	0000000000000000  GLIBC_2.2.5 aio_suspend64
>   0000000000000000      DF *UND*	0000000000000000  GLIBC_2.2.5 aio_return64
>   0000000000000000      DF *UND*	0000000000000000  GLIBC_2.2.5 aio_error64
>   0000000000000000      DF *UND*	0000000000000000  GLIBC_2.2.5 aio_write64
> 
> IMHO, it is worth implementing the NO_AIO define.

ok, if there are C libraries that won't compile with that,
then sure, we need it

thanks,
jirka


* Re: [PATCH v11 1/3]: perf util: map data buffer for preserving collected data
  2018-10-08 12:03     ` Alexey Budankov
@ 2018-10-08 12:38       ` Jiri Olsa
  2018-10-08 14:43         ` Alexey Budankov
  0 siblings, 1 reply; 33+ messages in thread
From: Jiri Olsa @ 2018-10-08 12:38 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

On Mon, Oct 08, 2018 at 03:03:18PM +0300, Alexey Budankov wrote:
> Hi,
> 
> On 08.10.2018 13:50, Jiri Olsa wrote:
> > On Mon, Oct 08, 2018 at 09:14:29AM +0300, Alexey Budankov wrote:
> <SNIP>
> >>  
> >> +#ifdef HAVE_AIO_SUPPORT
> >> +static void perf_mmap__aio_munmap(struct perf_mmap *map)
> >> +{
> >> +	if (map->data)
> >> +		zfree(&map->data);
> >> +}
> > 
> > if we really need to keep this optional for compilation,
> > please make it as single block with dummy functions
> > for when it's not compiled in, like:
> > 
> > #ifdef 
> > static void perf_mmap__aio_munmap(struct perf_mmap *map)
> > {
> > 	if (map->data)
> > 		zfree(&map->data);
> > }
> > #else
> > static void perf_mmap__aio_munmap(struct perf_mmap *map) { }
> > #endif
> 
> Ok.
> 
> > 
> > thanks,
> > jirka
> > 
> > 
> >> +#endif
> >> +
> >>  void perf_mmap__munmap(struct perf_mmap *map)
> >>  {
> >> +#ifdef HAVE_AIO_SUPPORT
> >> +	perf_mmap__aio_munmap(map);
> >> +#endif
> >>  	if (map->base != NULL) {
> >>  		munmap(map->base, perf_mmap__mmap_len(map));
> >>  		map->base = NULL;
> >> @@ -164,8 +175,40 @@ void perf_mmap__munmap(struct perf_mmap *map)
> >>  	auxtrace_mmap__munmap(&map->auxtrace_mmap);
> >>  }
> 
> Well, let's go with the stub functions design as opposed to pure 
> conditional compilation. However, this may result in unintended 
> growth of the perf tool binary size, even if NO_AIO is defined 
> during compilation.

hum, empty functions will be optimized out

jirka


* Re: [PATCH v11 3/3]: perf record: extend trace writing to multi AIO
  2018-10-08 11:47     ` Alexey Budankov
@ 2018-10-08 12:45       ` Jiri Olsa
  2018-10-08 15:21         ` Alexey Budankov
  0 siblings, 1 reply; 33+ messages in thread
From: Jiri Olsa @ 2018-10-08 12:45 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

On Mon, Oct 08, 2018 at 02:47:13PM +0300, Alexey Budankov wrote:
> Hi,
> 
> On 08.10.2018 13:55, Jiri Olsa wrote:
> > On Mon, Oct 08, 2018 at 09:19:17AM +0300, Alexey Budankov wrote:
> <SNIP>
> >> +#ifdef HAVE_AIO_SUPPORT
> >> +	if (!strcmp(var, "record.aio-cblocks"))
> >> +		rec->opts.nr_cblocks = strtol(value, NULL, 0);
> >> +#endif
> >>  
> >>  	return 0;
> >>  }
> >> @@ -1837,6 +1864,10 @@ static struct option __record_options[] = {
> >>  			  "signal"),
> >>  	OPT_BOOLEAN(0, "dry-run", &dry_run,
> >>  		    "Parse options then exit"),
> >> +#ifdef HAVE_AIO_SUPPORT
> >> +	OPT_INTEGER(0, "aio-cblocks", &record.opts.nr_cblocks,
> >> +		    "Max number of simultaneous per-mmap trace writes (default: 0 - serial, max: 4)"),
> >> +#endif
> > 
> > could you please move the option to enable that to the previou patch?
> > so we could test the simple variant as well
> 
> Ok.
> 
> > 
> > also I think it'd be better if we have simple '--aio' option that would
> > enable this with some default values..  and add --aio-cblocks to configure
> > that further
> 
> Well, the absence of the --aio option on the command line or in .perfconfig, 
> or --aio=0, means serial writing.
> 
> From this perspective, 'perf record --aio --aio-cblocks=N' looks like a complication.

'perf record --aio-cblocks=N' could imply '--aio',

I'd like to have a simple/intuitive way to enable that,
without studying the meaning of the cblocks argument

  # perf record --aio ...

means that I'm storing data via async writes,
and using some reasonable default for cblocks

and later on we'd add 'perf record --threads'
allowing thread-based writers ;-)

jirka


* Re: [PATCH v11 2/3]: perf record: enable asynchronous trace writing
  2018-10-08 12:24     ` Alexey Budankov
@ 2018-10-08 12:53       ` Jiri Olsa
  2018-10-08 14:42         ` Alexey Budankov
  0 siblings, 1 reply; 33+ messages in thread
From: Jiri Olsa @ 2018-10-08 12:53 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

On Mon, Oct 08, 2018 at 03:24:31PM +0300, Alexey Budankov wrote:
> Hi,
> 
> On 08.10.2018 13:52, Jiri Olsa wrote:
> > On Mon, Oct 08, 2018 at 09:17:11AM +0300, Alexey Budankov wrote:
> <SNIP>
> >> +#ifdef HAVE_AIO_SUPPORT
> >> +	off = lseek(trace_fd, 0, SEEK_CUR);
> >> +#endif
> > 
> > I'm still little puzzled why we need to do this,
> > when the aio write takes the offset value, but
> 
> The pwrite() syscall [1], which aio_write() is based on, doesn't 
> advance the file position, so the position has to be calculated and 
> updated by the callers of the aio_write() API.

ok, so aio_write does not need the offset to be updated,
who needs it then?

jirka


* Re: [PATCH v11 2/3]: perf record: enable asynchronous trace writing
  2018-10-08 12:53       ` Jiri Olsa
@ 2018-10-08 14:42         ` Alexey Budankov
  2018-10-08 15:11           ` Jiri Olsa
  0 siblings, 1 reply; 33+ messages in thread
From: Alexey Budankov @ 2018-10-08 14:42 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

Hi,
On 08.10.2018 15:53, Jiri Olsa wrote:
> On Mon, Oct 08, 2018 at 03:24:31PM +0300, Alexey Budankov wrote:
>> Hi,
>>
>> On 08.10.2018 13:52, Jiri Olsa wrote:
>>> On Mon, Oct 08, 2018 at 09:17:11AM +0300, Alexey Budankov wrote:
>> <SNIP>
>>>> +#ifdef HAVE_AIO_SUPPORT
>>>> +	off = lseek(trace_fd, 0, SEEK_CUR);
>>>> +#endif
>>>
>>> I'm still little puzzled why we need to do this,
>>> when the aio write takes the offset value, but
>>
>> The pwrite() syscall [1], which aio_write() is based on, doesn't 
>> advance the file position, so the position has to be calculated and 
>> updated by the callers of the aio_write() API.
> 
> ok, so aio_write does not need the offset to be updated,
> who needs it then?

aio_write() needs this offset as an input parameter.
aio_write() gets the offset as part of the cblock object.
Adjacent aio_write() records should not overlap in the trace file, so
the off value is incremented by the written size in every loop
iteration after a successful aio_write() call.

Thanks,
Alexey

> 
> jirka
> 


* Re: [PATCH v11 1/3]: perf util: map data buffer for preserving collected data
  2018-10-08 12:38       ` Jiri Olsa
@ 2018-10-08 14:43         ` Alexey Budankov
  2018-10-08 15:12           ` Jiri Olsa
  0 siblings, 1 reply; 33+ messages in thread
From: Alexey Budankov @ 2018-10-08 14:43 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

Hi,

On 08.10.2018 15:38, Jiri Olsa wrote:
> On Mon, Oct 08, 2018 at 03:03:18PM +0300, Alexey Budankov wrote:
<SNIP>
>>
>> Well, let's go with the stub functions design as opposed to pure 
>> conditional compilation. However, this may result in unintended 
>> growth of the perf tool binary size, even if NO_AIO is defined 
>> during compilation.
> 
> > hum, empty functions will be optimized out

Yes, but there will be references to the stubs in the else branch of
record__mmap_read_evlist(), so I am not sure that *all* compilers
are smart enough to recognize this and still optimize them out.

Thanks,
Alexey

> 
> jirka
> 


* Re: [PATCH v11 2/3]: perf record: enable asynchronous trace writing
  2018-10-08 14:42         ` Alexey Budankov
@ 2018-10-08 15:11           ` Jiri Olsa
  2018-10-08 15:39             ` Alexey Budankov
  0 siblings, 1 reply; 33+ messages in thread
From: Jiri Olsa @ 2018-10-08 15:11 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

On Mon, Oct 08, 2018 at 05:42:18PM +0300, Alexey Budankov wrote:
> Hi,
> On 08.10.2018 15:53, Jiri Olsa wrote:
> > On Mon, Oct 08, 2018 at 03:24:31PM +0300, Alexey Budankov wrote:
> >> Hi,
> >>
> >> On 08.10.2018 13:52, Jiri Olsa wrote:
> >>> On Mon, Oct 08, 2018 at 09:17:11AM +0300, Alexey Budankov wrote:
> >> <SNIP>
> >>>> +#ifdef HAVE_AIO_SUPPORT
> >>>> +	off = lseek(trace_fd, 0, SEEK_CUR);
> >>>> +#endif
> >>>
> >>> I'm still little puzzled why we need to do this,
> >>> when the aio write takes the offset value, but
> >>
> >> The pwrite() syscall [1], which aio_write() is based on, doesn't 
> >> advance the file position, so the position has to be calculated and 
> >> updated by the callers of the aio_write() API.
> > 
> > ok, so aio_write does not need the offset to be updated,
> > who needs it then?
> 
> aio_write() needs this offset as an input parameter.
> aio_write() gets the offset as part of the cblock object.

yes, it's an 'arg' to aio_write syscall

> Adjacent aio_write() records should not overlap in the trace file, so
> the off value is incremented by the written size in every loop
> iteration after a successful aio_write() call.

but does the aio_write need the lseek 'set' call? if not, we could
keep the 'offset' value within perf (like in the struct perf_data_file
or struct record) without any need to call lseek

jirka


* Re: [PATCH v11 1/3]: perf util: map data buffer for preserving collected data
  2018-10-08 14:43         ` Alexey Budankov
@ 2018-10-08 15:12           ` Jiri Olsa
  2018-10-08 15:38             ` Alexey Budankov
  0 siblings, 1 reply; 33+ messages in thread
From: Jiri Olsa @ 2018-10-08 15:12 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

On Mon, Oct 08, 2018 at 05:43:38PM +0300, Alexey Budankov wrote:
> Hi,
> 
> On 08.10.2018 15:38, Jiri Olsa wrote:
> > On Mon, Oct 08, 2018 at 03:03:18PM +0300, Alexey Budankov wrote:
> <SNIP>
> >>
> >> Well, let's go with the stub functions design as opposed to pure 
> >> conditional compilation. However, this may result in unintended 
> >> growth of the perf tool binary size, even if NO_AIO is defined 
> >> during compilation.
> > 
> > hum, empty functions will be optimized out
> 
> Yes, but there will be references to the stubs in the else branch of 
> record__mmap_read_evlist(), so I am not sure that *all* compilers 
> are smart enough to recognize this and still optimize them out.

well, the kernel is doing that and it seems to be fine,
so I wouldn't worry about it

jirka


* Re: [PATCH v11 3/3]: perf record: extend trace writing to multi AIO
  2018-10-08 12:45       ` Jiri Olsa
@ 2018-10-08 15:21         ` Alexey Budankov
  0 siblings, 0 replies; 33+ messages in thread
From: Alexey Budankov @ 2018-10-08 15:21 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

Hi,

On 08.10.2018 15:45, Jiri Olsa wrote:
<SNIP>
> 
> 'perf record --aio-cblocks=N' could imply '--aio',
> 
> I'd like to have a simple/intuitive way to enable that,
> without studying the meaning of the cblocks argument
> 
>   # perf record --aio ...
> 
> means that I'm storing data via async writes,
> and using some reasonable default for cblocks

Semantically, --aio is equal to --aio-cblocks=<default>, as you said.
So, option duplication could be avoided by having a parsing
callback using OPT_CALLBACK().

> 
> and later on we'd add 'perf record --threads'
> allowing threads based writers ;-)

Oh, I see. Cool.

Thanks,
Alexey

> 
> jirka
> 


* Re: [PATCH v11 1/3]: perf util: map data buffer for preserving collected data
  2018-10-08 15:12           ` Jiri Olsa
@ 2018-10-08 15:38             ` Alexey Budankov
  0 siblings, 0 replies; 33+ messages in thread
From: Alexey Budankov @ 2018-10-08 15:38 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

Hi,

On 08.10.2018 18:12, Jiri Olsa wrote:
> On Mon, Oct 08, 2018 at 05:43:38PM +0300, Alexey Budankov wrote:
>> Hi,
>>
>> On 08.10.2018 15:38, Jiri Olsa wrote:
>>> On Mon, Oct 08, 2018 at 03:03:18PM +0300, Alexey Budankov wrote:
>> <SNIP>
>>>>
>>>> Well, let's go with the stub functions design as opposed to pure 
>>>> conditional compilation. However, this may result in unintended 
>>>> growth of the perf tool binary size, even if NO_AIO is defined 
>>>> during compilation.
>>>
>>> hum, empty functions will be optimized out
>>
>> Yes, but there will be references to the stubs in the else branch of 
>> record__mmap_read_evlist(), so I am not sure that *all* compilers 
>> are smart enough to recognize this and still optimize them out.
> 
> well, the kernel is doing that and it seems to be fine,
> so I wouldn't worry about it

Then, Ok. Good to know.

Thanks,
Alexey

> 
> jirka
> 


* Re: [PATCH v11 2/3]: perf record: enable asynchronous trace writing
  2018-10-08 15:11           ` Jiri Olsa
@ 2018-10-08 15:39             ` Alexey Budankov
  0 siblings, 0 replies; 33+ messages in thread
From: Alexey Budankov @ 2018-10-08 15:39 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel

Hi,
On 08.10.2018 18:11, Jiri Olsa wrote:
> On Mon, Oct 08, 2018 at 05:42:18PM +0300, Alexey Budankov wrote:
>> Hi,
>> On 08.10.2018 15:53, Jiri Olsa wrote:
>>> On Mon, Oct 08, 2018 at 03:24:31PM +0300, Alexey Budankov wrote:
>>>> Hi,
>>>>
>>>> On 08.10.2018 13:52, Jiri Olsa wrote:
>>>>> On Mon, Oct 08, 2018 at 09:17:11AM +0300, Alexey Budankov wrote:
>>>> <SNIP>
>>>>>> +#ifdef HAVE_AIO_SUPPORT
>>>>>> +	off = lseek(trace_fd, 0, SEEK_CUR);
>>>>>> +#endif
>>>>>
>>>>> I'm still little puzzled why we need to do this,
>>>>> when the aio write takes the offset value, but
>>>>
>>>> The pwrite() syscall [1], which aio_write() is based on, doesn't 
>>>> advance the file position, so the position has to be calculated and 
>>>> updated by the callers of the aio_write() API.
>>>
>>> ok, so aio_write does not need the offset to be updated,
>>> who needs it then?
>>
>> aio_write() needs this offset as an input parameter.
>> aio_write() gets the offset as part of the cblock object.
> 
> yes, it's an 'arg' to aio_write syscall
> 
>> Adjacent aio_write() records should not overlap in the trace file, so
>> the off value is incremented by the written size in every loop
>> iteration after a successful aio_write() call.
> 
> but does the aio_write need the lseek 'set' call? if not, we could

Yes, lseek(..., SEEK_SET) is still required after the loop, so that the
next synchronous records go after the AIO-written parts that might
still be *in-flight* by that time.

Thanks,
Alexey

> keep the 'offset' value within perf (like in the struct perf_data_file
> or struct record) without any need to call lseek
> 
> jirka
> 

