From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: [PATCH v7 0/2]: perf: reduce data loss when profiling highly parallel CPU bound workloads
To: Jiri Olsa
Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
 Alexander Shishkin, Namhyung Kim, Andi Kleen, linux-kernel
References: <1fc1fc5b-a8cc-2b05-d43c-692e58855c81@linux.intel.com>
 <20180905112851.GA29759@krava>
From: Alexey Budankov
Organization: Intel Corp.
Message-ID: 
Date: Thu, 6 Sep 2018 09:57:23 +0300
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101
 Thunderbird/52.9.1
MIME-Version: 1.0
In-Reply-To: <20180905112851.GA29759@krava>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

On 05.09.2018 14:28, Jiri Olsa wrote:
> On Wed, Sep 05, 2018 at 10:16:42AM +0300, Alexey Budankov wrote:
>>
>> Currently in record mode the tool implements trace writing serially.
>> The algorithm loops over the mapped per-cpu data buffers and stores
>> ready data chunks into a trace file using the write() system call.
>>
>> Under some circumstances the kernel may lack free space in a buffer
>> because the buffer's other half has not yet been written to disk
>> while the tool is busy writing another buffer's data at that moment.
>>
>> Thus the serial trace writing implementation may cause the kernel
>> to lose profiling data, and that is what is observed when profiling
>> highly parallel CPU bound workloads on machines with a large number
>> of cores.
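
A minimal, self-contained sketch of the serial scheme described above
(a simplified model, not the perf code; the struct and helper below are
made up purely for illustration):

	#include <stddef.h>
	#include <unistd.h>

	struct ring { void *buf; size_t ready; };	/* hypothetical per-cpu map */

	/* Flush every map's ready chunk with a blocking write(), one map at
	 * a time. While the tool is blocked in write() for one map, the
	 * kernel keeps filling the other per-cpu rings and may run out of
	 * space there, dropping records.
	 */
	static void flush_serially(int trace_fd, struct ring *maps, int nr)
	{
		int i;

		for (i = 0; i < nr; i++) {
			if (!maps[i].ready)
				continue;
			if (write(trace_fd, maps[i].buf, maps[i].ready) < 0)
				break;		/* error handling omitted */
			maps[i].ready = 0;
		}
	}
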
>>
>> An experiment profiling matrix multiplication code executing 128
>> threads on Intel Xeon Phi (KNM) with 272 cores, like the one below,
>> demonstrates a data loss metric value of 98%:
>>
>>   /usr/bin/time perf record -o /tmp/perf-ser.data -a -N -B -T -R -g \
>>       --call-graph dwarf,1024 --user-regs=IP,SP,BP \
>>       --switch-events -e cycles,instructions,ref-cycles,software/period=1,name=cs,config=0x3/Duk -- \
>>       matrix.gcc
>>
>> The data loss metric is the ratio lost_time/elapsed_time, where
>> lost_time is the sum of time intervals containing PERF_RECORD_LOST
>> records and elapsed_time is the elapsed application run time
>> under profiling.
>>
>> Applying asynchronous trace streaming through the POSIX AIO API
>> (http://man7.org/linux/man-pages/man7/aio.7.html)
>> lowers the data loss metric value, providing a 2x improvement -
>> lowering the 98% loss to almost 0%.
>>
>> ---
>> Alexey Budankov (2):
>>   perf util: map data buffer for preserving collected data
>>   perf record: enable asynchronous trace writing
>>
>>  tools/perf/builtin-record.c | 197 +++++++++++++++++++++++++++++++++++++++++++-
>>  tools/perf/perf.h           |   1 +
>>  tools/perf/util/evlist.c    |   7 +-
>>  tools/perf/util/evlist.h    |   3 +-
>>  tools/perf/util/mmap.c      | 110 +++++++++++++++++++++----
>>  tools/perf/util/mmap.h      |  10 ++-
>>  6 files changed, 302 insertions(+), 26 deletions(-)
>>
>> ---
>> Changes in v7:
>>  - implemented handling of the record.aio setting from the perfconfig file
>
> can't apply this version on Arnaldo's perf/core...
>
> [jolsa@krava linux-perf]$ git am /tmp/ab/
> Applying: perf util: map data buffer for preserving collected data
> error: patch failed: tools/perf/util/mmap.c:166
> error: tools/perf/util/mmap.c: patch does not apply

Here it is:

 tools/perf/builtin-record.c | 197 +++++++++++++++++++++++++++++++++++++++++++-
 tools/perf/perf.h           |   1 +
 tools/perf/util/evlist.c    |   7 +-
 tools/perf/util/evlist.h    |   3 +-
 tools/perf/util/mmap.c      | 110 +++++++++++++++++++++----
 tools/perf/util/mmap.h      |  10 ++-
 6 files changed, 302 insertions(+), 26 deletions(-)

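Before the patch body, here is a small standalone sketch of the POSIX AIO
calls the series relies on (queue with aio_write(), reap with aio_suspend(),
aio_error() and aio_return()). It is only an illustration under assumed
names, not code from the patch, and it ignores the partial writes that the
patch restarts with the remainder:

	#include <aio.h>
	#include <errno.h>
	#include <signal.h>
	#include <string.h>
	#include <sys/types.h>

	/* Queue one write of buf[0..size) at offset off and wait for it. */
	static ssize_t aio_write_and_wait(int fd, void *buf, size_t size, off_t off)
	{
		struct aiocb cb;
		const struct aiocb *list[1] = { &cb };

		memset(&cb, 0, sizeof(cb));
		cb.aio_fildes = fd;
		cb.aio_buf = buf;
		cb.aio_nbytes = size;
		cb.aio_offset = off;
		cb.aio_sigevent.sigev_notify = SIGEV_NONE;	/* no completion signal */

		if (aio_write(&cb))				/* queue the request */
			return -1;
		while (aio_error(&cb) == EINPROGRESS)		/* wait for completion */
			aio_suspend(list, 1, NULL);
		return aio_return(&cb);				/* bytes written or -1 */
	}
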
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 9853552bcf16..0f1324549e35 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -121,6 +121,91 @@ static int record__write(struct record *rec, void *bf, size_t size)
 	return 0;
 }
 
+static int record__aio_write(struct aiocb *cblock, int trace_fd,
+		void *buf, size_t size, off_t off)
+{
+	int rc;
+
+	cblock->aio_fildes = trace_fd;
+	cblock->aio_buf = buf;
+	cblock->aio_nbytes = size;
+	cblock->aio_offset = off;
+	cblock->aio_sigevent.sigev_notify = SIGEV_NONE;
+
+	do {
+		rc = aio_write(cblock);
+		if (rc == 0) {
+			break;
+		} else if (errno != EAGAIN) {
+			cblock->aio_fildes = -1;
+			pr_err("failed to queue perf data, error: %m\n");
+			break;
+		}
+	} while (1);
+
+	return rc;
+}
+
+static int record__aio_sync(struct perf_mmap *md)
+{
+	void *rem_buf;
+	off_t rem_off;
+	size_t rem_size;
+	ssize_t aio_ret, written;
+	int i, aio_errno, do_suspend, idx = -1;
+	struct aiocb **cblocks = md->cblocks;
+	struct timespec timeout = { 0, 1000 * 1000 * 1 }; // 1ms
+
+	for (i = 0; i < md->nr_cblocks; ++i)
+		if (cblocks[i]->aio_fildes == -1)
+			return i;
+
+	do {
+		do_suspend = 0;
+		if (aio_suspend((const struct aiocb **)cblocks, md->nr_cblocks, &timeout)) {
+			if (!(errno == EAGAIN || errno == EINTR))
+				pr_err("failed to sync perf data, error: %m\n");
+			do_suspend = 1;
+			continue;
+		}
+		for (i = 0; i < md->nr_cblocks; ++i) {
+			aio_errno = aio_error(cblocks[i]);
+			if (aio_errno == EINPROGRESS)
+				continue;
+			written = aio_ret = aio_return(cblocks[i]);
+			if (aio_ret < 0) {
+				if (aio_errno != EINTR)
+					pr_err("failed to write perf data, error: %m\n");
+				written = 0;
+			}
+			rem_size = cblocks[i]->aio_nbytes - written;
+			if (rem_size == 0) {
+				cblocks[i]->aio_fildes = -1;
+				/* md->refcount is incremented in perf_mmap__push() for
+				 * every enqueued aio write request so decrement it because
+				 * the request is now complete.
+				 */
+				perf_mmap__put(md);
+				if (idx == -1)
+					idx = i;
+			} else {
+				/* aio write request may require restart with the
+				 * remainder if the kernel didn't write the whole
+				 * chunk at once.
+				 */
+				rem_off = cblocks[i]->aio_offset + written;
+				rem_buf = (void *)(cblocks[i]->aio_buf + written);
+				record__aio_write(cblocks[i], cblocks[i]->aio_fildes,
+						rem_buf, rem_size, rem_off);
+			}
+		}
+		if (idx == -1)
+			do_suspend = 1;
+	} while (do_suspend);
+
+	return idx;
+}
+
 static int process_synthesized_event(struct perf_tool *tool,
 				     union perf_event *event,
 				     struct perf_sample *sample __maybe_unused,
@@ -130,12 +215,28 @@ static int process_synthesized_event(struct perf_tool *tool,
 	return record__write(rec, event, event->header.size);
 }
 
-static int record__pushfn(void *to, void *bf, size_t size)
+static int record__pushfn(void *to, struct aiocb *cblock, void *data, size_t size)
 {
+	off_t off;
 	struct record *rec = to;
+	int ret, trace_fd = rec->session->data->file.fd;
 
 	rec->samples++;
-	return record__write(rec, bf, size);
+
+	off = lseek(trace_fd, 0, SEEK_CUR);
+	lseek(trace_fd, off + size, SEEK_SET);
+
+	ret = record__aio_write(cblock, trace_fd, data, size, off);
+	if (!ret) {
+		rec->bytes_written += size;
+		if (switch_output_size(rec))
+			trigger_hit(&switch_output_trigger);
+	} else {
+		lseek(trace_fd, off, SEEK_SET);
+	}
+
+	return ret;
 }
 
 static volatile int done;
@@ -326,7 +427,8 @@ static int record__mmap_evlist(struct record *rec,
 
 	if (perf_evlist__mmap_ex(evlist, opts->mmap_pages,
 				 opts->auxtrace_mmap_pages,
-				 opts->auxtrace_snapshot_mode) < 0) {
+				 opts->auxtrace_snapshot_mode,
+				 opts->nr_cblocks) < 0) {
 		if (errno == EPERM) {
 			pr_err("Permission error mapping pages.\n"
 			       "Consider increasing "
@@ -510,6 +612,70 @@ static struct perf_event_header finished_round_event = {
 	.type = PERF_RECORD_FINISHED_ROUND,
 };
 
+static void record__aio_complete(struct perf_mmap *md, int i)
+{
+	int aio_errno;
+	void *rem_buf;
+	off_t rem_off;
+	size_t rem_size;
+	ssize_t aio_ret, written;
+	struct aiocb *cblock = md->cblocks[i];
+	struct timespec timeout = { 0, 1000 * 1000 * 1 }; // 1ms
+
+	if (cblock->aio_fildes == -1)
+		return;
+
+	do {
+		if (aio_suspend((const struct aiocb **)&cblock, 1, &timeout)) {
+			if (!(errno == EAGAIN || errno == EINTR))
+				pr_err("failed to sync perf data, error: %m\n");
+			continue;
+		}
+		aio_errno = aio_error(cblock);
+		if (aio_errno == EINPROGRESS)
+			continue;
+		written = aio_ret = aio_return(cblock);
+		if (aio_ret < 0) {
+			if (!(aio_errno == EINTR))
+				pr_err("failed to write perf data, error: %m\n");
+			written = 0;
+		}
+		rem_size = cblock->aio_nbytes - written;
+		if (rem_size == 0) {
+			cblock->aio_fildes = -1;
+			/* md->refcount is incremented in perf_mmap__push() for
+			 * every enqueued aio write request so decrement it because
+			 * the request is now complete.
+			 */
+			perf_mmap__put(md);
+			return;
+		} else {
+			/* aio write request may require restart with the
+			 * remainder if the kernel didn't write the whole
+			 * chunk at once.
+			 */
+			rem_off = cblock->aio_offset + written;
+			rem_buf = (void *)(cblock->aio_buf + written);
+			record__aio_write(cblock, cblock->aio_fildes,
+					rem_buf, rem_size, rem_off);
+		}
+	} while (1);
+}
+
+static void record__mmap_read_sync(struct record *rec)
+{
+	int i, j;
+	struct perf_evlist *evlist = rec->evlist;
+	struct perf_mmap *maps = evlist->mmap;
+
+	for (i = 0; i < evlist->nr_mmaps; i++) {
+		struct perf_mmap *map = &maps[i];
+		if (map->base)
+			for (j = 0; j < map->nr_cblocks; ++j)
+				record__aio_complete(map, j);
+	}
+}
+
 static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evlist,
 				    bool overwrite)
 {
@@ -532,7 +698,13 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
 		struct auxtrace_mmap *mm = &maps[i].auxtrace_mmap;
 
 		if (maps[i].base) {
-			if (perf_mmap__push(&maps[i], rec, record__pushfn) != 0) {
+			/* Call record__aio_sync() to allocate free
+			 * map->data buffer or wait till one of the
+			 * buffers becomes available after previous
+			 * aio write request.
+			 */
+			if (perf_mmap__push(&maps[i], rec, record__aio_sync(&maps[i]),
+					    record__pushfn) != 0) {
 				rc = -1;
 				goto out;
 			}
@@ -1054,6 +1226,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		perf_evlist__toggle_bkw_mmap(rec->evlist, BKW_MMAP_DATA_PENDING);
 
 		if (record__mmap_read_all(rec) < 0) {
+			record__mmap_read_sync(rec);
 			trigger_error(&auxtrace_snapshot_trigger);
 			trigger_error(&switch_output_trigger);
 			err = -1;
@@ -1065,6 +1238,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 			if (!trigger_is_error(&auxtrace_snapshot_trigger))
 				record__read_auxtrace_snapshot(rec);
 			if (trigger_is_error(&auxtrace_snapshot_trigger)) {
+				record__mmap_read_sync(rec);
 				pr_err("AUX area tracing snapshot failed\n");
 				err = -1;
 				goto out_child;
@@ -1083,6 +1257,8 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 			 */
 			if (rec->evlist->bkw_mmap_state == BKW_MMAP_RUNNING)
 				continue;
+
+			record__mmap_read_sync(rec);
 			trigger_ready(&switch_output_trigger);
 
 			/*
@@ -1136,6 +1312,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 				disabled = true;
 			}
 		}
+	record__mmap_read_sync(rec);
 	trigger_off(&auxtrace_snapshot_trigger);
 	trigger_off(&switch_output_trigger);
@@ -1287,6 +1464,8 @@ static int perf_record_config(const char *var, const char *value, void *cb)
 		var = "call-graph.record-mode";
 		return perf_default_config(var, value, cb);
 	}
+	if (!strcmp(var, "record.aio"))
+		rec->opts.nr_cblocks = strtol(value, NULL, 0);
 
 	return 0;
 }
@@ -1519,6 +1698,7 @@ static struct record record = {
 			.default_per_cpu	= true,
 		},
 		.proc_map_timeout	= 500,
+		.nr_cblocks		= 2
 	},
 	.tool = {
 		.sample		= process_sample_event,
@@ -1678,6 +1858,8 @@ static struct option __record_options[] = {
 		    "signal"),
 	OPT_BOOLEAN(0, "dry-run", &dry_run,
 		    "Parse options then exit"),
+	OPT_INTEGER(0, "aio", &record.opts.nr_cblocks,
+		    "asynchronous trace write operations (min: 1, max: 32, default: 2)"),
 	OPT_END()
 };
@@ -1870,6 +2052,13 @@ int cmd_record(int argc, const char **argv)
 		goto out;
 	}
 
+	if (!(1 <= rec->opts.nr_cblocks && rec->opts.nr_cblocks <= 32))
+		rec->opts.nr_cblocks = 2;
+
+	if (verbose > 0)
+		pr_info("AIO trace writes: %d\n", rec->opts.nr_cblocks);
+
+
 	err = __cmd_record(&record, argc, argv);
 out:
 	perf_evlist__delete(rec->evlist);
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 21bf7f5a3cf5..0a1ae2ae567a 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -82,6 +82,7 @@ struct record_opts {
 	bool	     use_clockid;
 	clockid_t    clockid;
 	unsigned int proc_map_timeout;
+	int	     nr_cblocks;
 };
 
 struct option;
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index be440df29615..2c8c0270886b 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1018,7 +1018,8 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
  */
 int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 			 unsigned int auxtrace_pages,
-			 bool auxtrace_overwrite)
+			 bool auxtrace_overwrite,
+			 int nr_cblocks)
 {
 	struct perf_evsel *evsel;
 	const struct cpu_map *cpus = evlist->cpus;
@@ -1028,7 +1029,7 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 	 * Its value is decided by evsel's write_backward.
 	 * So &mp should not be passed through const pointer.
 	 */
-	struct mmap_params mp;
+	struct mmap_params mp = { .nr_cblocks = nr_cblocks };
 
 	if (!evlist->mmap)
 		evlist->mmap = perf_evlist__alloc_mmap(evlist, false);
@@ -1060,7 +1061,7 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 
 int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages)
 {
-	return perf_evlist__mmap_ex(evlist, pages, 0, false);
+	return perf_evlist__mmap_ex(evlist, pages, 0, false, 2);
 }
 
 int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target)
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index dc66436add98..a94d3c613254 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -162,7 +162,8 @@ unsigned long perf_event_mlock_kb_in_pages(void);
 
 int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 			 unsigned int auxtrace_pages,
-			 bool auxtrace_overwrite);
+			 bool auxtrace_overwrite,
+			 int nr_cblocks);
 int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages);
 void perf_evlist__munmap(struct perf_evlist *evlist);
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index 215f69f41672..a2c8ead9344c 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -155,6 +155,14 @@ void __weak auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp __mayb
 
 void perf_mmap__munmap(struct perf_mmap *map)
 {
+	int i;
+	if (map->data) {
+		for (i = 0; i < map->nr_cblocks; ++i)
+			zfree(&(map->data[i]));
+		zfree(&(map->data));
+	}
+	if (map->cblocks)
+		zfree(&(map->cblocks));
 	if (map->base != NULL) {
 		munmap(map->base, perf_mmap__mmap_len(map));
 		map->base = NULL;
@@ -166,6 +174,7 @@ void perf_mmap__munmap(struct perf_mmap *map)
 
 int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd, int cpu)
 {
+	int i;
 	/*
 	 * The last one will be done at perf_mmap__consume(), so that we
 	 * make sure we don't prevent tools from consuming every last event in
@@ -190,6 +199,50 @@ int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd, int c
 		map->base = NULL;
 		return -1;
 	}
+	map->nr_cblocks = mp->nr_cblocks;
+	map->cblocks = calloc(map->nr_cblocks, sizeof(struct aiocb *));
+	if (!map->cblocks) {
+		pr_debug2("failed to allocate perf event data buffers, error %d\n",
+			  errno);
+		return -1;
+	}
+	map->data = calloc(map->nr_cblocks, sizeof(void *));
+	if (map->data) {
+		int delta_max = sysconf(_SC_AIO_PRIO_DELTA_MAX);
+		for (i = 0; i < map->nr_cblocks; ++i) {
+			map->data[i] = malloc(perf_mmap__mmap_len(map));
+			if (map->data[i]) {
+				int prio;
+				unsigned char *data = map->data[i];
+				map->cblocks[i] = (struct aiocb *)&data[map->mask + 1];
+				memset(map->cblocks[i], 0, sizeof(struct aiocb));
+				/* Use cblock.aio_fildes value different from -1
+				 * to denote started aio write operation on the
+				 * cblock so it requires explicit record__aio_sync()
+				 * call before the cblock can be reused again.
+				 */
+				map->cblocks[i]->aio_fildes = -1;
+				/* Allocate cblocks with decreasing priority to
+				 * have faster aio_write() calls because queued
+				 * requests are kept in separate per-prio queues
+				 * and adding a new request iterates through a
+				 * shorter per-prio list.
+				 */
+				prio = delta_max - i;
+				if (prio < 0)
+					prio = 0;
+				map->cblocks[i]->aio_reqprio = prio;
+			} else {
+				pr_debug2("failed to allocate perf event data buffer, error %d\n",
+					  errno);
+				return -1;
+			}
+		}
+	} else {
+		pr_debug2("failed to alloc perf event data buffers, error %d\n",
+			  errno);
+		return -1;
+	}
 
 	map->fd = fd;
 	map->cpu = cpu;
@@ -280,12 +333,12 @@ int perf_mmap__read_init(struct perf_mmap *map)
 	return __perf_mmap__read_init(map);
 }
 
-int perf_mmap__push(struct perf_mmap *md, void *to,
-		    int push(void *to, void *buf, size_t size))
+int perf_mmap__push(struct perf_mmap *md, void *to, int idx,
+		    int push(void *to, struct aiocb *cblock, void *data, size_t size))
 {
 	u64 head = perf_mmap__read_head(md);
 	unsigned char *data = md->base + page_size;
-	unsigned long size;
+	unsigned long size, size0 = 0;
 	void *buf;
 	int rc = 0;
 
@@ -293,31 +346,58 @@ int perf_mmap__push(struct perf_mmap *md, void *to,
 	if (rc < 0)
 		return (rc == -EAGAIN) ? 0 : -1;
 
+	/* md->base data is copied into the md->data[idx] buffer to
+	 * release space in the kernel buffer as fast as possible,
+	 * through perf_mmap__consume() below.
+	 *
+	 * That lets the kernel proceed with storing more
+	 * profiling data into the kernel buffer earlier than other
+	 * per-cpu kernel buffers are handled.
+	 *
+	 * Copying can be done in two steps in case the chunk of
+	 * profiling data crosses the upper bound of the kernel buffer.
+	 * In this case we first move the part of the data from md->start
+	 * till the upper bound and then the remainder from the
+	 * beginning of the kernel buffer till the end of
+	 * the data chunk.
+	 */
+
 	size = md->end - md->start;
 
 	if ((md->start & md->mask) + size != (md->end & md->mask)) {
 		buf = &data[md->start & md->mask];
-		size = md->mask + 1 - (md->start & md->mask);
-		md->start += size;
-
-		if (push(to, buf, size) < 0) {
-			rc = -1;
-			goto out;
-		}
+		size0 = md->mask + 1 - (md->start & md->mask);
+		md->start += size0;
+		memcpy(md->data[idx], buf, size0);
 	}
 
 	buf = &data[md->start & md->mask];
 	size = md->end - md->start;
 	md->start += size;
+	memcpy(md->data[idx] + size0, buf, size);
 
-	if (push(to, buf, size) < 0) {
-		rc = -1;
-		goto out;
-	}
+	/* Increment md->refcount to guard the md->data[idx] buffer
+	 * from premature deallocation, because the md object can be
+	 * released earlier than the aio write request started
+	 * on mmap->data[idx] is complete.
+	 *
+	 * perf_mmap__put() is done at record__aio_sync() or
+	 * record__aio_complete() after started request completion.
+	 */
+
+	perf_mmap__get(md);
 
 	md->prev = head;
 	perf_mmap__consume(md);
-out:
+
+	rc = push(to, md->cblocks[idx], md->data[idx], size0 + size);
+	if (rc) {
+		/* Decrement md->refcount back if the aio write
+		 * operation failed to start.
+		 */
+		perf_mmap__put(md);
+	}
+
 	return rc;
 }
 
diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index 05a6d47c7956..8fbe5cb6d95b 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include <aio.h>
 #include "auxtrace.h"
 #include "event.h"
@@ -26,6 +27,9 @@ struct perf_mmap {
 	bool		 overwrite;
 	struct auxtrace_mmap auxtrace_mmap;
 	char		 event_copy[PERF_SAMPLE_MAX_SIZE] __aligned(8);
+	void		 **data;
+	struct aiocb	 **cblocks;
+	int		 nr_cblocks;
 };
 
 /*
@@ -57,7 +61,7 @@ enum bkw_mmap_state {
 };
 
 struct mmap_params {
-	int prot, mask;
+	int prot, mask, nr_cblocks;
 	struct auxtrace_mmap_params auxtrace_mp;
 };
 
@@ -92,8 +96,8 @@ union perf_event *perf_mmap__read_forward(struct perf_mmap *map);
 
 union perf_event *perf_mmap__read_event(struct perf_mmap *map);
 
-int perf_mmap__push(struct perf_mmap *md, void *to,
-		    int push(void *to, void *buf, size_t size));
+int perf_mmap__push(struct perf_mmap *md, void *to, int aio_idx,
+		    int push(void *to, struct aiocb *cblock, void *data, size_t size));
 
 size_t perf_mmap__mmap_len(struct perf_mmap *map);

> 
> thanks,
> jirka
> 
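
For reference, with the series applied the number of AIO data buffers and
control blocks per mapped buffer can be set either on the command line or
from perfconfig; values outside 1..32 fall back to the default of 2. The
command and config below are just an illustration of the option added above:

	$ perf record --aio 4 -a -- matrix.gcc

	# ~/.perfconfig
	[record]
		aio = 4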