From: Alexey Budankov <email@example.com>
To: Jiri Olsa <firstname.lastname@example.org>, Arnaldo Carvalho de Melo <email@example.com>
Cc: Peter Zijlstra <firstname.lastname@example.org>, Ingo Molnar <email@example.com>,
    Alexander Shishkin <firstname.lastname@example.org>, Namhyung Kim <email@example.com>,
    Andi Kleen <firstname.lastname@example.org>, linux-kernel <email@example.com>
Subject: Re: [PATCH v14 0/3]: perf: reduce data loss when profiling highly parallel CPU bound workloads
Date: Thu, 25 Oct 2018 10:59:36 +0300
Message-ID: <firstname.lastname@example.org>
In-Reply-To: <20181015101748.GB29504@krava>

Hi,

On 15.10.2018 13:17, Jiri Olsa wrote:
> On Mon, Oct 15, 2018 at 09:26:09AM +0300, Alexey Budankov wrote:
>>
>> Currently in record mode the tool implements trace writing serially.
>> The algorithm loops over mapped per-cpu data buffers and stores
>> ready data chunks into a trace file using the write() system call.
>>
>> Under some circumstances the kernel may lack free space in a buffer
>> because the other half of that buffer has not yet been written to disk
>> while the tool is busy writing another buffer's data.
>>
>> Thus the serial trace writing implementation may cause the kernel
>> to lose profiling data, and that is what is observed when profiling
>> highly parallel CPU bound workloads on machines with a large number
>> of cores.
>>
>> An experiment profiling matrix multiplication code executing 128
>> threads on Intel Xeon Phi (KNM) with 272 cores, like below,
>> demonstrates a data loss metrics value of 98%:
>>
>>     /usr/bin/time perf record -o /tmp/perf-ser.data -a -N -B -T -R -g \
>>         --call-graph dwarf,1024 --user-regs=IP,SP,BP --switch-events \
>>         -e cycles,instructions,ref-cycles,software/period=1,name=cs,config=0x3/Duk -- \
>>         matrix.gcc
>
> I ran above on 24 cpu server and could not see the gain,
> but I guess I'd need much bigger server to see that
>
> anyway, the code is now nicely separated, and given the
> advertised results below I have no objections
>
> Reviewed-by: Jiri Olsa <email@example.com>

Is the plan Jiri mentioned earlier to keep this as a standalone patch kit,
or to upstream the changes into mainline?

Thanks,
Alexey

>
> thanks,
> jirka
>
>
>>
>> Data loss metrics is the ratio lost_time/elapsed_time where
>> lost_time is the sum of time intervals containing PERF_RECORD_LOST
>> records and elapsed_time is the elapsed application run time
>> under profiling.
>>
>> Applying asynchronous trace streaming thru Posix AIO API [1] lowers
>> data loss metrics value providing 2x improvement (from 98% to ~1%)
>>
>> Asynchronous trace streaming is currently limited to glibc linkage.
>> musl libc [5] also provides Posix AIO API implementation, however
>> the patchkit is not tested with it. There may be other libc libraries
>> linked by Perf tool that currently lack Posix AIO API support [2],
>> [3], [4] so NO_AIO define may be used to limit Perf tool binary to
>> serial streaming only.
>>
>> ---
>> Alexey Budankov (3):
>>   perf util: map data buffer for preserving collected data
>>   perf record: enable asynchronous trace writing
>>   perf record: extend trace writing to multi AIO
>>
>>  tools/perf/Documentation/perf-record.txt |   5 +
>>  tools/perf/Makefile.config               |   5 +
>>  tools/perf/Makefile.perf                 |   7 +-
>>  tools/perf/builtin-record.c              | 252 ++++++++++++++++++++++++++++++-
>>  tools/perf/perf.h                        |   1 +
>>  tools/perf/util/evlist.c                 |   6 +-
>>  tools/perf/util/evlist.h                 |   2 +-
>>  tools/perf/util/mmap.c                   | 146 +++++++++++++++++-
>>  tools/perf/util/mmap.h                   |  26 +++-
>>  9 files changed, 439 insertions(+), 11 deletions(-)
>>
>> ---
>> Changes in v14:
>> - implement default nr_cblocks_default variable
>> - fix --aio option handling
>> Changes in v13:
>> - named new functions with _aio_ word
>> - grouped aio functions under single #ifdef HAVE_AIO_SUPPORT
>> - moved perf_mmap__aio_push() stub into header
>> - removed trailing white space
>> Changes in v12:
>> - applied stub functions design for the whole patch kit
>> - grouped AIO related data into a struct under struct perf_mmap
>> - implemented record__aio_get/set_pos(), record__aio_enabled()
>> - implemented simple --aio option
>> - extended --aio option to --aio-cblocks=<n>
>> Changes in v11:
>> - replaced both lseek() syscalls in every loop iteration with just
>>   two syscalls, one before and one after the loop in
>>   record__mmap_read_evlist(), and advance the *in-flight* file
>>   position offset in perf_mmap__aio_push()
>> Changes in v10:
>> - moved specific code to perf_mmap__aio_mmap(), perf_mmap__aio_munmap();
>> - adjusted error reporting by using %m
>> - avoided lseek() setting file pos back in case of record__aio_write() failure
>> - compacted code selecting between serial and AIO streaming
>> - optimized call places of record__mmap_read_sync()
>> - added description of aio-cblocks option into perf-record.txt
>> Changes in v9:
>> - enable AIO streaming only when --aio-cblocks option is specified explicitly
>> - enable AIO based implementation when linking with glibc only
>> - define NO_AIO to limit Perf binary to serial implementation
>> Changes in v8:
>> - run the whole thing thru checkpatch.pl and corrected found issues except
>>   lines longer than 80 symbols
>> - corrected comments alignment and formatting
>> - moved multi AIO implementation into 3rd patch in the series
>> - implemented explicit cblocks array allocation
>> - split AIO completion check into separate record__aio_complete()
>> - set nr_cblocks default to 1 and max allowed value to 4
>> Changes in v7:
>> - implemented handling record.aio setting from perfconfig file
>> Changes in v6:
>> - adjusted setting of priorities for cblocks;
>> - handled errno == EAGAIN case from aio_write() return;
>> Changes in v5:
>> - resolved livelock on perf record -e intel_pt// -- dd if=/dev/zero of=/dev/null count=100000
>> - data loss metrics decreased from 25% to 2x in trialed configuration;
>> - reshaped layout of data structures;
>> - implemented --aio option;
>> - avoided nanosleep() prior to calling aio_suspend();
>> - switched to per-cpu aio multi buffer record__aio_sync();
>> - record__mmap_read_sync() now does global sync just before
>>   switching trace file or collection stop;
>> Changes in v4:
>> - converted mmap()/munmap() to malloc()/free() for mmap->data buffer management
>> - converted void *bf to struct perf_mmap *md in signatures
>> - written comment in perf_mmap__push() just before perf_mmap__get();
>> - written comment in record__mmap_read_sync() on possible restarting
>>   of aio_write() operation and releasing perf_mmap object after all;
>> - added perf_mmap__put() for the cases of failed aio_write();
>> Changes in v3:
>> - written comments about nanosleep(0.5ms) call prior to aio_suspend()
>>   to cope with intrusiveness of its implementation in glibc;
>> - written comments about rationale behind copying profiling data
>>   into mmap->data buffer;
>> Changes in v2:
>> - converted zalloc() to calloc() for allocation of mmap_aio array,
>> - cleared typo and adjusted fallback branch code;
>>
>> ---
>>
>> [1] http://man7.org/linux/man-pages/man7/aio.7.html
>> [2] https://android.googlesource.com/platform/bionic/+/master/docs/status.md
>> [3] https://www.uclibc.org/
>> [4] https://uclibc-ng.org/
>> [5] https://www.musl-libc.org/
>
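For readers less familiar with the POSIX AIO calls the quoted cover letter relies on (aio_write(), aio_suspend()), here is a minimal sketch of queuing one asynchronous write and later waiting for it to finish. It is not code from the patch kit: queue_write() and wait_write() are hypothetical names chosen for illustration, and the sketch handles a single control block only, with none of the per-cpu buffer or cblocks handling the series implements.

```c
#include <aio.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch kit): queue one asynchronous
 * write of len bytes from buf at file offset off. Both the control block
 * and buf must stay valid and unmodified until the request completes.
 */
static int queue_write(struct aiocb *cb, int fd, const void *buf,
		       size_t len, off_t off)
{
	memset(cb, 0, sizeof(*cb));
	cb->aio_fildes = fd;
	cb->aio_buf = (void *)buf;
	cb->aio_nbytes = len;
	cb->aio_offset = off;

	/* Returns 0 if the request was queued, -1 with errno set otherwise. */
	return aio_write(cb);
}

/*
 * Hypothetical helper: block until the request completes, then return
 * the number of bytes written, or -1 with errno set on failure.
 */
static ssize_t wait_write(struct aiocb *cb)
{
	const struct aiocb *const list[1] = { cb };
	ssize_t ret;
	int err;

	while ((err = aio_error(cb)) == EINPROGRESS) {
		/* Sleep until completion; retry if a signal interrupts us. */
		if (aio_suspend(list, 1, NULL) == -1 && errno != EINTR)
			return -1;
	}
	if (err == -1)
		return -1;	/* aio_error() itself failed, errno is set */

	/* Collect the final status exactly once for a completed request. */
	ret = aio_return(cb);
	if (err != 0)
		errno = err;
	return ret;
}
```

A caller would invoke queue_write() once a chunk is ready, carry on with other buffers, and call wait_write() before reusing the memory behind buf. That lifetime requirement is presumably why the first patch in the series copies data into a separate mmap->data buffer. On older glibc versions the program may need to be linked with -lrt.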