From: "Bayduraev, Alexey V" <alexey.v.bayduraev@linux.intel.com>
To: Namhyung Kim <namhyung@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>,
	Jiri Olsa <jolsa@redhat.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Andi Kleen <ak@linux.intel.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Alexander Antonov <alexander.antonov@linux.intel.com>,
	Alexei Budankov <abudankov@huawei.com>
Subject: Re: [PATCH v5 00/20] Introduce threaded trace streaming for basic perf record operation
Date: Thu, 6 May 2021 15:43:55 +0300
Message-ID: <9f178dde-751f-9ac9-f5a0-fd1bfba3ca32@linux.intel.com>
In-Reply-To: <CAM9d7citW_NGb0vjMM2ytp=Mbq5YNe4GEaWspEkMGf=KAm+ugw@mail.gmail.com>

Hi,

On 06.05.2021 9:20, Namhyung Kim wrote:
> Hello,
> 
> On Tue, May 4, 2021 at 12:05 AM Alexey Bayduraev
> <alexey.v.bayduraev@linux.intel.com> wrote:
>>
<SNIP>
>>
>> Basic analysis of data directories is provided in perf report mode.
>> Raw dump and aggregated reports are available for data directories,
>> still with no memory consumption optimizations.
> 
> Do you have an idea how to improve it?
> 
> I have to say again that I don't like merely adding more threads to
> record.  Yeah, parallelizing the perf record is good, but we have to
> think about the perf report (and others) too.

You suggested separating out the tracking records and processing them
first, but those changes could be much larger than my patch. I think
they look like an independent patch and could be introduced as an
extension of parallel data loading.
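
To make the "tracking records first" idea concrete, here is a minimal
Python sketch. It is only an illustration under stated assumptions, not
how perf implements or would implement it: the Record type, the set of
kinds treated as tracking, and the process() helper are all hypothetical.

```python
from collections import namedtuple

# Hypothetical, simplified record; real perf events carry far more fields.
Record = namedtuple("Record", "time kind pid payload")

# Record kinds treated as "tracking" in this sketch (an assumed set).
TRACKING_KINDS = {"MMAP", "COMM", "FORK", "EXIT"}


def split_tracking(records):
    """Partition records into (tracking, samples), preserving input order."""
    tracking, samples = [], []
    for rec in records:
        (tracking if rec.kind in TRACKING_KINDS else samples).append(rec)
    return tracking, samples


def process(records):
    """Pass 1 builds process state from tracking records; pass 2 resolves samples."""
    tracking, samples = split_tracking(records)

    # Pass 1: only the (small) tracking set needs to be sorted by time.
    comm_by_pid = {}
    for rec in sorted(tracking, key=lambda r: r.time):
        if rec.kind == "COMM":
            comm_by_pid[rec.pid] = rec.payload

    # Pass 2: samples are resolved against the state built in pass 1.
    # Simplification: this keeps only the final comm per pid; a real tool
    # would need time-aware lookups to handle exec and pid reuse.
    return [(rec.time, comm_by_pid.get(rec.pid, "?")) for rec in samples]
```

The point of the split is that samples no longer have to pass through
one globally ordered queue together with the tracking records.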

I also thought about and experimented with intermediate flushing of
the ordered queue. This is simple for per-cpu data files (sorted
by time), but it is not clear how to do it for arbitrary CPU masks.
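
A rough Python sketch of intermediate flushing for the simple case,
assuming each stream (one per-cpu data file) is already sorted by time.
The flush_ready() helper and its data layout are hypothetical and are
not taken from perf's ordered-events code.

```python
import heapq


def flush_ready(queues, last_seen):
    """Pop and return, in time order, every event that is safe to emit.

    queues:    dict stream_id -> list of (time, event), each sorted by time
    last_seen: dict stream_id -> newest timestamp read so far from that stream
    """
    # A time-sorted stream never delivers an event older than its own
    # last_seen value, so the minimum across streams is a safe watermark.
    watermark = min(last_seen.values())

    ready = []
    for q in queues.values():
        n = 0
        while n < len(q) and q[n][0] <= watermark:
            n += 1
        ready.append(q[:n])
        del q[:n]  # keep only the events that must still wait

    # Each slice is already sorted, so a k-way merge is sufficient.
    return list(heapq.merge(*ready, key=lambda item: item[0]))
```

With an arbitrary CPU mask, one file presumably mixes data from several
CPUs and is no longer sorted by time as a whole, so last_seen stops
being a valid lower bound and the watermark above cannot be trusted.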

I think my patch can be the first step toward introducing a parallel
mode in the perf tool. It just extends perf-record (already used in our
VTune tool) and allows loading parallel data in experimental mode.
Later patches could optimize and extend parallel data loading.

Regards,
Alexey

> 
> Thanks,
> Namhyung
> 
> 
>>
>> Tested:
>>
>> tools/perf/perf record -o prof.data --threads -- matrix.gcc.g.O3
>> tools/perf/perf record -o prof.data --threads= -- matrix.gcc.g.O3
>> tools/perf/perf record -o prof.data --threads=cpu -- matrix.gcc.g.O3
>> tools/perf/perf record -o prof.data --threads=core -- matrix.gcc.g.O3
>> tools/perf/perf record -o prof.data --threads=socket -- matrix.gcc.g.O3
>> tools/perf/perf record -o prof.data --threads=numa -- matrix.gcc.g.O3
>> tools/perf/perf record -o prof.data --threads=0-3/3:4-7/4 -- matrix.gcc.g.O3
>> tools/perf/perf record -o prof.data -C 2,5 --threads=0-3/3:4-7/4 -- matrix.gcc.g.O3
>> tools/perf/perf record -o prof.data -C 3,4 --threads=0-3/3:4-7/4 -- matrix.gcc.g.O3
>> tools/perf/perf record -o prof.data -C 0,4,2,6 --threads=core -- matrix.gcc.g.O3
>> tools/perf/perf record -o prof.data -C 0,4,2,6 --threads=numa -- matrix.gcc.g.O3
>> tools/perf/perf record -o prof.data --threads -g --call-graph dwarf,4096 -- matrix.gcc.g.O3
>> tools/perf/perf record -o prof.data --threads -g --call-graph dwarf,4096 --compression-level=3 -- matrix.gcc.g.O3
>> tools/perf/perf record -o prof.data --threads -a
>> tools/perf/perf record -D -1 -e cpu-cycles -a --control fd:10,11 -- sleep 30
>> tools/perf/perf record --threads -D -1 -e cpu-cycles -a --control fd:10,11 -- sleep 30
>>
>> tools/perf/perf report -i prof.data
>> tools/perf/perf report -i prof.data --call-graph=callee
>> tools/perf/perf report -i prof.data --stdio --header
>> tools/perf/perf report -i prof.data -D --header
>>
>> [1] git clone https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git -b perf/record_threads
>> [2] https://lore.kernel.org/lkml/20180913125450.21342-1-jolsa@kernel.org/
>>
>> ---
>>
>> Alexey Bayduraev (20):
>>   perf record: introduce thread affinity and mmap masks
>>   perf record: introduce thread specific data array
>>   perf record: introduce thread local variable
>>   perf record: stop threads in the end of trace streaming
>>   perf record: start threads in the beginning of trace streaming
>>   perf record: introduce data file at mmap buffer object
>>   perf record: introduce data transferred and compressed stats
>>   perf record: init data file at mmap buffer object
>>   tools lib: introduce bitmap_intersects() operation
>>   perf record: introduce --threads=<spec> command line option
>>   perf record: document parallel data streaming mode
>>   perf report: output data file name in raw trace dump
>>   perf session: move reader structure to the top
>>   perf session: introduce reader_state in reader object
>>   perf session: introduce reader objects in session object
>>   perf session: introduce decompressor into trace reader object
>>   perf session: move init into reader__init function
>>   perf session: move map/unmap into reader__mmap function
>>   perf session: load single file for analysis
>>   perf session: load data directory files for analysis
>>
>>  tools/include/linux/bitmap.h             |   11 +
>>  tools/lib/api/fd/array.c                 |   17 +
>>  tools/lib/api/fd/array.h                 |    1 +
>>  tools/lib/bitmap.c                       |   14 +
>>  tools/perf/Documentation/perf-record.txt |   30 +
>>  tools/perf/builtin-inject.c              |    3 +-
>>  tools/perf/builtin-record.c              | 1066 ++++++++++++++++++++--
>>  tools/perf/util/evlist.c                 |   16 +
>>  tools/perf/util/evlist.h                 |    1 +
>>  tools/perf/util/mmap.c                   |    6 +
>>  tools/perf/util/mmap.h                   |    6 +
>>  tools/perf/util/ordered-events.h         |    1 +
>>  tools/perf/util/record.h                 |    2 +
>>  tools/perf/util/session.c                |  491 +++++++---
>>  tools/perf/util/session.h                |    5 +
>>  tools/perf/util/tool.h                   |    3 +-
>>  16 files changed, 1474 insertions(+), 199 deletions(-)
>>
>> --
>> 2.19.0
>>
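
For readers of the quoted test commands: a value such as
--threads=0-3/3:4-7/4 appears to be a colon-separated list of
<monitored cpus>/<affinity cpus> pairs, one pair per recording thread.
That reading is an assumption based on the commands above, and the
parser below is a hypothetical Python sketch, not the code from the
patch set.

```python
def parse_cpu_list(text):
    """Expand a CPU list such as '0-3' or '0,4,2,6' into a sorted list of ints."""
    cpus = set()
    for part in text.split(","):
        lo, _, hi = part.partition("-")
        # A bare number like '3' has no upper bound, so reuse the lower one.
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return sorted(cpus)


def parse_threads_spec(spec):
    """Split '<cpus>/<affinity>:<cpus>/<affinity>' into (cpus, affinity) pairs."""
    threads = []
    for entry in spec.split(":"):
        maps, sep, affinity = entry.partition("/")
        if not sep:
            raise ValueError("expected <cpus>/<affinity>, got %r" % entry)
        threads.append((parse_cpu_list(maps), parse_cpu_list(affinity)))
    return threads
```

Under that assumption, 0-3/3:4-7/4 describes two threads: one reading
the buffers of CPUs 0-3 while pinned to CPU 3, and one reading CPUs 4-7
while pinned to CPU 4.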
