linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 00/15] Introduce threaded trace streaming for basic perf record operation
@ 2020-10-12  8:50 Alexey Budankov
  2020-10-12  8:53 ` [PATCH v1 01/15] perf session: introduce trace file path to be shown in raw trace dump Alexey Budankov
                   ` (16 more replies)
  0 siblings, 17 replies; 45+ messages in thread
From: Alexey Budankov @ 2020-10-12  8:50 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Namhyung Kim, Alexander Shishkin, Alexey Budankov, Andi Kleen,
	Adrian Hunter, Peter Zijlstra, Ingo Molnar, linux-kernel


Patch set provides threaded trace streaming for base perf record
operation. Provided streaming mode (--threads) mitigates profiling
data losses and resolves scalability issues of serial and asynchronous
(--aio) trace streaming modes on multicore server systems. The patch
set is based on the prototype [1], [2] and the most closely relates
to mode 3) "mode that creates thread for every monitored memory map".

The threaded mode executes one-to-one mapping of trace streaming threads
to mapped data buffers and streaming into per-CPU trace files located
at data directory. The data buffers and threads are affined to NUMA
nodes and monitored CPUs according to system topology. --cpu option
can be used to specify exact CPUs to be monitored.

Basic analysis of data directories is provided for perf report mode.
Raw dump (-D) and aggregated reports are available for data directories,
still with no memory consumption optimizations. However data directories
collected with --compression-level option enabled can be analyzed with
little less memory because trace files are unmaped from tool process
memory after loading collected data.

Provided streaming mode is available with Zstd compression/decompression
(--compression-level) and handling of external commands (--control).
AUX area tracing, related and derived modes like --snapshot or
--aux-sample are not enabled. --switch-output, --switch-output-event, 
--switch-max-files and --timestamp-filename options are not enabled.
Threaded trace streaming is not enabled for pipe mode. Asynchronous
(--aio) trace streaming and affinity (--affinity) modes are mutually
exclusive to threaded streaming mode.

See testing results and validation examples below.

[1] git clone https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git -b perf/record_threads
[2] https://lore.kernel.org/lkml/20180913125450.21342-1-jolsa@kernel.org/

---
Alexey Budankov (15):
  perf session: introduce trace file path to be shown in raw trace dump
  perf report: output trace file name in raw trace dump
  perf data: open data directory in read access mode
  perf session: move reader object definition to header file
  perf session: introduce decompressor into trace reader object
  perf session: load data directory into tool process memory
  perf record: introduce trace file, compressor and stats in mmap object
  perf record: write trace data into mmap trace files
  perf record: introduce thread specific objects for trace streaming
  perf record: manage thread specific data array
  perf evlist: introduce evlist__ctlfd_update() to update ctl fd status
  perf record: introduce thread local variable for trace streaming
  perf record: stop threads in the end of trace streaming
  perf record: start threads in the beginning of trace streaming
  perf record: introduce --threads command line option

 tools/perf/Documentation/perf-record.txt |   7 +
 tools/perf/builtin-record.c              | 514 +++++++++++++++++++++--
 tools/perf/util/data.c                   |   4 +
 tools/perf/util/evlist.c                 |  16 +
 tools/perf/util/evlist.h                 |   1 +
 tools/perf/util/mmap.c                   |   6 +
 tools/perf/util/mmap.h                   |   5 +
 tools/perf/util/ordered-events.h         |   1 +
 tools/perf/util/record.h                 |   1 +
 tools/perf/util/session.c                | 150 ++++---
 tools/perf/util/session.h                |  28 ++
 tools/perf/util/tool.h                   |   3 +-
 12 files changed, 635 insertions(+), 101 deletions(-)

---
Testing results:

 $ perf test
 1: vmlinux symtab matches kallsyms                                 : Skip
 2: Detect openat syscall event                                     : Ok
 3: Detect openat syscall event on all cpus                         : Ok
 4: Read samples using the mmap interface                           : Ok
 5: Test data source output                                         : Ok
 6: Parse event definition strings                                  : Ok
 7: Simple expression parser                                        : Ok
 8: PERF_RECORD_* events & perf_sample fields                       : Ok
 9: Parse perf pmu format                                           : Ok
10: PMU events                                                      :
10.1: PMU event table sanity                                        : Ok
10.2: PMU event map aliases                                         : Ok
10.3: Parsing of PMU event table metrics                            : Skip (some metrics failed)
10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
11: DSO data read                                                   : Ok
12: DSO data cache                                                  : Ok
13: DSO data reopen                                                 : FAILED!
14: Roundtrip evsel->name                                           : Ok
15: Parse sched tracepoints fields                                  : Ok
16: syscalls:sys_enter_openat event fields                          : Ok
17: Setup struct perf_event_attr                                    : Ok
18: Match and link multiple hists                                   : Ok
19: 'import perf' in python                                         : FAILED!
20: Breakpoint overflow signal handler                              : Ok
21: Breakpoint overflow sampling                                    : Ok
22: Breakpoint accounting                                           : Ok
23: Watchpoint                                                      :
23.1: Read Only Watchpoint                                          : Skip
23.2: Write Only Watchpoint                                         : Ok
23.3: Read / Write Watchpoint                                       : Ok
23.4: Modify Watchpoint                                             : Ok
24: Number of exit events of a simple workload                      : Ok
25: Software clock events period values                             : Ok
26: Object code reading                                             : Ok
27: Sample parsing                                                  : Ok
28: Use a dummy software event to keep tracking                     : Ok
29: Parse with no sample_id_all bit set                             : Ok
30: Filter hist entries                                             : Ok
31: Lookup mmap thread                                              : Ok
32: Share thread maps                                               : Ok
33: Sort output of hist entries                                     : Ok
34: Cumulate child hist entries                                     : Ok
35: Track with sched_switch                                         : Ok
36: Filter fds with revents mask in a fdarray                       : Ok
37: Add fd to a fdarray, making it autogrow                         : Ok
38: kmod_path__parse                                                : Ok
39: Thread map                                                      : Ok
40: LLVM search and compile                                         :
40.1: Basic BPF llvm compile                                        : Skip
40.2: kbuild searching                                              : Skip
40.3: Compile source for BPF prologue generation                    : Skip
40.4: Compile source for BPF relocation                             : Skip
41: Session topology                                                : Ok
42: BPF filter                                                      :
42.1: Basic BPF filtering                                           : Skip
42.2: BPF pinning                                                   : Skip
42.3: BPF prologue generation                                       : Skip
42.4: BPF relocation checker                                        : Skip
43: Synthesize thread map                                           : Ok
44: Remove thread map                                               : Ok
45: Synthesize cpu map                                              : Ok
46: Synthesize stat config                                          : Ok
47: Synthesize stat                                                 : Ok
48: Synthesize stat round                                           : Ok
49: Synthesize attr update                                          : Ok
50: Event times                                                     : Ok
51: Read backward ring buffer                                       : Ok
52: Print cpu map                                                   : Ok
53: Merge cpu map                                                   : Ok
54: Probe SDT events                                                : Ok
55: is_printable_array                                              : Ok
56: Print bitmap                                                    : Ok
57: perf hooks                                                      : Ok
58: builtin clang support                                           : Skip (not compiled in)
59: unit_number__scnprintf                                          : Ok
60: mem2node                                                        : Ok
61: time utils                                                      : Ok
62: Test jit_write_elf                                              : Ok
63: Test libpfm4 support                                            : Skip (not compiled in)
64: Test api io                                                     : Ok
65: maps__merge_in                                                  : Ok
66: Demangle Java                                                   : Ok
67: Parse and process metrics                                       : Ok
68: PE file support                                                 : Ok
69: Event expansion for cgroups                                     : Ok
70: x86 rdpmc                                                       : Ok
71: Convert perf time to TSC                                        : Ok
72: DWARF unwind                                                    : Ok
73: x86 instruction decoder - new instructions                      : Ok
74: Intel PT packet decoder                                         : Ok
75: x86 bp modify                                                   : Ok
76: Check open filename arg using perf trace + vfs_getname          : FAILED!
77: Add vfs_getname probe to get syscall args filenames             : FAILED!
78: probe libc's inet_pton & backtrace it with ping                 : Ok
79: Use vfs_getname probe to get syscall args filenames             : FAILED!
80: Zstd perf.data compression/decompression                        : Ok
81: Check Arm CoreSight trace data recording and synthesized samples: Skip

Validation:

  $ tools/perf/perf record --threads --compression-level=3 -e cycles,instructions -- ../../bench/matrix/linux/matrix.gcc.g.O3
  Addr of buf1 = 0x7f8af24ae010
  Offs of buf1 = 0x7f8af24ae180
  Addr of buf2 = 0x7f8af04ad010
  Offs of buf2 = 0x7f8af04ad1c0
  Addr of buf3 = 0x7f8aee4ac010
  Offs of buf3 = 0x7f8aee4ac100
  Addr of buf4 = 0x7f8aec4ab010
  Offs of buf4 = 0x7f8aec4ab140
  Threads #: 8 Pthreads
  Matrix size: 2048
  Using multiply kernel: multiply1
  Execution time = 24.579 seconds
  [ perf record: Woken up 972 times to write data ]
  [ perf record: Captured and wrote 11.280 MB perf.data (403 samples), compressed (original 66.670 MB, ratio is 5.926) ]

  $ perf report --header --stats
  # ========
  # captured on    : Wed Oct  7 19:07:50 2020
  # header version : 1
  # data offset    : 504
  # data size      : 30056
  # feat offset    : 30560
  # hostname : nntvtune39
  # os release : 5.8.8-100.fc31.x86_64
  # perf version : 4.13.rc5.gdf19b1347b82
  # arch : x86_64
  # nrcpus online : 8
  # nrcpus avail : 8
  # cpudesc : Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
  # cpuid : GenuineIntel,6,94,3
  # total memory : 16119484 kB
  # cmdline : /root/abudanko/kernel/acme/tools/perf/perf record --threads --compression-level=3 -e cycles,instructions -- ../../bench  /matrix/linux/matrix.gcc.g.O3 
  # event : name = cycles, , id = { 54562, 54563, 54564, 54565, 54566, 54567, 54568, 54569 }, size = 120, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|ID|>
  # event : name = instructions, , id = { 54570, 54571, 54572, 54573, 54574, 54575, 54576, 54577 }, size = 120, config = 0x1, { sample_period, sample_freq } = 4000, sample_ty>
  # sibling sockets : 0-7
  # sibling dies    : 0-7
  # sibling threads : 0,4
  # sibling threads : 1,5
  # sibling threads : 2,6
  # sibling threads : 3,7
  # CPU 0: Core ID 0, Die ID 0, Socket ID 0
  # CPU 1: Core ID 1, Die ID 0, Socket ID 0
  # CPU 2: Core ID 2, Die ID 0, Socket ID 0
  # CPU 3: Core ID 3, Die ID 0, Socket ID 0
  # CPU 4: Core ID 0, Die ID 0, Socket ID 0
  # CPU 5: Core ID 1, Die ID 0, Socket ID 0
  # CPU 6: Core ID 2, Die ID 0, Socket ID 0
  # CPU 7: Core ID 3, Die ID 0, Socket ID 0
  # node0 meminfo  : total = 16119484 kB, free = 933088 kB
  # node0 cpu list : 0-7
  # pmu mappings: intel_pt = 8, software = 1, power = 19, uprobe = 7, uncore_imc = 11, cpu = 4, cstate_core = 17, uncore_cbox_2 = 14, breakpoint = 5, uncore_cbox_0 = 12, trac>
  # CPU cache info:
  #  L1 Data                 32K [0,4]
  #  L1 Instruction          32K [0,4]
  #  L1 Data                 32K [1,5]
  #  L1 Instruction          32K [1,5]
  #  L1 Data                 32K [1,5]
  #  L1 Instruction          32K [1,5]
  #  L1 Data                 32K [2,6]
  #  L1 Instruction          32K [2,6]
  #  L1 Data                 32K [3,7]
  #  L1 Instruction          32K [3,7]
  #  L2 Unified             256K [0,4]
  #  L2 Unified             256K [1,5]
  #  L2 Unified             256K [2,6]
  #  L2 Unified             256K [3,7]
  #  L3 Unified            8192K [0-7]
  # time of first sample : 0.000000
  # time of last sample : 0.000000
  # sample duration :      0.000 ms
  # memory nodes (nr 1, block size 0x8000000):
  #    0 [16G]: 0-23,32-135
  # directory data version : 1
  # bpf_prog_info 835: bpf_prog_6deef7357e7b4530 addr 0xffffffffc07c5138 size 66
  # bpf_prog_info 836: bpf_prog_6deef7357e7b4530 addr 0xffffffffc0804558 size 66
  # bpf_prog_info 837: bpf_prog_84efc2eecc454ca6 addr 0xffffffffc080804c size 373
  # bpf_prog_info 838: bpf_prog_6deef7357e7b4530 addr 0xffffffffc0817f50 size 66
  # bpf_prog_info 839: bpf_prog_6deef7357e7b4530 addr 0xffffffffc082aee8 size 66
  # bpf_prog_info 840: bpf_prog_5a2b06eab81b8f51 addr 0xffffffffc082c0fc size 1132
  # bpf_prog_info 841: bpf_prog_8a642ed2819e9acc addr 0xffffffffc0aac380 size 62
  # bpf_prog_info 842: bpf_prog_8a642ed2819e9acc addr 0xffffffffc0d4c970 size 62
  # bpf_prog_info 843: bpf_prog_6deef7357e7b4530 addr 0xffffffffc0d4e8bc size 66
  # bpf_prog_info 844: bpf_prog_6deef7357e7b4530 addr 0xffffffffc0d5050c size 66
  # bpf_prog_info 845: bpf_prog_3ff695c0afc2579c addr 0xffffffffc0d56098 size 1552
  # bpf_prog_info 846: bpf_prog_6deef7357e7b4530 addr 0xffffffffc0d58364 size 66
  # bpf_prog_info 847: bpf_prog_6deef7357e7b4530 addr 0xffffffffc0d6ec00 size 66
  # compressed : Zstd, level = 3, ratio = 6
  # cpu pmu capabilities: branches=32, max_precise=3, pmu_name=skylake
  # missing features: TRACING_DATA BUILD_ID BRANCH_STACK GROUP_DESC AUXTRACE STAT CLOCKID CLOCK_DATA 
  # ========
  #
  Aggregated stats:
           TOTAL events:    1457107
            MMAP events:         98
            LOST events:          0
            COMM events:          2
            EXIT events:          6
        THROTTLE events:          0
      UNTHROTTLE events:          0
            FORK events:          8
            READ events:          0
          SAMPLE events:    1456388
           MMAP2 events:          7
             AUX events:          0
    ITRACE_START events:          0
    LOST_SAMPLES events:          0
          SWITCH events:          0
 SWITCH_CPU_WIDE events:          0
      NAMESPACES events:          0
         KSYMBOL events:         13
       BPF_EVENT events:         13
          CGROUP events:        166
       TEXT_POKE events:          0
            ATTR events:          0
      EVENT_TYPE events:          0
    TRACING_DATA events:          0
        BUILD_ID events:          0
  FINISHED_ROUND events:          0
        ID_INDEX events:          0
   AUXTRACE_INFO events:          0
        AUXTRACE events:          0
  AUXTRACE_ERROR events:          0
      THREAD_MAP events:          1
         CPU_MAP events:          1
     STAT_CONFIG events:          0
            STAT events:          0
      STAT_ROUND events:          0
    EVENT_UPDATE events:          0
       TIME_CONV events:          1
         FEATURE events:          0
      COMPRESSED events:        403

  $ tools/perf/perf record --threads --compression-level=3 -g --call-graph dwarf,1024 -e cycles,instructions -- ../../bench/matrix/linux/matrix.gcc.g.O3
  Addr of buf1 = 0x7f15e8b89010
  Offs of buf1 = 0x7f15e8b89180
  Addr of buf2 = 0x7f15e6b88010
  Offs of buf2 = 0x7f15e6b881c0
  Addr of buf3 = 0x7f15e4b87010
  Offs of buf3 = 0x7f15e4b87100
  Addr of buf4 = 0x7f15e2b86010
  Offs of buf4 = 0x7f15e2b86140
  Threads #: 8 Pthreads
  Matrix size: 2048
  Using multiply kernel: multiply1
  Execution time = 24.281 seconds
  [ perf record: Woken up 6923 times to write data ]
  [ perf record: Captured and wrote 34.933 MB perf.data (10414 samples), compressed (original 1728.795 MB, ratio is 49.530) ]

  $ perf report --stdio
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 708K of event 'cycles'
  # Event count (approx.): 277718834163
  #
  # Children      Self  Command          Shared Object             Symbol                                                  
  # ........  ........  ...............  ........................  ........................................................
  #
    99.97%     0.00%  matrix.gcc.g.O3  libc-2.30.so              [.] __GI___clone (inlined)
            |
            ---__GI___clone (inlined)
               |          
                --99.97%--start_thread
                          |          
                           --99.97%--ThreadFunction
                                     multiply1
                                     |          
                                      --0.96%--asm_sysvec_apic_timer_interrupt
                                                |          
                                                 --0.70%--subflow_syn_recv_sock
                                                           |          
                                                            --0.60%--__sysvec_apic_timer_interrupt
                                                                      |          
                                                                       --0.57%--hrtimer_interrupt

    99.97%     0.00%  matrix.gcc.g.O3  libpthread-2.30.so        [.] start_thread
            |
            ---start_thread
               |          
                --99.97%--ThreadFunction
                          |          
                           --99.97%--multiply1
                                     |          
                                      --0.96%--asm_sysvec_apic_timer_interrupt
                                                |          
                                                 --0.70%--subflow_syn_recv_sock
                                                           |          
                                                            --0.60%--__sysvec_apic_timer_interrupt
                                                                      |          
                                                                       --0.57%--hrtimer_interrupt

    99.97%     0.00%  matrix.gcc.g.O3  matrix.gcc.g.O3           [.] ThreadFunction
            |
            ---ThreadFunction
               |          
                --99.97%--multiply1
                          |          
                           --0.96%--asm_sysvec_apic_timer_interrupt
                                     |          
                                      --0.70%--subflow_syn_recv_sock
                                                |          
                                                 --0.60%--__sysvec_apic_timer_interrupt
                                                           |          
                                                            --0.57%--hrtimer_interrupt

    99.97%    98.68%  matrix.gcc.g.O3  matrix.gcc.g.O3           [.] multiply1
            |          
            |--98.68%--__GI___clone (inlined)
            |          start_thread
            |          ThreadFunction
            |          multiply1
            |          
             --1.28%--multiply1
                       |          
                        --0.96%--asm_sysvec_apic_timer_interrupt
                                  |          
                                   --0.70%--subflow_syn_recv_sock
                                             |          
                                              --0.60%--__sysvec_apic_timer_interrupt
                                                        |          
                                                         --0.57%--hrtimer_interrupt

     0.97%     0.05%  matrix.gcc.g.O3  [kernel.kallsyms]         [k] asm_sysvec_apic_timer_interrupt
            |          
             --0.91%--asm_sysvec_apic_timer_interrupt
                       |          
                        --0.70%--subflow_syn_recv_sock


-- 
2.24.1


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v1 01/15] perf session: introduce trace file path to be shown in raw trace dump
  2020-10-12  8:50 [PATCH v1 00/15] Introduce threaded trace streaming for basic perf record operation Alexey Budankov
@ 2020-10-12  8:53 ` Alexey Budankov
  2020-10-12  8:54 ` [PATCH v1 02/15] perf report: output trace file name " Alexey Budankov
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 45+ messages in thread
From: Alexey Budankov @ 2020-10-12  8:53 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Namhyung Kim, Alexander Shishkin, Andi Kleen, Adrian Hunter,
	Peter Zijlstra, Ingo Molnar, linux-kernel


Extend reader, ordered_event and decomp objects to contain path
of a trace file being displayed.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 tools/perf/util/ordered-events.h | 1 +
 tools/perf/util/session.c        | 2 ++
 tools/perf/util/session.h        | 1 +
 3 files changed, 4 insertions(+)

diff --git a/tools/perf/util/ordered-events.h b/tools/perf/util/ordered-events.h
index 75345946c4b9..42c9764c6b5b 100644
--- a/tools/perf/util/ordered-events.h
+++ b/tools/perf/util/ordered-events.h
@@ -9,6 +9,7 @@ struct perf_sample;
 struct ordered_event {
 	u64			timestamp;
 	u64			file_offset;
+	const char		*file_path;
 	union perf_event	*event;
 	struct list_head	list;
 };
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 7a5f03764702..4478ddae485b 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -2119,6 +2119,7 @@ typedef s64 (*reader_cb_t)(struct perf_session *session,
 
 struct reader {
 	int		 fd;
+	const char	 *path;
 	u64		 data_size;
 	u64		 data_offset;
 	reader_cb_t	 process;
@@ -2241,6 +2242,7 @@ static int __perf_session__process_events(struct perf_session *session)
 		.data_size	= session->header.data_size,
 		.data_offset	= session->header.data_offset,
 		.process	= process_simple,
+		.path		= session->data->file.path,
 	};
 	struct ordered_events *oe = &session->ordered_events;
 	struct perf_tool *tool = session->tool;
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index f76480166d38..378ffc3e2809 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -46,6 +46,7 @@ struct perf_session {
 struct decomp {
 	struct decomp *next;
 	u64 file_pos;
+	const char *file_path;
 	size_t mmap_len;
 	u64 head;
 	size_t size;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v1 02/15] perf report: output trace file name in raw trace dump
  2020-10-12  8:50 [PATCH v1 00/15] Introduce threaded trace streaming for basic perf record operation Alexey Budankov
  2020-10-12  8:53 ` [PATCH v1 01/15] perf session: introduce trace file path to be shown in raw trace dump Alexey Budankov
@ 2020-10-12  8:54 ` Alexey Budankov
  2020-10-12 16:01   ` Andi Kleen
  2020-10-13 19:54   ` Jiri Olsa
  2020-10-12  8:55 ` [PATCH v1 03/15] perf data: open data directory in read access mode Alexey Budankov
                   ` (14 subsequent siblings)
  16 siblings, 2 replies; 45+ messages in thread
From: Alexey Budankov @ 2020-10-12  8:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Namhyung Kim, Alexander Shishkin, Andi Kleen, Adrian Hunter,
	Peter Zijlstra, Ingo Molnar, linux-kernel


Output path of a trace file into raw dump (-D) <file_offset>@<path/file>.
Print offset of PERF_RECORD_COMPRESSED record instead of zero for
decompressed records:
  0x2226a@perf.data [0x30]: event: 9
or
  0x15cc36@perf.data/data.7 [0x30]: event: 9

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 tools/perf/util/session.c | 75 +++++++++++++++++++++++----------------
 tools/perf/util/tool.h    |  3 +-
 2 files changed, 46 insertions(+), 32 deletions(-)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 4478ddae485b..6f09d506b2f6 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -36,7 +36,8 @@
 
 #ifdef HAVE_ZSTD_SUPPORT
 static int perf_session__process_compressed_event(struct perf_session *session,
-						  union perf_event *event, u64 file_offset)
+						  union perf_event *event, u64 file_offset,
+						  const char *file_path)
 {
 	void *src;
 	size_t decomp_size, src_size;
@@ -58,6 +59,7 @@ static int perf_session__process_compressed_event(struct perf_session *session,
 	}
 
 	decomp->file_pos = file_offset;
+	decomp->file_path = file_path;
 	decomp->mmap_len = mmap_len;
 	decomp->head = 0;
 
@@ -98,7 +100,8 @@ static int perf_session__process_compressed_event(struct perf_session *session,
 static int perf_session__deliver_event(struct perf_session *session,
 				       union perf_event *event,
 				       struct perf_tool *tool,
-				       u64 file_offset);
+				       u64 file_offset,
+				       const char *file_path);
 
 static int perf_session__open(struct perf_session *session)
 {
@@ -180,7 +183,8 @@ static int ordered_events__deliver_event(struct ordered_events *oe,
 						    ordered_events);
 
 	return perf_session__deliver_event(session, event->event,
-					   session->tool, event->file_offset);
+					   session->tool, event->file_offset,
+					   event->file_path);
 }
 
 struct perf_session *perf_session__new(struct perf_data *data,
@@ -452,7 +456,8 @@ static int process_stat_round_stub(struct perf_session *perf_session __maybe_unu
 
 static int perf_session__process_compressed_event_stub(struct perf_session *session __maybe_unused,
 						       union perf_event *event __maybe_unused,
-						       u64 file_offset __maybe_unused)
+						       u64 file_offset __maybe_unused,
+						       const char *file_path __maybe_unused)
 {
        dump_printf(": unhandled!\n");
        return 0;
@@ -1227,13 +1232,14 @@ static void sample_read__printf(struct perf_sample *sample, u64 read_format)
 }
 
 static void dump_event(struct evlist *evlist, union perf_event *event,
-		       u64 file_offset, struct perf_sample *sample)
+		       u64 file_offset, struct perf_sample *sample,
+		       const char *file_path)
 {
 	if (!dump_trace)
 		return;
 
-	printf("\n%#" PRIx64 " [%#x]: event: %d\n",
-	       file_offset, event->header.size, event->header.type);
+	printf("\n%#" PRIx64 "@%s [%#x]: event: %d\n",
+	       file_offset, file_path, event->header.size, event->header.type);
 
 	trace_event(event);
 	if (event->header.type == PERF_RECORD_SAMPLE && evlist->trace_event_sample_raw)
@@ -1424,12 +1430,13 @@ static int machines__deliver_event(struct machines *machines,
 				   struct evlist *evlist,
 				   union perf_event *event,
 				   struct perf_sample *sample,
-				   struct perf_tool *tool, u64 file_offset)
+				   struct perf_tool *tool, u64 file_offset,
+				   const char *file_path)
 {
 	struct evsel *evsel;
 	struct machine *machine;
 
-	dump_event(evlist, event, file_offset, sample);
+	dump_event(evlist, event, file_offset, sample, file_path);
 
 	evsel = perf_evlist__id2evsel(evlist, sample->id);
 
@@ -1506,7 +1513,8 @@ static int machines__deliver_event(struct machines *machines,
 static int perf_session__deliver_event(struct perf_session *session,
 				       union perf_event *event,
 				       struct perf_tool *tool,
-				       u64 file_offset)
+				       u64 file_offset,
+				       const char *file_path)
 {
 	struct perf_sample sample;
 	int ret;
@@ -1524,7 +1532,7 @@ static int perf_session__deliver_event(struct perf_session *session,
 		return 0;
 
 	ret = machines__deliver_event(&session->machines, session->evlist,
-				      event, &sample, tool, file_offset);
+				      event, &sample, tool, file_offset, file_path);
 
 	if (dump_trace && sample.aux_sample.size)
 		auxtrace__dump_auxtrace_sample(session, &sample);
@@ -1534,7 +1542,8 @@ static int perf_session__deliver_event(struct perf_session *session,
 
 static s64 perf_session__process_user_event(struct perf_session *session,
 					    union perf_event *event,
-					    u64 file_offset)
+					    u64 file_offset,
+					    const char *file_path)
 {
 	struct ordered_events *oe = &session->ordered_events;
 	struct perf_tool *tool = session->tool;
@@ -1544,7 +1553,7 @@ static s64 perf_session__process_user_event(struct perf_session *session,
 
 	if (event->header.type != PERF_RECORD_COMPRESSED ||
 	    tool->compressed == perf_session__process_compressed_event_stub)
-		dump_event(session->evlist, event, file_offset, &sample);
+		dump_event(session->evlist, event, file_offset, &sample, file_path);
 
 	/* These events are processed right away */
 	switch (event->header.type) {
@@ -1603,9 +1612,9 @@ static s64 perf_session__process_user_event(struct perf_session *session,
 	case PERF_RECORD_HEADER_FEATURE:
 		return tool->feature(session, event);
 	case PERF_RECORD_COMPRESSED:
-		err = tool->compressed(session, event, file_offset);
+		err = tool->compressed(session, event, file_offset, file_path);
 		if (err)
-			dump_event(session->evlist, event, file_offset, &sample);
+			dump_event(session->evlist, event, file_offset, &sample, file_path);
 		return err;
 	default:
 		return -EINVAL;
@@ -1622,9 +1631,9 @@ int perf_session__deliver_synth_event(struct perf_session *session,
 	events_stats__inc(&evlist->stats, event->header.type);
 
 	if (event->header.type >= PERF_RECORD_USER_TYPE_START)
-		return perf_session__process_user_event(session, event, 0);
+		return perf_session__process_user_event(session, event, 0, NULL);
 
-	return machines__deliver_event(&session->machines, evlist, event, sample, tool, 0);
+	return machines__deliver_event(&session->machines, evlist, event, sample, tool, 0, NULL);
 }
 
 static void event_swap(union perf_event *event, bool sample_id_all)
@@ -1720,7 +1729,8 @@ int perf_session__peek_events(struct perf_session *session, u64 offset,
 }
 
 static s64 perf_session__process_event(struct perf_session *session,
-				       union perf_event *event, u64 file_offset)
+				       union perf_event *event, u64 file_offset,
+				       const char *file_path)
 {
 	struct evlist *evlist = session->evlist;
 	struct perf_tool *tool = session->tool;
@@ -1735,7 +1745,7 @@ static s64 perf_session__process_event(struct perf_session *session,
 	events_stats__inc(&evlist->stats, event->header.type);
 
 	if (event->header.type >= PERF_RECORD_USER_TYPE_START)
-		return perf_session__process_user_event(session, event, file_offset);
+		return perf_session__process_user_event(session, event, file_offset, file_path);
 
 	if (tool->ordered_events) {
 		u64 timestamp = -1ULL;
@@ -1749,7 +1759,7 @@ static s64 perf_session__process_event(struct perf_session *session,
 			return ret;
 	}
 
-	return perf_session__deliver_event(session, event, tool, file_offset);
+	return perf_session__deliver_event(session, event, tool, file_offset, file_path);
 }
 
 void perf_event_header__bswap(struct perf_event_header *hdr)
@@ -1987,7 +1997,8 @@ static int __perf_session__process_pipe_events(struct perf_session *session)
 		}
 	}
 
-	if ((skip = perf_session__process_event(session, event, head)) < 0) {
+	skip = perf_session__process_event(session, event, head, "pipe");
+	if (skip < 0) {
 		pr_err("%#" PRIx64 " [%#x]: failed to process type: %d\n",
 		       head, event->header.size, event->header.type);
 		err = -EINVAL;
@@ -2068,7 +2079,7 @@ fetch_decomp_event(u64 head, size_t mmap_size, char *buf, bool needs_swap)
 static int __perf_session__process_decomp_events(struct perf_session *session)
 {
 	s64 skip;
-	u64 size, file_pos = 0;
+	u64 size;
 	struct decomp *decomp = session->decomp_last;
 
 	if (!decomp)
@@ -2082,9 +2093,9 @@ static int __perf_session__process_decomp_events(struct perf_session *session)
 			break;
 
 		size = event->header.size;
-
-		if (size < sizeof(struct perf_event_header) ||
-		    (skip = perf_session__process_event(session, event, file_pos)) < 0) {
+		skip = perf_session__process_event(session, event, decomp->file_pos,
+						   decomp->file_path);
+		if (size < sizeof(struct perf_event_header) || skip < 0) {
 			pr_err("%#" PRIx64 " [%#x]: failed to process type: %d\n",
 				decomp->file_pos + decomp->head, event->header.size, event->header.type);
 			return -EINVAL;
@@ -2115,7 +2126,8 @@ struct reader;
 
 typedef s64 (*reader_cb_t)(struct perf_session *session,
 			   union perf_event *event,
-			   u64 file_offset);
+			   u64 file_offset,
+			   const char *file_path);
 
 struct reader {
 	int		 fd;
@@ -2198,9 +2210,9 @@ reader__process_events(struct reader *rd, struct perf_session *session,
 	skip = -EINVAL;
 
 	if (size < sizeof(struct perf_event_header) ||
-	    (skip = rd->process(session, event, file_pos)) < 0) {
-		pr_err("%#" PRIx64 " [%#x]: failed to process type: %d [%s]\n",
-		       file_offset + head, event->header.size,
+	    (skip = rd->process(session, event, file_pos, rd->path)) < 0) {
+		pr_err("%#" PRIx64 " [%s] [%#x]: failed to process type: %d [%s]\n",
+		       file_offset + head, rd->path, event->header.size,
 		       event->header.type, strerror(-skip));
 		err = skip;
 		goto out;
@@ -2230,9 +2242,10 @@ reader__process_events(struct reader *rd, struct perf_session *session,
 
 static s64 process_simple(struct perf_session *session,
 			  union perf_event *event,
-			  u64 file_offset)
+			  u64 file_offset,
+			  const char *file_path)
 {
-	return perf_session__process_event(session, event, file_offset);
+	return perf_session__process_event(session, event, file_offset, file_path);
 }
 
 static int __perf_session__process_events(struct perf_session *session)
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index bbbc0dcd461f..c966531d3eca 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -28,7 +28,8 @@ typedef int (*event_attr_op)(struct perf_tool *tool,
 
 typedef int (*event_op2)(struct perf_session *session, union perf_event *event);
 typedef s64 (*event_op3)(struct perf_session *session, union perf_event *event);
-typedef int (*event_op4)(struct perf_session *session, union perf_event *event, u64 data);
+typedef int (*event_op4)(struct perf_session *session, union perf_event *event, u64 data,
+			 const char *str);
 
 typedef int (*event_oe)(struct perf_tool *tool, union perf_event *event,
 			struct ordered_events *oe);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v1 03/15] perf data: open data directory in read access mode
  2020-10-12  8:50 [PATCH v1 00/15] Introduce threaded trace streaming for basic perf record operation Alexey Budankov
  2020-10-12  8:53 ` [PATCH v1 01/15] perf session: introduce trace file path to be shown in raw trace dump Alexey Budankov
  2020-10-12  8:54 ` [PATCH v1 02/15] perf report: output trace file name " Alexey Budankov
@ 2020-10-12  8:55 ` Alexey Budankov
  2020-10-12 16:03   ` Andi Kleen
  2020-10-12  8:56 ` [PATCH v1 04/15] perf session: move reader object definition to header file Alexey Budankov
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 45+ messages in thread
From: Alexey Budankov @ 2020-10-12  8:55 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Namhyung Kim, Alexander Shishkin, Andi Kleen, Adrian Hunter,
	Peter Zijlstra, Ingo Molnar, linux-kernel


Open files located at data directory in case of read access mode.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 tools/perf/util/data.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
index c47aa34fdc0a..6ad61ac6ba67 100644
--- a/tools/perf/util/data.c
+++ b/tools/perf/util/data.c
@@ -321,6 +321,10 @@ static int open_dir(struct perf_data *data)
 		return -1;
 
 	ret = open_file(data);
+	if (!ret && perf_data__is_dir(data)) {
+		if (perf_data__is_read(data))
+			ret = perf_data__open_dir(data);
+	}
 
 	/* Cleanup whatever we managed to create so far. */
 	if (ret && perf_data__is_write(data))
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v1 04/15] perf session: move reader object definition to header file
  2020-10-12  8:50 [PATCH v1 00/15] Introduce threaded trace streaming for basic perf record operation Alexey Budankov
                   ` (2 preceding siblings ...)
  2020-10-12  8:55 ` [PATCH v1 03/15] perf data: open data directory in read access mode Alexey Budankov
@ 2020-10-12  8:56 ` Alexey Budankov
  2020-10-12  8:57 ` [PATCH v1 05/15] perf session: introduce decompressor into trace reader object Alexey Budankov
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 45+ messages in thread
From: Alexey Budankov @ 2020-10-12  8:56 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Namhyung Kim, Alexander Shishkin, Andi Kleen, Adrian Hunter,
	Peter Zijlstra, Ingo Molnar, linux-kernel


Move definition of reader to session header file to be shared
among different source files. Introduce reference to active
reader object from session object.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 tools/perf/util/session.c | 27 ---------------------------
 tools/perf/util/session.h | 25 +++++++++++++++++++++++++
 2 files changed, 25 insertions(+), 27 deletions(-)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 6f09d506b2f6..911b2dbcd0ac 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -2110,33 +2110,6 @@ static int __perf_session__process_decomp_events(struct perf_session *session)
 	return 0;
 }
 
-/*
- * On 64bit we can mmap the data file in one go. No need for tiny mmap
- * slices. On 32bit we use 32MB.
- */
-#if BITS_PER_LONG == 64
-#define MMAP_SIZE ULLONG_MAX
-#define NUM_MMAPS 1
-#else
-#define MMAP_SIZE (32 * 1024 * 1024ULL)
-#define NUM_MMAPS 128
-#endif
-
-struct reader;
-
-typedef s64 (*reader_cb_t)(struct perf_session *session,
-			   union perf_event *event,
-			   u64 file_offset,
-			   const char *file_path);
-
-struct reader {
-	int		 fd;
-	const char	 *path;
-	u64		 data_size;
-	u64		 data_offset;
-	reader_cb_t	 process;
-};
-
 static int
 reader__process_events(struct reader *rd, struct perf_session *session,
 		       struct ui_progress *prog)
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index 378ffc3e2809..abdb8518a81f 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -20,6 +20,30 @@ struct thread;
 struct auxtrace;
 struct itrace_synth_opts;
 
+/*
+ * On 64bit we can mmap the data file in one go. No need for tiny mmap
+ * slices. On 32bit we use 32MB.
+ */
+#if BITS_PER_LONG == 64
+#define MMAP_SIZE ULLONG_MAX
+#define NUM_MMAPS 1
+#else
+#define MMAP_SIZE (32 * 1024 * 1024ULL)
+#define NUM_MMAPS 128
+#endif
+
+typedef s64 (*reader_cb_t)(struct perf_session *session,
+			   union perf_event *event,
+			   u64 file_offset, const char *file_path);
+
+struct reader {
+	int		 fd;
+	const char	 *path;
+	u64		 data_size;
+	u64		 data_offset;
+	reader_cb_t	 process;
+};
+
 struct perf_session {
 	struct perf_header	header;
 	struct machines		machines;
@@ -41,6 +65,7 @@ struct perf_session {
 	struct zstd_data	zstd_data;
 	struct decomp		*decomp;
 	struct decomp		*decomp_last;
+	struct reader		*reader;
 };
 
 struct decomp {
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v1 05/15] perf session: introduce decompressor into trace reader object
  2020-10-12  8:50 [PATCH v1 00/15] Introduce threaded trace streaming for basic perf record operation Alexey Budankov
                   ` (3 preceding siblings ...)
  2020-10-12  8:56 ` [PATCH v1 04/15] perf session: move reader object definition to header file Alexey Budankov
@ 2020-10-12  8:57 ` Alexey Budankov
  2020-10-12 16:05   ` Andi Kleen
  2020-10-12  8:58 ` [PATCH v1 06/15] perf session: load data directory into tool process memory Alexey Budankov
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 45+ messages in thread
From: Alexey Budankov @ 2020-10-12  8:57 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Namhyung Kim, Alexander Shishkin, Andi Kleen, Adrian Hunter,
	Peter Zijlstra, Ingo Molnar, linux-kernel


Introduce decompressor to trace reader object so that decompression
could be executed on per trace file basis separately for every
trace file located at data directory.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 tools/perf/util/session.c | 4 +++-
 tools/perf/util/session.h | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 911b2dbcd0ac..6afc670fdf0c 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -44,6 +44,8 @@ static int perf_session__process_compressed_event(struct perf_session *session,
 	u64 decomp_last_rem = 0;
 	size_t mmap_len, decomp_len = session->header.env.comp_mmap_len;
 	struct decomp *decomp, *decomp_last = session->decomp_last;
+	struct zstd_data *zstd_data = session->reader ?
+		&(session->reader->zstd_data) : &(session->zstd_data);
 
 	if (decomp_last) {
 		decomp_last_rem = decomp_last->size - decomp_last->head;
@@ -71,7 +73,7 @@ static int perf_session__process_compressed_event(struct perf_session *session,
 	src = (void *)event + sizeof(struct perf_record_compressed);
 	src_size = event->pack.header.size - sizeof(struct perf_record_compressed);
 
-	decomp_size = zstd_decompress_stream(&(session->zstd_data), src, src_size,
+	decomp_size = zstd_decompress_stream(zstd_data, src, src_size,
 				&(decomp->data[decomp_last_rem]), decomp_len - decomp_last_rem);
 	if (!decomp_size) {
 		munmap(decomp, mmap_len);
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index abdb8518a81f..4fc9ccdf7970 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -42,6 +42,7 @@ struct reader {
 	u64		 data_size;
 	u64		 data_offset;
 	reader_cb_t	 process;
+	struct zstd_data zstd_data;
 };
 
 struct perf_session {
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v1 06/15] perf session: load data directory into tool process memory
  2020-10-12  8:50 [PATCH v1 00/15] Introduce threaded trace streaming for basic perf record operation Alexey Budankov
                   ` (4 preceding siblings ...)
  2020-10-12  8:57 ` [PATCH v1 05/15] perf session: introduce decompressor into trace reader object Alexey Budankov
@ 2020-10-12  8:58 ` Alexey Budankov
  2020-10-12 16:09   ` Andi Kleen
  2020-10-12  8:59 ` [PATCH v1 07/15] perf record: introduce trace file, compressor and stats in mmap object Alexey Budankov
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 45+ messages in thread
From: Alexey Budankov @ 2020-10-12  8:58 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Namhyung Kim, Alexander Shishkin, Andi Kleen, Adrian Hunter,
	Peter Zijlstra, Ingo Molnar, linux-kernel


Read trace files located at data directory into tool process memory.
Basic analysis support of data directories is provided for report
mode. Raw dump (-D) and aggregated reports are available for data
directories, still with no memory consumption optimizations. However
data directories collected with --compression-level option enabled
can be analyzed with little less memory because trace files are
unmaped from tool process memory after loading collected data.
The implementation is based on the prototype [1], [2].

[1] git clone https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git -b perf/record_threads
[2] https://lore.kernel.org/lkml/20180913125450.21342-1-jolsa@kernel.org/

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 tools/perf/util/session.c | 48 +++++++++++++++++++++++++++++++++++++++
 tools/perf/util/session.h |  1 +
 2 files changed, 49 insertions(+)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 6afc670fdf0c..0752eec19813 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -2212,6 +2212,17 @@ reader__process_events(struct reader *rd, struct perf_session *session,
 		goto more;
 
 out:
+	if (rd->unmap_file) {
+		int i;
+
+		for (i = 0; i < NUM_MMAPS; i++) {
+			if (mmaps[i]) {
+				munmap(mmaps[i], mmap_size);
+				mmaps[i] = NULL;
+			}
+		}
+	}
+
 	return err;
 }
 
@@ -2231,6 +2242,7 @@ static int __perf_session__process_events(struct perf_session *session)
 		.data_offset	= session->header.data_offset,
 		.process	= process_simple,
 		.path		= session->data->file.path,
+		.unmap_file	= false,
 	};
 	struct ordered_events *oe = &session->ordered_events;
 	struct perf_tool *tool = session->tool;
@@ -2247,6 +2259,42 @@ static int __perf_session__process_events(struct perf_session *session)
 	err = reader__process_events(&rd, session, &prog);
 	if (err)
 		goto out_err;
+
+	if (perf_data__is_dir(session->data)) {
+		int i, nr = session->data->dir.nr;
+		struct reader file_rd[nr];
+		u64 total_size = perf_data__size(session->data);
+
+		total_size -= session->data->file.size;
+		ui_progress__init_size(&prog, total_size, "Sorting events...");
+
+		memset(&file_rd, 0, nr * sizeof(file_rd[0]));
+
+		for (i = 0; i < nr ; i++) {
+			struct perf_data_file *file;
+
+			file = &session->data->dir.files[i];
+			file_rd[i] = (struct reader) {
+				.fd             = file->fd,
+				.path           = file->path,
+				.data_size      = file->size,
+				.data_offset    = 0,
+				.process	= process_simple,
+			};
+			file_rd[i].unmap_file = perf_header__has_feat(&session->header,
+								      HEADER_COMPRESSED);
+			session->reader = &file_rd[i];
+
+			if (zstd_init(&(file_rd[i].zstd_data), 0))
+				goto out_err;
+			err = reader__process_events(&file_rd[i], session, &prog);
+			zstd_fini(&(file_rd[i].zstd_data));
+			session->reader = NULL;
+			if (err)
+				goto out_err;
+		}
+	}
+
 	/* do the final flush for ordered samples */
 	err = ordered_events__flush(oe, OE_FLUSH__FINAL);
 	if (err)
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index 4fc9ccdf7970..d428f3eaf7fd 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -43,6 +43,7 @@ struct reader {
 	u64		 data_offset;
 	reader_cb_t	 process;
 	struct zstd_data zstd_data;
+	bool		 unmap_file;
 };
 
 struct perf_session {
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v1 07/15] perf record: introduce trace file, compressor and stats in mmap object
  2020-10-12  8:50 [PATCH v1 00/15] Introduce threaded trace streaming for basic perf record operation Alexey Budankov
                   ` (5 preceding siblings ...)
  2020-10-12  8:58 ` [PATCH v1 06/15] perf session: load data directory into tool process memory Alexey Budankov
@ 2020-10-12  8:59 ` Alexey Budankov
  2020-10-12  9:01 ` [PATCH v1 08/15] perf record: write trace data into mmap trace files Alexey Budankov
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 45+ messages in thread
From: Alexey Budankov @ 2020-10-12  8:59 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Namhyung Kim, Alexander Shishkin, Andi Kleen, Adrian Hunter,
	Peter Zijlstra, Ingo Molnar, linux-kernel


Introduce trace file and compressor objects into mmap object so
they could be used to process and store data stream from the
corresponding kernel data buffer. Introduce bytes_transferred
and bytes_compressed stats so they would capture statistics for
the related data buffer transfers. Make use of the introduced
per mmap file, compressor and stats when they are initialized
and available.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 tools/perf/builtin-record.c | 47 +++++++++++++++++++++++++++----------
 tools/perf/util/mmap.c      |  6 +++++
 tools/perf/util/mmap.h      |  5 ++++
 3 files changed, 46 insertions(+), 12 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index adf311d15d3d..619aaee11231 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -150,11 +150,17 @@ static int record__write(struct record *rec, struct mmap *map __maybe_unused,
 {
 	struct perf_data_file *file = &rec->session->data->file;
 
+	if (map && map->file)
+		file = map->file;
+
 	if (perf_data_file__write(file, bf, size) < 0) {
 		pr_err("failed to write perf data, error: %m\n");
 		return -1;
 	}
 
+	if (file != &rec->session->data->file)
+		return 0;
+
 	rec->bytes_written += size;
 
 	if (record__output_max_size_exceeded(rec) && !done) {
@@ -172,8 +178,8 @@ static int record__write(struct record *rec, struct mmap *map __maybe_unused,
 
 static int record__aio_enabled(struct record *rec);
 static int record__comp_enabled(struct record *rec);
-static size_t zstd_compress(struct perf_session *session, void *dst, size_t dst_size,
-			    void *src, size_t src_size);
+static size_t zstd_compress(struct zstd_data *data,
+			    void *dst, size_t dst_size, void *src, size_t src_size);
 
 #ifdef HAVE_AIO_SUPPORT
 static int record__aio_write(struct aiocb *cblock, int trace_fd,
@@ -291,6 +297,7 @@ struct record_aio {
 static int record__aio_pushfn(struct mmap *map, void *to, void *buf, size_t size)
 {
 	struct record_aio *aio = to;
+	size_t compressed = size;
 
 	/*
 	 * map->core.base data pointed by buf is copied into free map->aio.data[] buffer
@@ -307,9 +314,14 @@ static int record__aio_pushfn(struct mmap *map, void *to, void *buf, size_t size
 	 */
 
 	if (record__comp_enabled(aio->rec)) {
-		size = zstd_compress(aio->rec->session, aio->data + aio->size,
-				     mmap__mmap_len(map) - aio->size,
+		struct zstd_data *zstd_data = &(aio->rec->session->zstd_data);
+
+		compressed = zstd_compress(zstd_data,
+				     aio->data + aio->size, mmap__mmap_len(map) - aio->size,
 				     buf, size);
+
+		aio->rec->session->bytes_transferred += size;
+		aio->rec->session->bytes_compressed  += compressed;
 	} else {
 		memcpy(aio->data + aio->size, buf, size);
 	}
@@ -328,7 +340,7 @@ static int record__aio_pushfn(struct mmap *map, void *to, void *buf, size_t size
 		perf_mmap__get(&map->core);
 	}
 
-	aio->size += size;
+	aio->size += compressed;
 
 	return size;
 }
@@ -532,14 +544,28 @@ static int process_locked_synthesized_event(struct perf_tool *tool,
 static int record__pushfn(struct mmap *map, void *to, void *bf, size_t size)
 {
 	struct record *rec = to;
+	size_t compressed = size;
 
 	if (record__comp_enabled(rec)) {
-		size = zstd_compress(rec->session, map->data, mmap__mmap_len(map), bf, size);
+		struct zstd_data *zstd_data = &rec->session->zstd_data;
+
+		if (map->file)
+			zstd_data = &map->zstd_data;
+
+		compressed = zstd_compress(zstd_data, map->data, mmap__mmap_len(map), bf, size);
 		bf   = map->data;
+
+		if (map->file) {
+			map->bytes_transferred += size;
+			map->bytes_compressed  += compressed;
+		} else {
+			rec->session->bytes_transferred += size;
+			rec->session->bytes_compressed  += compressed;
+		}
 	}
 
 	rec->samples++;
-	return record__write(rec, map, bf, size);
+	return record__write(rec, map, bf, compressed);
 }
 
 static volatile int signr = -1;
@@ -1079,18 +1105,15 @@ static size_t process_comp_header(void *record, size_t increment)
 	return size;
 }
 
-static size_t zstd_compress(struct perf_session *session, void *dst, size_t dst_size,
+static size_t zstd_compress(struct zstd_data *zstd_data, void *dst, size_t dst_size,
 			    void *src, size_t src_size)
 {
 	size_t compressed;
 	size_t max_record_size = PERF_SAMPLE_MAX_SIZE - sizeof(struct perf_record_compressed) - 1;
 
-	compressed = zstd_compress_stream_to_records(&session->zstd_data, dst, dst_size, src, src_size,
+	compressed = zstd_compress_stream_to_records(zstd_data, dst, dst_size, src, src_size,
 						     max_record_size, process_comp_header);
 
-	session->bytes_transferred += src_size;
-	session->bytes_compressed  += compressed;
-
 	return compressed;
 }
 
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index ab7108d22428..a2c5e4237592 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -230,6 +230,8 @@ void mmap__munmap(struct mmap *map)
 {
 	bitmap_free(map->affinity_mask.bits);
 
+	zstd_fini(&map->zstd_data);
+
 	perf_mmap__aio_munmap(map);
 	if (map->data != NULL) {
 		munmap(map->data, mmap__mmap_len(map));
@@ -291,6 +293,10 @@ int mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, int cpu)
 	map->core.flush = mp->flush;
 
 	map->comp_level = mp->comp_level;
+	if (zstd_init(&map->zstd_data, map->comp_level)) {
+		pr_debug2("failed to init mmap commpressor, error %d\n", errno);
+		return -1;
+	}
 
 	if (map->comp_level && !perf_mmap__aio_enabled(map)) {
 		map->data = mmap(NULL, mmap__mmap_len(map), PROT_READ|PROT_WRITE,
diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index 9d5f589f02ae..d98b0bdedeac 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -13,6 +13,7 @@
 #endif
 #include "auxtrace.h"
 #include "event.h"
+#include "util/compress.h"
 
 struct aiocb;
 
@@ -43,6 +44,10 @@ struct mmap {
 	struct mmap_cpu_mask	affinity_mask;
 	void		*data;
 	int		comp_level;
+	struct perf_data_file *file;
+	struct zstd_data      zstd_data;
+	u64		      bytes_transferred;
+	u64		      bytes_compressed;
 };
 
 struct mmap_params {
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v1 08/15] perf record: write trace data into mmap trace files
  2020-10-12  8:50 [PATCH v1 00/15] Introduce threaded trace streaming for basic perf record operation Alexey Budankov
                   ` (6 preceding siblings ...)
  2020-10-12  8:59 ` [PATCH v1 07/15] perf record: introduce trace file, compressor and stats in mmap object Alexey Budankov
@ 2020-10-12  9:01 ` Alexey Budankov
  2020-10-14 10:52   ` Namhyung Kim
  2020-10-12  9:02 ` [PATCH v1 09/15] perf record: introduce thread specific objects for trace streaming Alexey Budankov
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 45+ messages in thread
From: Alexey Budankov @ 2020-10-12  9:01 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Namhyung Kim, Alexander Shishkin, Andi Kleen, Adrian Hunter,
	Peter Zijlstra, Ingo Molnar, linux-kernel


Write trace data into per mmap trace files located
at data directory. Streaming thread adjusts its affinity
according to mask of the buffer being processed.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 tools/perf/builtin-record.c | 44 ++++++++++++++++++++++++++++++++-----
 tools/perf/util/record.h    |  1 +
 2 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 619aaee11231..ba26d75c51d6 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -120,6 +120,11 @@ static const char *affinity_tags[PERF_AFFINITY_MAX] = {
 	"SYS", "NODE", "CPU"
 };
 
+static int record__threads_enabled(struct record *rec)
+{
+	return rec->opts.threads;
+}
+
 static bool switch_output_signal(struct record *rec)
 {
 	return rec->switch_output.signal &&
@@ -894,6 +899,20 @@ static int record__mmap_evlist(struct record *rec,
 				return -EINVAL;
 		}
 	}
+
+	if (record__threads_enabled(rec)) {
+		int i, ret, nr = evlist->core.nr_mmaps;
+		struct mmap *mmaps = rec->opts.overwrite ?
+				     evlist->overwrite_mmap : evlist->mmap;
+
+		ret = perf_data__create_dir(&rec->data, evlist->core.nr_mmaps);
+		if (ret)
+			return ret;
+
+		for (i = 0; i < nr; i++)
+			mmaps[i].file = &rec->data.dir.files[i];
+	}
+
 	return 0;
 }
 
@@ -1184,8 +1203,12 @@ static int record__mmap_read_evlist(struct record *rec, struct evlist *evlist,
 	/*
 	 * Mark the round finished in case we wrote
 	 * at least one event.
+	 *
+	 * No need for round events in directory mode,
+	 * because per-cpu maps and files have data
+	 * sorted by kernel.
 	 */
-	if (bytes_written != rec->bytes_written)
+	if (!record__threads_enabled(rec) && bytes_written != rec->bytes_written)
 		rc = record__write(rec, NULL, &finished_round_event, sizeof(finished_round_event));
 
 	if (overwrite)
@@ -1231,7 +1254,9 @@ static void record__init_features(struct record *rec)
 	if (!rec->opts.use_clockid)
 		perf_header__clear_feat(&session->header, HEADER_CLOCK_DATA);
 
-	perf_header__clear_feat(&session->header, HEADER_DIR_FORMAT);
+	if (!record__threads_enabled(rec))
+		perf_header__clear_feat(&session->header, HEADER_DIR_FORMAT);
+
 	if (!record__comp_enabled(rec))
 		perf_header__clear_feat(&session->header, HEADER_COMPRESSED);
 
@@ -1242,15 +1267,21 @@ static void
 record__finish_output(struct record *rec)
 {
 	struct perf_data *data = &rec->data;
-	int fd = perf_data__fd(data);
+	int i, fd = perf_data__fd(data);
 
 	if (data->is_pipe)
 		return;
 
 	rec->session->header.data_size += rec->bytes_written;
 	data->file.size = lseek(perf_data__fd(data), 0, SEEK_CUR);
+	if (record__threads_enabled(rec)) {
+		for (i = 0; i < data->dir.nr; i++)
+			data->dir.files[i].size = lseek(data->dir.files[i].fd, 0, SEEK_CUR);
+	}
 
 	if (!rec->no_buildid) {
+		/* this will be recalculated during process_buildids() */
+		rec->samples = 0;
 		process_buildids(rec);
 
 		if (rec->buildid_all)
@@ -2041,8 +2072,6 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		status = err;
 
 	record__synthesize(rec, true);
-	/* this will be recalculated during process_buildids() */
-	rec->samples = 0;
 
 	if (!err) {
 		if (!rec->timestamp_filename) {
@@ -2680,9 +2709,12 @@ int cmd_record(int argc, const char **argv)
 
 	}
 
-	if (rec->opts.kcore)
+	if (rec->opts.kcore || record__threads_enabled(rec))
 		rec->data.is_dir = true;
 
+	if (record__threads_enabled(rec))
+		rec->opts.affinity = PERF_AFFINITY_CPU;
+
 	if (rec->opts.comp_level != 0) {
 		pr_debug("Compression enabled, disabling build id collection at the end of the session.\n");
 		rec->no_buildid = true;
diff --git a/tools/perf/util/record.h b/tools/perf/util/record.h
index 266760ac9143..aeda3cdaa3e9 100644
--- a/tools/perf/util/record.h
+++ b/tools/perf/util/record.h
@@ -74,6 +74,7 @@ struct record_opts {
 	int	      ctl_fd;
 	int	      ctl_fd_ack;
 	bool	      ctl_fd_close;
+	bool	      threads;
 };
 
 extern const char * const *record_usage;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v1 09/15] perf record: introduce thread specific objects for trace streaming
  2020-10-12  8:50 [PATCH v1 00/15] Introduce threaded trace streaming for basic perf record operation Alexey Budankov
                   ` (7 preceding siblings ...)
  2020-10-12  9:01 ` [PATCH v1 08/15] perf record: write trace data into mmap trace files Alexey Budankov
@ 2020-10-12  9:02 ` Alexey Budankov
  2020-10-12  9:03 ` [PATCH v1 10/15] perf record: manage thread specific data array Alexey Budankov
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 45+ messages in thread
From: Alexey Budankov @ 2020-10-12  9:02 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Namhyung Kim, Alexander Shishkin, Andi Kleen, Adrian Hunter,
	Peter Zijlstra, Ingo Molnar, linux-kernel


Introduce thread local data object and its array to be used for
threaded trace streaming.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 tools/perf/builtin-record.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index ba26d75c51d6..8e512096a060 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -85,11 +85,29 @@ struct switch_output {
 	int		 cur_file;
 };
 
+struct thread_data {
+	pid_t		   tid;
+	struct {
+		int	   msg[2];
+		int	   ack[2];
+	} comm;
+	struct fdarray	   pollfd;
+	int		   ctlfd_pos;
+	struct mmap	   *maps;
+	int		   nr_mmaps;
+	struct record	   *rec;
+	unsigned long long samples;
+	unsigned long	   waking;
+	u64		   bytes_written;
+};
+
 struct record {
 	struct perf_tool	tool;
 	struct record_opts	opts;
 	u64			bytes_written;
 	struct perf_data	data;
+	struct thread_data	*thread_data;
+	int			nr_thread_data;
 	struct auxtrace_record	*itr;
 	struct evlist	*evlist;
 	struct perf_session	*session;
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v1 10/15] perf record: manage thread specific data array
  2020-10-12  8:50 [PATCH v1 00/15] Introduce threaded trace streaming for basic perf record operation Alexey Budankov
                   ` (8 preceding siblings ...)
  2020-10-12  9:02 ` [PATCH v1 09/15] perf record: introduce thread specific objects for trace streaming Alexey Budankov
@ 2020-10-12  9:03 ` Alexey Budankov
  2020-10-12  9:05 ` [PATCH v1 11/15] perf evlist: introduce evlist__ctlfd_update() to update ctl fd status Alexey Budankov
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 45+ messages in thread
From: Alexey Budankov @ 2020-10-12  9:03 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Namhyung Kim, Alexander Shishkin, Andi Kleen, Adrian Hunter,
	Peter Zijlstra, Ingo Molnar, linux-kernel


Provide allocation, initialization, finalization and releasing of
thread specific objects at thread specific data array. Allocate
thread specific object for every data buffer making one-to-one
relation between data buffer and a thread processing the buffer.
Deliver event fd related signals to thread's pollfd object.
Deliver thread control commands to ctlfd_pos fd of thread 1+.
Deliver tool external control commands via ctlfd_pos fd of thread 0.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 tools/perf/builtin-record.c | 101 ++++++++++++++++++++++++++++++++++--
 1 file changed, 98 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 8e512096a060..89cb8e913fb3 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -884,6 +884,94 @@ static int record__kcore_copy(struct machine *machine, struct perf_data *data)
 	return kcore_copy(from_dir, kcore_dir);
 }
 
+static int record__alloc_thread_data(struct record *rec, struct mmap *mmaps, int nr_mmaps,
+				     struct fdarray *evlist_pollfd, int ctlfd_pos)
+{
+	int i, j, k, nr_thread_data;
+	struct thread_data *thread_data;
+
+	rec->nr_thread_data = nr_thread_data = nr_mmaps;
+	rec->thread_data = thread_data = zalloc(rec->nr_thread_data * sizeof(*(rec->thread_data)));
+	if (!thread_data) {
+		pr_err("Failed to allocate thread data\n");
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < nr_thread_data; i++) {
+		short revents;
+		int pos, fd;
+
+		thread_data[i].tid = -1;
+
+		if (pipe(thread_data[i].comm.msg) ||
+		    pipe(thread_data[i].comm.ack)) {
+			pr_err("Failed to create thread comm pipes, errno %d\n", errno);
+			return -ENOMEM;
+		}
+
+		thread_data[i].maps = &mmaps[i];
+		thread_data[i].nr_mmaps = 1;
+
+		thread_data[i].rec = rec;
+
+		fdarray__init(&(thread_data[i].pollfd), 64);
+
+		for (j = 0; j < thread_data[i].nr_mmaps; j++) {
+			struct mmap *map = &(thread_data[i].maps[j]);
+
+			for (k = 0; k < evlist_pollfd->nr; k++) {
+				if (evlist_pollfd->priv[k].ptr != map)
+					continue;
+
+				fd = evlist_pollfd->entries[k].fd;
+				revents = evlist_pollfd->entries[k].events;
+				pos = fdarray__add(&(thread_data[i].pollfd),
+						fd, revents | POLLERR | POLLHUP,
+						fdarray_flag__default);
+				if (pos >= 0)
+					thread_data[i].pollfd.priv[pos].ptr = map;
+				else
+					return -ENOMEM;
+			}
+		}
+
+		if (i) {
+			fd = thread_data[i].comm.msg[0];
+			revents = POLLIN | POLLERR | POLLHUP;
+		} else {
+			if (ctlfd_pos == -1)
+				continue;
+			fd = evlist_pollfd->entries[ctlfd_pos].fd;
+			revents = evlist_pollfd->entries[ctlfd_pos].events;
+		}
+		thread_data[i].ctlfd_pos =
+				fdarray__add(&(thread_data[i].pollfd),
+					     fd, revents, fdarray_flag__nonfilterable);
+		if (thread_data[i].ctlfd_pos < 0)
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static int record__free_thread_data(struct record *rec)
+{
+	int i;
+
+	if (rec->thread_data) {
+		for (i = 0; i < rec->nr_thread_data; i++) {
+			close(rec->thread_data[i].comm.msg[0]);
+			close(rec->thread_data[i].comm.msg[1]);
+			close(rec->thread_data[i].comm.ack[0]);
+			close(rec->thread_data[i].comm.ack[1]);
+			fdarray__exit(&(rec->thread_data[i].pollfd));
+		}
+		zfree(&rec->thread_data);
+	}
+
+	return 0;
+}
+
 static int record__mmap_evlist(struct record *rec,
 			       struct evlist *evlist)
 {
@@ -918,6 +1006,9 @@ static int record__mmap_evlist(struct record *rec,
 		}
 	}
 
+	if (evlist__initialize_ctlfd(evlist, opts->ctl_fd, opts->ctl_fd_ack))
+		return -1;
+
 	if (record__threads_enabled(rec)) {
 		int i, ret, nr = evlist->core.nr_mmaps;
 		struct mmap *mmaps = rec->opts.overwrite ?
@@ -929,6 +1020,12 @@ static int record__mmap_evlist(struct record *rec,
 
 		for (i = 0; i < nr; i++)
 			mmaps[i].file = &rec->data.dir.files[i];
+
+		ret = record__alloc_thread_data(rec, mmaps, nr,
+						&evlist->core.pollfd,
+						evlist->ctl_fd.pos);
+		if (ret)
+			return ret;
 	}
 
 	return 0;
@@ -1910,9 +2007,6 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		perf_evlist__start_workload(rec->evlist);
 	}
 
-	if (evlist__initialize_ctlfd(rec->evlist, opts->ctl_fd, opts->ctl_fd_ack))
-		goto out_child;
-
 	if (opts->initial_delay) {
 		pr_info(EVLIST_DISABLED_MSG);
 		if (opts->initial_delay > 0) {
@@ -2063,6 +2157,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		record__synthesize_workload(rec, true);
 
 out_child:
+	record__free_thread_data(rec);
 	evlist__finalize_ctlfd(rec->evlist);
 	record__mmap_read_all(rec, true);
 	record__aio_mmap_read_sync(rec);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v1 11/15] perf evlist: introduce evlist__ctlfd_update() to update ctl fd status
  2020-10-12  8:50 [PATCH v1 00/15] Introduce threaded trace streaming for basic perf record operation Alexey Budankov
                   ` (9 preceding siblings ...)
  2020-10-12  9:03 ` [PATCH v1 10/15] perf record: manage thread specific data array Alexey Budankov
@ 2020-10-12  9:05 ` Alexey Budankov
  2020-10-12  9:05 ` [PATCH v1 12/15] perf record: introduce thread local variable for trace streaming Alexey Budankov
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 45+ messages in thread
From: Alexey Budankov @ 2020-10-12  9:05 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Namhyung Kim, Alexander Shishkin, Andi Kleen, Adrian Hunter,
	Peter Zijlstra, Ingo Molnar, linux-kernel


Introduce evlist__ctlfd_update() to update ctl fd poll status
in evlist pollfd array using other pollfd object.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 tools/perf/util/evlist.c | 16 ++++++++++++++++
 tools/perf/util/evlist.h |  1 +
 2 files changed, 17 insertions(+)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 8bdf3d2c907c..758a4896fedd 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1970,6 +1970,22 @@ int evlist__ctlfd_process(struct evlist *evlist, enum evlist_ctl_cmd *cmd)
 	return err;
 }
 
+int evlist__ctlfd_update(struct evlist *evlist, struct pollfd *update)
+{
+	int ctlfd_pos = evlist->ctl_fd.pos;
+	struct pollfd *entries = evlist->core.pollfd.entries;
+
+	if (!evlist__ctlfd_initialized(evlist))
+		return 0;
+
+	if (entries[ctlfd_pos].fd != update->fd ||
+	    entries[ctlfd_pos].events != update->events)
+		return -1;
+
+	entries[ctlfd_pos].revents = update->revents;
+	return 0;
+}
+
 struct evsel *evlist__find_evsel(struct evlist *evlist, int idx)
 {
 	struct evsel *evsel;
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index e1a450322bc5..9b73d6ccf066 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -380,6 +380,7 @@ void evlist__close_control(int ctl_fd, int ctl_fd_ack, bool *ctl_fd_close);
 int evlist__initialize_ctlfd(struct evlist *evlist, int ctl_fd, int ctl_fd_ack);
 int evlist__finalize_ctlfd(struct evlist *evlist);
 bool evlist__ctlfd_initialized(struct evlist *evlist);
+int evlist__ctlfd_update(struct evlist *evlist, struct pollfd *update);
 int evlist__ctlfd_process(struct evlist *evlist, enum evlist_ctl_cmd *cmd);
 int evlist__ctlfd_ack(struct evlist *evlist);
 
-- 
2.24.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v1 12/15] perf record: introduce thread local variable for trace streaming
  2020-10-12  8:50 [PATCH v1 00/15] Introduce threaded trace streaming for basic perf record operation Alexey Budankov
                   ` (10 preceding siblings ...)
  2020-10-12  9:05 ` [PATCH v1 11/15] perf evlist: introduce evlist__ctlfd_update() to update ctl fd status Alexey Budankov
@ 2020-10-12  9:05 ` Alexey Budankov
  2020-10-12  9:10 ` [PATCH v1 13/15] perf record: stop threads in the end of " Alexey Budankov
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 45+ messages in thread
From: Alexey Budankov @ 2020-10-12  9:05 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Namhyung Kim, Alexander Shishkin, Andi Kleen, Adrian Hunter,
	Peter Zijlstra, Ingo Molnar, linux-kernel


Introduce thread local variable and use it for threaded trace streaming.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 tools/perf/builtin-record.c | 71 ++++++++++++++++++++++++++++++++-----
 1 file changed, 62 insertions(+), 9 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 89cb8e913fb3..3b7e9026f25b 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -101,6 +101,8 @@ struct thread_data {
 	u64		   bytes_written;
 };
 
+static __thread struct thread_data *thread;
+
 struct record {
 	struct perf_tool	tool;
 	struct record_opts	opts;
@@ -587,7 +589,11 @@ static int record__pushfn(struct mmap *map, void *to, void *bf, size_t size)
 		}
 	}
 
-	rec->samples++;
+	if (thread)
+		thread->samples++;
+	else
+		rec->samples++;
+
 	return record__write(rec, map, bf, compressed);
 }
 
@@ -1258,6 +1264,7 @@ static int record__mmap_read_evlist(struct record *rec, struct evlist *evlist,
 	int i;
 	int rc = 0;
 	struct mmap *maps;
+	int nr_mmaps;
 	int trace_fd = rec->data.file.fd;
 	off_t off = 0;
 
@@ -1265,6 +1272,14 @@ static int record__mmap_read_evlist(struct record *rec, struct evlist *evlist,
 		return 0;
 
 	maps = overwrite ? evlist->overwrite_mmap : evlist->mmap;
+	nr_mmaps = evlist->core.nr_mmaps;
+
+	if (thread) {
+		bytes_written = thread->bytes_written;
+		maps = thread->maps;
+		nr_mmaps = thread->nr_mmaps;
+	}
+
 	if (!maps)
 		return 0;
 
@@ -1274,7 +1289,7 @@ static int record__mmap_read_evlist(struct record *rec, struct evlist *evlist,
 	if (record__aio_enabled(rec))
 		off = record__aio_get_pos(trace_fd);
 
-	for (i = 0; i < evlist->core.nr_mmaps; i++) {
+	for (i = 0; i < nr_mmaps; i++) {
 		u64 flush = 0;
 		struct mmap *map = &maps[i];
 
@@ -1323,7 +1338,7 @@ static int record__mmap_read_evlist(struct record *rec, struct evlist *evlist,
 	 * because per-cpu maps and files have data
 	 * sorted by kernel.
 	 */
-	if (!record__threads_enabled(rec) && bytes_written != rec->bytes_written)
+	if (!thread && bytes_written != rec->bytes_written)
 		rc = record__write(rec, NULL, &finished_round_event, sizeof(finished_round_event));
 
 	if (overwrite)
@@ -1343,6 +1358,15 @@ static int record__mmap_read_all(struct record *rec, bool synch)
 	return record__mmap_read_evlist(rec, rec->evlist, true, synch);
 }
 
+static void record__thread_munmap_filtered(struct fdarray *fda, int fd,
+					   void *arg __maybe_unused)
+{
+	struct perf_mmap *map = fda->priv[fd].ptr;
+
+	if (map)
+		perf_mmap__put(map);
+}
+
 static void record__init_features(struct record *rec)
 {
 	struct perf_session *session = rec->session;
@@ -2020,7 +2044,12 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	trigger_ready(&switch_output_trigger);
 	perf_hooks__invoke_record_start();
 	for (;;) {
-		unsigned long long hits = rec->samples;
+		unsigned long long hits0, hits1;
+
+		if (thread)
+			hits0 = thread->samples;
+		else
+			hits0 = rec->samples;
 
 		/*
 		 * rec->evlist->bkw_mmap_state is possible to be
@@ -2089,20 +2118,44 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 				alarm(rec->switch_output.time);
 		}
 
-		if (hits == rec->samples) {
+		if (thread)
+			hits1 = thread->samples;
+		else
+			hits1 = rec->samples;
+
+		if (hits0 == hits1) {
 			if (done || draining)
 				break;
-			err = evlist__poll(rec->evlist, -1);
+
+			if (thread)
+				err = fdarray__poll(&thread->pollfd, -1);
+			else
+				err = evlist__poll(rec->evlist, -1);
 			/*
 			 * Propagate error, only if there's any. Ignore positive
 			 * number of returned events and interrupt error.
 			 */
 			if (err > 0 || (err < 0 && errno == EINTR))
 				err = 0;
-			waking++;
 
-			if (evlist__filter_pollfd(rec->evlist, POLLERR | POLLHUP) == 0)
-				draining = true;
+			if (thread) {
+				thread->waking++;
+				if (thread->ctlfd_pos != -1) {
+					evlist__ctlfd_update(rec->evlist,
+						&(thread->pollfd.entries[thread->ctlfd_pos]));
+				}
+			} else {
+				waking++;
+			}
+
+			if (thread) {
+				if (fdarray__filter(&thread->pollfd, POLLERR | POLLHUP,
+						    record__thread_munmap_filtered, NULL) == 0)
+					draining = true;
+			} else {
+				if (evlist__filter_pollfd(rec->evlist, POLLERR | POLLHUP) == 0)
+					draining = true;
+			}
 		}
 
 		if (evlist__ctlfd_process(rec->evlist, &cmd) > 0) {
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v1 13/15] perf record: stop threads in the end of trace streaming
  2020-10-12  8:50 [PATCH v1 00/15] Introduce threaded trace streaming for basic perf record operation Alexey Budankov
                   ` (11 preceding siblings ...)
  2020-10-12  9:05 ` [PATCH v1 12/15] perf record: introduce thread local variable for trace streaming Alexey Budankov
@ 2020-10-12  9:10 ` Alexey Budankov
  2020-10-12  9:11 ` [PATCH v1 14/15] perf record: start threads in the beginning " Alexey Budankov
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 45+ messages in thread
From: Alexey Budankov @ 2020-10-12  9:10 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Namhyung Kim, Alexander Shishkin, Andi Kleen, Adrian Hunter,
	Peter Zijlstra, Ingo Molnar, linux-kernel


Close write fd of comm.msg pipe to signal thread to terminate
and receive THREAD_MSG__READY confirmation on termination.
Accumulate thread stats into global stats to be correctly
calculated and displayed in perf tool output.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 tools/perf/builtin-record.c | 64 ++++++++++++++++++++++++++++++++++---
 1 file changed, 60 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 3b7e9026f25b..a15642656066 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -85,6 +85,16 @@ struct switch_output {
 	int		 cur_file;
 };
 
+enum thread_msg {
+	THREAD_MSG__UNSUPPORTED = 0,
+	THREAD_MSG__READY,
+	THREAD_MSG__MAX,
+};
+
+static const char *thread_msg_tags[THREAD_MSG__MAX] = {
+	"UNSUPPORTED", "READY"
+};
+
 struct thread_data {
 	pid_t		   tid;
 	struct {
@@ -1796,6 +1806,50 @@ static void hit_auxtrace_snapshot_trigger(struct record *rec)
 	}
 }
 
+static int record__terminate_thread(struct thread_data *thread_data)
+{
+	int res;
+	enum thread_msg ack = THREAD_MSG__UNSUPPORTED;
+	pid_t tid = thread_data->tid;
+
+	close(thread_data->comm.msg[1]);
+	res = read(thread_data->comm.ack[0], &ack, sizeof(ack));
+	if (res != -1)
+		pr_debug("threads: %d -> %s\n", tid, thread_msg_tags[ack]);
+	else
+		pr_err("threads: failed to recv msg=%s from %d\n",
+		       thread_msg_tags[ack], tid);
+
+	return 0;
+}
+
+static int record__stop_threads(struct record *rec, unsigned long *waking)
+{
+	int i, j, nr_thread_data = rec->nr_thread_data;
+	struct thread_data *thread_data = rec->thread_data;
+
+	if (!record__threads_enabled(rec))
+		return 0;
+
+	for (i = 1; i < nr_thread_data; i++)
+		record__terminate_thread(&thread_data[i]);
+
+	for (i = 0; i < nr_thread_data; i++) {
+		pr_debug("threads: %d : samples %lld, wakes %ld\n",
+			 thread_data[i].tid, thread_data[i].samples,
+			 thread_data[i].waking);
+
+		rec->samples += thread_data[i].samples;
+		*waking      += thread_data[i].waking;
+		for (j = 0; j < thread_data[i].nr_mmaps; j++) {
+			rec->session->bytes_transferred += thread_data[i].maps[j].bytes_transferred;
+			rec->session->bytes_compressed  += thread_data[i].maps[j].bytes_compressed;
+		}
+	}
+
+	return 0;
+}
+
 static int __cmd_record(struct record *rec, int argc, const char **argv)
 {
 	int err;
@@ -1903,7 +1957,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 
 	if (record__open(rec) != 0) {
 		err = -1;
-		goto out_child;
+		goto out_free_threads;
 	}
 	session->header.env.comp_mmap_len = session->evlist->core.mmap_len;
 
@@ -2203,18 +2257,20 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		goto out_child;
 	}
 
-	if (!quiet)
-		fprintf(stderr, "[ perf record: Woken up %ld times to write data ]\n", waking);
-
 	if (target__none(&rec->opts.target))
 		record__synthesize_workload(rec, true);
 
 out_child:
+	record__stop_threads(rec, &waking);
+out_free_threads:
 	record__free_thread_data(rec);
 	evlist__finalize_ctlfd(rec->evlist);
 	record__mmap_read_all(rec, true);
 	record__aio_mmap_read_sync(rec);
 
+	if (!quiet)
+		fprintf(stderr, "[ perf record: Woken up %ld times to write data ]\n", waking);
+
 	if (rec->session->bytes_transferred && rec->session->bytes_compressed) {
 		ratio = (float)rec->session->bytes_transferred/(float)rec->session->bytes_compressed;
 		session->header.env.comp_ratio = ratio + 0.5;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v1 14/15] perf record: start threads in the beginning of trace streaming
  2020-10-12  8:50 [PATCH v1 00/15] Introduce threaded trace streaming for basic perf record operation Alexey Budankov
                   ` (12 preceding siblings ...)
  2020-10-12  9:10 ` [PATCH v1 13/15] perf record: stop threads in the end of " Alexey Budankov
@ 2020-10-12  9:11 ` Alexey Budankov
  2020-10-12  9:13 ` [PATCH v1 15/15] perf record: introduce --threads command line option Alexey Budankov
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 45+ messages in thread
From: Alexey Budankov @ 2020-10-12  9:11 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Namhyung Kim, Alexander Shishkin, Andi Kleen, Adrian Hunter,
	Peter Zijlstra, Ingo Molnar, linux-kernel


Start threads in detached state because its management is possible
via messaging. Block signals prior the threads start so only main
tool thread would be notified on external async signals during data
collection. Streaming threads connect one-to-one to mapped data
buffers and write into per-CPU trace files located at data directory.
Data buffers and threads are affined to local NUMA nodes and monitored
CPUs according to system topology. --cpu option can be used to specify
exact CPUs to be monitored.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 tools/perf/builtin-record.c | 128 +++++++++++++++++++++++++++++++++---
 1 file changed, 120 insertions(+), 8 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index a15642656066..1d41e996a994 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -56,6 +56,7 @@
 #include <poll.h>
 #include <pthread.h>
 #include <unistd.h>
+#include <sys/syscall.h>
 #include <sched.h>
 #include <signal.h>
 #ifdef HAVE_EVENTFD_SUPPORT
@@ -1377,6 +1378,62 @@ static void record__thread_munmap_filtered(struct fdarray *fda, int fd,
 		perf_mmap__put(map);
 }
 
+static void *record__thread(void *arg)
+{
+	enum thread_msg msg = THREAD_MSG__READY;
+	bool terminate = false;
+	struct fdarray *pollfd;
+	int err, ctlfd_pos;
+
+	thread = arg;
+	thread->tid = syscall(SYS_gettid);
+
+	err = write(thread->comm.ack[1], &msg, sizeof(msg));
+	if (err == -1)
+		pr_err("threads: %d failed to notify on start. Error %m", thread->tid);
+
+	pollfd = &(thread->pollfd);
+	ctlfd_pos = thread->ctlfd_pos;
+
+	for (;;) {
+		unsigned long long hits = thread->samples;
+
+		if (record__mmap_read_all(thread->rec, false) < 0 || terminate)
+			break;
+
+		if (hits == thread->samples) {
+
+			err = fdarray__poll(pollfd, -1);
+			/*
+			 * Propagate error, only if there's any. Ignore positive
+			 * number of returned events and interrupt error.
+			 */
+			if (err > 0 || (err < 0 && errno == EINTR))
+				err = 0;
+			thread->waking++;
+
+			if (fdarray__filter(pollfd, POLLERR | POLLHUP,
+					    record__thread_munmap_filtered, NULL) == 0)
+				break;
+		}
+
+		if (pollfd->entries[ctlfd_pos].revents & POLLHUP) {
+			terminate = true;
+			close(thread->comm.msg[0]);
+			pollfd->entries[ctlfd_pos].fd = -1;
+			pollfd->entries[ctlfd_pos].events = 0;
+		}
+
+		pollfd->entries[ctlfd_pos].revents = 0;
+	}
+
+	err = write(thread->comm.ack[1], &msg, sizeof(msg));
+	if (err == -1)
+		pr_err("threads: %d failed to notify on termination. Error %m", thread->tid);
+
+	return NULL;
+}
+
 static void record__init_features(struct record *rec)
 {
 	struct perf_session *session = rec->session;
@@ -1823,6 +1880,58 @@ static int record__terminate_thread(struct thread_data *thread_data)
 	return 0;
 }
 
+static int record__start_threads(struct record *rec)
+{
+	int i, j, ret = 0;
+	sigset_t full, mask;
+	pthread_t handle;
+	pthread_attr_t attrs;
+	int nr_thread_data = rec->nr_thread_data;
+	struct thread_data *thread_data = rec->thread_data;
+
+	if (!record__threads_enabled(rec))
+		return 0;
+
+	sigfillset(&full);
+	if (sigprocmask(SIG_SETMASK, &full, &mask)) {
+		pr_err("Failed to block signals on threads start. Error: %m\n");
+		return -1;
+	}
+
+	pthread_attr_init(&attrs);
+	pthread_attr_setdetachstate(&attrs, PTHREAD_CREATE_DETACHED);
+
+	for (i = 1; i < nr_thread_data; i++) {
+		int res = 0;
+		enum thread_msg msg = THREAD_MSG__UNSUPPORTED;
+
+		if (pthread_create(&handle, &attrs, record__thread, &thread_data[i])) {
+			for (j = 1; j < i; j++)
+				record__terminate_thread(&thread_data[i]);
+			pr_err("Failed to start threads. Error: %m\n");
+			ret = -1;
+			goto out_err;
+		}
+
+		res = read(thread_data[i].comm.ack[0], &msg, sizeof(msg));
+		if (res > 0)
+			pr_debug("threads: %d -> %s\n", rec->thread_data[i].tid,
+				 thread_msg_tags[msg]);
+	}
+
+	thread = &thread_data[0];
+	thread->tid = syscall(SYS_gettid);
+	pr_debug("threads: %d -> %s\n", thread->tid, thread_msg_tags[THREAD_MSG__READY]);
+
+out_err:
+	if (sigprocmask(SIG_SETMASK, &mask, NULL)) {
+		pr_err("Failed to unblock signals on threads start. Error: %m\n");
+		ret = -1;
+	}
+
+	return ret;
+}
+
 static int record__stop_threads(struct record *rec, unsigned long *waking)
 {
 	int i, j, nr_thread_data = rec->nr_thread_data;
@@ -1965,7 +2074,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		err = record__kcore_copy(&session->machines.host, data);
 		if (err) {
 			pr_err("ERROR: Failed to copy kcore\n");
-			goto out_child;
+			goto out_free_threads;
 		}
 	}
 
@@ -1976,7 +2085,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		bpf__strerror_apply_obj_config(err, errbuf, sizeof(errbuf));
 		pr_err("ERROR: Apply config to BPF failed: %s\n",
 			 errbuf);
-		goto out_child;
+		goto out_free_threads;
 	}
 
 	/*
@@ -1994,11 +2103,11 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	if (data->is_pipe) {
 		err = perf_header__write_pipe(fd);
 		if (err < 0)
-			goto out_child;
+			goto out_free_threads;
 	} else {
 		err = perf_session__write_header(session, rec->evlist, fd, false);
 		if (err < 0)
-			goto out_child;
+			goto out_free_threads;
 	}
 
 	err = -1;
@@ -2006,16 +2115,16 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	    && !perf_header__has_feat(&session->header, HEADER_BUILD_ID)) {
 		pr_err("Couldn't generate buildids. "
 		       "Use --no-buildid to profile anyway.\n");
-		goto out_child;
+		goto out_free_threads;
 	}
 
 	err = record__setup_sb_evlist(rec);
 	if (err)
-		goto out_child;
+		goto out_free_threads;
 
 	err = record__synthesize(rec, false);
 	if (err < 0)
-		goto out_child;
+		goto out_free_threads;
 
 	if (rec->realtime_prio) {
 		struct sched_param param;
@@ -2024,10 +2133,13 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		if (sched_setscheduler(0, SCHED_FIFO, &param)) {
 			pr_err("Could not set realtime priority.\n");
 			err = -1;
-			goto out_child;
+			goto out_free_threads;
 		}
 	}
 
+	if (record__start_threads(rec))
+		goto out_free_threads;
+
 	/*
 	 * When perf is starting the traced process, all the events
 	 * (apart from group members) have enable_on_exec=1 set,
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v1 15/15] perf record: introduce --threads command line option
  2020-10-12  8:50 [PATCH v1 00/15] Introduce threaded trace streaming for basic perf record operation Alexey Budankov
                   ` (13 preceding siblings ...)
  2020-10-12  9:11 ` [PATCH v1 14/15] perf record: start threads in the beginning " Alexey Budankov
@ 2020-10-12  9:13 ` Alexey Budankov
  2020-10-13 16:20 ` [PATCH v1 00/15] Introduce threaded trace streaming for basic perf record operation Arnaldo Carvalho de Melo
  2020-10-14 17:27 ` Ingo Molnar
  16 siblings, 0 replies; 45+ messages in thread
From: Alexey Budankov @ 2020-10-12  9:13 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Namhyung Kim, Alexander Shishkin, Andi Kleen, Adrian Hunter,
	Peter Zijlstra, Ingo Molnar, linux-kernel


Provide --threads option in perf record command line interface.
Threaded streaming mode mitigates profiling data losses and
resolves scalability issues of serial and asynchronous (--aio) trace
streaming modes on multicore server systems. The implementation is
based on the prototype [1], [2] and the most closely relates to mode 3)
"mode that creates thread for every monitored memory map".

Threaded streaming mode is available with Zstd compression/decompression
(--compression-level) and handling of external commands (--control).
AUX area tracing, related and derived modes like --snapshot or
--aux-sample are not enabled. --switch-output, --switch-output-event,
--switch-max-files and --timestamp-filename options are not enabled.
Threaded trace streaming is not enabled for pipe mode. Asynchronous
(--aio) trace streaming and affinity (--affinity) modes are mutually
exclusive to the exposed threaded streaming mode.

[1] git clone https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git -b perf/record_threads
[2] https://lore.kernel.org/lkml/20180913125450.21342-1-jolsa@kernel.org/

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 tools/perf/Documentation/perf-record.txt |  7 ++++
 tools/perf/builtin-record.c              | 45 +++++++++++++++++++++---
 2 files changed, 48 insertions(+), 4 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 768888b9326a..d8fa387da973 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -671,6 +671,13 @@ Example of bash shell script to enable and disable events during measurements:
  wait -n ${perf_pid}
  exit $?
 
+--threads::
+Write collected trace data into per-CPU trace files using parallel threads.
+List of monitored CPUs can be configured by a mask provided via --cpu option.
+Trace writing threads correspond one-to-one to mapped data buffers. Threads
+and buffers are affined to monitored CPUs and NUMA nodes according to system
+topology. Threaded trace streaming mode is mutually exclusive to asynchronous
+trace streaming (--aio) and affinity (--affinity) modes.
 
 SEE ALSO
 --------
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 1d41e996a994..575b0b595081 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -780,6 +780,12 @@ static int record__auxtrace_init(struct record *rec)
 {
 	int err;
 
+	if ((rec->opts.auxtrace_snapshot_opts || rec->opts.auxtrace_sample_opts)
+	    && record__threads_enabled(rec)) {
+		pr_err("AUX area tracing options are not available in threaded streaming mode.\n");
+		return -EINVAL;
+	}
+
 	if (!rec->itr) {
 		rec->itr = auxtrace_record__init(rec->evlist, &err);
 		if (err)
@@ -2008,6 +2014,11 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		return PTR_ERR(session);
 	}
 
+	if (record__threads_enabled(rec) && perf_data__is_pipe(&rec->data)) {
+		pr_err("Threaded trace streaming is not available in pipe mode.\n");
+		return -1;
+	}
+
 	fd = perf_data__fd(data);
 	rec->session = session;
 
@@ -2680,12 +2691,22 @@ static int switch_output_setup(struct record *rec)
 	 * --switch-output=signal, as we'll send a SIGUSR2 from the side band
 	 *  thread to its parent.
 	 */
-	if (rec->switch_output_event_set)
+	if (rec->switch_output_event_set) {
+		if (record__threads_enabled(rec)) {
+			pr_warning("WARNING: --switch-output-event option is not available in threaded streaming mode.\n");
+			return 0;
+		}
 		goto do_signal;
+	}
 
 	if (!s->set)
 		return 0;
 
+	if (record__threads_enabled(rec)) {
+		pr_warning("WARNING: --switch-output option is not available in threaded streaming mode.\n");
+		return 0;
+	}
+
 	if (!strcmp(s->str, "signal")) {
 do_signal:
 		s->signal = true;
@@ -2964,8 +2985,8 @@ static struct option __record_options[] = {
 		     "Set affinity mask of trace reading thread to NUMA node cpu mask or cpu of processed mmap buffer",
 		     record__parse_affinity),
 #ifdef HAVE_ZSTD_SUPPORT
-	OPT_CALLBACK_OPTARG('z', "compression-level", &record.opts, &comp_level_default,
-			    "n", "Compressed records using specified level (default: 1 - fastest compression, 22 - greatest compression)",
+	OPT_CALLBACK_OPTARG('z', "compression-level", &record.opts, &comp_level_default, "n",
+			    "Compress records using specified level (default: 1 - fastest compression, 22 - greatest compression)",
 			    record__parse_comp_level),
 #endif
 	OPT_CALLBACK(0, "max-size", &record.output_max_size,
@@ -2984,6 +3005,8 @@ static struct option __record_options[] = {
 		     "\t\t\t  Optionally send control command completion ('ack\\n') to ack-fd descriptor.\n"
 		     "\t\t\t  Alternatively, ctl-fifo / ack-fifo will be opened and used as ctl-fd / ack-fd.",
 		      parse_control_option),
+	OPT_BOOLEAN(0, "threads", &record.opts.threads,
+		    "write collected trace data into per-CPU trace files using parallel threads"),
 	OPT_END()
 };
 
@@ -3046,8 +3069,17 @@ int cmd_record(int argc, const char **argv)
 	if (rec->opts.kcore || record__threads_enabled(rec))
 		rec->data.is_dir = true;
 
-	if (record__threads_enabled(rec))
+	if (record__threads_enabled(rec)) {
+		if (rec->opts.affinity != PERF_AFFINITY_SYS) {
+			pr_err("--affinity option is mutually exclusive to threaded streaming mode.\n");
+			goto out_opts;
+		}
 		rec->opts.affinity = PERF_AFFINITY_CPU;
+		if (record__aio_enabled(rec)) {
+			pr_err("Asynchronous streaming mode (--aio) is mutually exclusive to threaded streaming mode.\n");
+			goto out_opts;
+		}
+	}
 
 	if (rec->opts.comp_level != 0) {
 		pr_debug("Compression enabled, disabling build id collection at the end of the session.\n");
@@ -3082,6 +3114,11 @@ int cmd_record(int argc, const char **argv)
 		}
 	}
 
+	if (rec->timestamp_filename && record__threads_enabled(rec)) {
+		rec->timestamp_filename = false;
+		pr_warning("WARNING: --timestamp-filename option is not available in threaded streaming mode.\n");
+	}
+
 	/*
 	 * Allow aliases to facilitate the lookup of symbols for address
 	 * filters. Refer to auxtrace_parse_filters().
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 02/15] perf report: output trace file name in raw trace dump
  2020-10-12  8:54 ` [PATCH v1 02/15] perf report: output trace file name " Alexey Budankov
@ 2020-10-12 16:01   ` Andi Kleen
  2020-10-12 17:06     ` Alexey Budankov
  2020-10-13 19:54   ` Jiri Olsa
  1 sibling, 1 reply; 45+ messages in thread
From: Andi Kleen @ 2020-10-12 16:01 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim,
	Alexander Shishkin, Adrian Hunter, Peter Zijlstra, Ingo Molnar,
	linux-kernel

On Mon, Oct 12, 2020 at 11:54:24AM +0300, Alexey Budankov wrote:
> @@ -180,7 +183,8 @@ static int ordered_events__deliver_event(struct ordered_events *oe,
>  						    ordered_events);
>  
>  	return perf_session__deliver_event(session, event->event,
> -					   session->tool, event->file_offset);
> +					   session->tool, event->file_offset,
> +					   event->file_path);

Wouldn't it be better to pass "event" around now, which would contain at least
four of the arguments.

These functions are getting entirely too many arguments.

-Andi


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 03/15] perf data: open data directory in read access mode
  2020-10-12  8:55 ` [PATCH v1 03/15] perf data: open data directory in read access mode Alexey Budankov
@ 2020-10-12 16:03   ` Andi Kleen
  2020-10-12 16:52     ` Alexey Budankov
  0 siblings, 1 reply; 45+ messages in thread
From: Andi Kleen @ 2020-10-12 16:03 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim,
	Alexander Shishkin, Adrian Hunter, Peter Zijlstra, Ingo Molnar,
	linux-kernel

On Mon, Oct 12, 2020 at 11:55:07AM +0300, Alexey Budankov wrote:
> 
> Open files located at data directory in case of read access mode.

Need some rationale. Is this a bug fix? Did the case not matter
before?

> 
> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
> ---
>  tools/perf/util/data.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
> index c47aa34fdc0a..6ad61ac6ba67 100644
> --- a/tools/perf/util/data.c
> +++ b/tools/perf/util/data.c
> @@ -321,6 +321,10 @@ static int open_dir(struct perf_data *data)
>  		return -1;
>  
>  	ret = open_file(data);
> +	if (!ret && perf_data__is_dir(data)) {
> +		if (perf_data__is_read(data))
> +			ret = perf_data__open_dir(data);
> +	}
>  
>  	/* Cleanup whatever we managed to create so far. */
>  	if (ret && perf_data__is_write(data))
> -- 
> 2.24.1
> 
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 05/15] perf session: introduce decompressor into trace reader object
  2020-10-12  8:57 ` [PATCH v1 05/15] perf session: introduce decompressor into trace reader object Alexey Budankov
@ 2020-10-12 16:05   ` Andi Kleen
  2020-10-12 16:50     ` Alexey Budankov
  0 siblings, 1 reply; 45+ messages in thread
From: Andi Kleen @ 2020-10-12 16:05 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim,
	Alexander Shishkin, Adrian Hunter, Peter Zijlstra, Ingo Molnar,
	linux-kernel

On Mon, Oct 12, 2020 at 11:57:00AM +0300, Alexey Budankov wrote:
> diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
> index abdb8518a81f..4fc9ccdf7970 100644
> --- a/tools/perf/util/session.h
> +++ b/tools/perf/util/session.h
> @@ -42,6 +42,7 @@ struct reader {
>  	u64		 data_size;
>  	u64		 data_offset;
>  	reader_cb_t	 process;
> +	struct zstd_data zstd_data;

Just to make sure, there is a suitable definition of zstd_data in case
the Zstd library not available?

-Andi


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 06/15] perf session: load data directory into tool process memory
  2020-10-12  8:58 ` [PATCH v1 06/15] perf session: load data directory into tool process memory Alexey Budankov
@ 2020-10-12 16:09   ` Andi Kleen
  2020-10-12 16:49     ` Alexey Budankov
  0 siblings, 1 reply; 45+ messages in thread
From: Andi Kleen @ 2020-10-12 16:09 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim,
	Alexander Shishkin, Adrian Hunter, Peter Zijlstra, Ingo Molnar,
	linux-kernel

On Mon, Oct 12, 2020 at 11:58:58AM +0300, Alexey Budankov wrote:
> 
> Read trace files located at data directory into tool process memory.
> Basic analysis support of data directories is provided for report
> mode. Raw dump (-D) and aggregated reports are available for data
> directories, still with no memory consumption optimizations. However
> data directories collected with --compression-level option enabled
> can be analyzed with little less memory because trace files are
> unmaped from tool process memory after loading collected data.
> The implementation is based on the prototype [1], [2].

Should credit the author(s) of the prototypes.
> 
> [1] git clone https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git -b perf/record_threads
> [2] https://lore.kernel.org/lkml/20180913125450.21342-1-jolsa@kernel.org/
> 
> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
> ---
>  tools/perf/util/session.c | 48 +++++++++++++++++++++++++++++++++++++++
>  tools/perf/util/session.h |  1 +
>  2 files changed, 49 insertions(+)
> 
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 6afc670fdf0c..0752eec19813 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -2212,6 +2212,17 @@ reader__process_events(struct reader *rd, struct perf_session *session,
>  		goto more;
>  
>  out:
> +	if (rd->unmap_file) {
> +		int i;
> +
> +		for (i = 0; i < NUM_MMAPS; i++) {
> +			if (mmaps[i]) {
> +				munmap(mmaps[i], mmap_size);
> +				mmaps[i] = NULL;

Okay so where is the mmap? Would make more sense to put that
into the same patch as who adds the mmap. Or is the mmap
code already in the perf source? In that case it should
probably be some common helper with the existing users.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 06/15] perf session: load data directory into tool process memory
  2020-10-12 16:09   ` Andi Kleen
@ 2020-10-12 16:49     ` Alexey Budankov
  2020-10-13 11:30       ` Alexey Budankov
  0 siblings, 1 reply; 45+ messages in thread
From: Alexey Budankov @ 2020-10-12 16:49 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim,
	Alexander Shishkin, Adrian Hunter, Peter Zijlstra, Ingo Molnar,
	linux-kernel


On 12.10.2020 19:09, Andi Kleen wrote:
> On Mon, Oct 12, 2020 at 11:58:58AM +0300, Alexey Budankov wrote:
>>
>> Read trace files located at data directory into tool process memory.
>> Basic analysis support of data directories is provided for report
>> mode. Raw dump (-D) and aggregated reports are available for data
>> directories, still with no memory consumption optimizations. However
>> data directories collected with --compression-level option enabled
>> can be analyzed with little less memory because trace files are
>> unmaped from tool process memory after loading collected data.
>> The implementation is based on the prototype [1], [2].
> 
> Should credit the author(s) of the prototypes.

Sure. Will explicitly add:
 
Suggested-by: Jiri Olsa <jolsa@kernel.org>
Suggested-by: Namhyung Kim <namhyung@kernel.org>

here and [PATCH v2 15/15], additionally to [1], [2] below.

Thanks for pointing this out!

>>
>> [1] git clone https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git -b perf/record_threads
>> [2] https://lore.kernel.org/lkml/20180913125450.21342-1-jolsa@kernel.org/
>>
>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
>> ---
>>  tools/perf/util/session.c | 48 +++++++++++++++++++++++++++++++++++++++
>>  tools/perf/util/session.h |  1 +
>>  2 files changed, 49 insertions(+)
>>
>> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
>> index 6afc670fdf0c..0752eec19813 100644
>> --- a/tools/perf/util/session.c
>> +++ b/tools/perf/util/session.c
>> @@ -2212,6 +2212,17 @@ reader__process_events(struct reader *rd, struct perf_session *session,
>>  		goto more;
>>  
>>  out:
>> +	if (rd->unmap_file) {
>> +		int i;
>> +
>> +		for (i = 0; i < NUM_MMAPS; i++) {
>> +			if (mmaps[i]) {
>> +				munmap(mmaps[i], mmap_size);
>> +				mmaps[i] = NULL;
> 
> Okay so where is the mmap? Would make more sense to put that
> into the same patch as who adds the mmap. Or is the mmap
> code already in the perf source? In that case it should
> probably be some common helper with the existing users.

That mmap is already in the code. Agree, this part of the patch
can be applied prior the whole patch set.

Alexei

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 05/15] perf session: introduce decompressor into trace reader object
  2020-10-12 16:05   ` Andi Kleen
@ 2020-10-12 16:50     ` Alexey Budankov
  0 siblings, 0 replies; 45+ messages in thread
From: Alexey Budankov @ 2020-10-12 16:50 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim,
	Alexander Shishkin, Adrian Hunter, Peter Zijlstra, Ingo Molnar,
	linux-kernel


On 12.10.2020 19:05, Andi Kleen wrote:
> On Mon, Oct 12, 2020 at 11:57:00AM +0300, Alexey Budankov wrote:
>> diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
>> index abdb8518a81f..4fc9ccdf7970 100644
>> --- a/tools/perf/util/session.h
>> +++ b/tools/perf/util/session.h
>> @@ -42,6 +42,7 @@ struct reader {
>>  	u64		 data_size;
>>  	u64		 data_offset;
>>  	reader_cb_t	 process;
>> +	struct zstd_data zstd_data;
> 
> Just to make sure, there is a suitable definition of zstd_data in case
> the Zstd library not available?

That's right.

Alexei

> 
> -Andi
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 03/15] perf data: open data directory in read access mode
  2020-10-12 16:03   ` Andi Kleen
@ 2020-10-12 16:52     ` Alexey Budankov
  2020-10-13 16:22       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 45+ messages in thread
From: Alexey Budankov @ 2020-10-12 16:52 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim,
	Alexander Shishkin, Adrian Hunter, Peter Zijlstra, Ingo Molnar,
	linux-kernel


On 12.10.2020 19:03, Andi Kleen wrote:
> On Mon, Oct 12, 2020 at 11:55:07AM +0300, Alexey Budankov wrote:
>>
>> Open files located at data directory in case of read access mode.
> 
> Need some rationale. Is this a bug fix? Did the case not matter
> before?

This is not a bug fix. The case didn't matter before.

> 
>>
>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
>> ---
>>  tools/perf/util/data.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
>> index c47aa34fdc0a..6ad61ac6ba67 100644
>> --- a/tools/perf/util/data.c
>> +++ b/tools/perf/util/data.c
>> @@ -321,6 +321,10 @@ static int open_dir(struct perf_data *data)
>>  		return -1;
>>  
>>  	ret = open_file(data);
>> +	if (!ret && perf_data__is_dir(data)) {
>> +		if (perf_data__is_read(data))
>> +			ret = perf_data__open_dir(data);
>> +	}
>>  
>>  	/* Cleanup whatever we managed to create so far. */
>>  	if (ret && perf_data__is_write(data))
>> -- 
>> 2.24.1

Alexei


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 02/15] perf report: output trace file name in raw trace dump
  2020-10-12 16:01   ` Andi Kleen
@ 2020-10-12 17:06     ` Alexey Budankov
  2020-10-20 16:31       ` Alexey Budankov
  0 siblings, 1 reply; 45+ messages in thread
From: Alexey Budankov @ 2020-10-12 17:06 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim,
	Alexander Shishkin, Adrian Hunter, Peter Zijlstra, Ingo Molnar,
	linux-kernel


On 12.10.2020 19:01, Andi Kleen wrote:
> On Mon, Oct 12, 2020 at 11:54:24AM +0300, Alexey Budankov wrote:
>> @@ -180,7 +183,8 @@ static int ordered_events__deliver_event(struct ordered_events *oe,
>>  						    ordered_events);
>>  
>>  	return perf_session__deliver_event(session, event->event,
>> -					   session->tool, event->file_offset);
>> +					   session->tool, event->file_offset,
>> +					   event->file_path);
> 
> Wouldn't it be better to pass "event" around now, which would contain at least
> four of the arguments.
> 
> These functions are getting entirely too many arguments.

Well, either approach is possible, and even shrink of two arguments kept at session object.
However changing function signature more than posted can cause bigger adjustments all over
the code. So this needs more evaluation prior implementation. 

Alexei

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 06/15] perf session: load data directory into tool process memory
  2020-10-12 16:49     ` Alexey Budankov
@ 2020-10-13 11:30       ` Alexey Budankov
  2020-10-21  6:54         ` Namhyung Kim
  0 siblings, 1 reply; 45+ messages in thread
From: Alexey Budankov @ 2020-10-13 11:30 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim,
	Alexander Shishkin, Adrian Hunter, Peter Zijlstra, Ingo Molnar,
	linux-kernel


On 12.10.2020 19:49, Alexey Budankov wrote:
> 
> On 12.10.2020 19:09, Andi Kleen wrote:
>> On Mon, Oct 12, 2020 at 11:58:58AM +0300, Alexey Budankov wrote:
>>>
>>> Read trace files located at data directory into tool process memory.
>>> Basic analysis support of data directories is provided for report
>>> mode. Raw dump (-D) and aggregated reports are available for data
>>> directories, still with no memory consumption optimizations. However
>>> data directories collected with --compression-level option enabled
>>> can be analyzed with little less memory because trace files are
>>> unmaped from tool process memory after loading collected data.
>>> The implementation is based on the prototype [1], [2].
>>
>> Should credit the author(s) of the prototypes.
> 
> Sure. Will explicitly add:
>  
> Suggested-by: Jiri Olsa <jolsa@kernel.org>
> Suggested-by: Namhyung Kim <namhyung@kernel.org>
> 
> here and [PATCH v2 15/15], additionally to [1], [2] below.
> 
> Thanks for pointing this out!
> 
>>>
>>> [1] git clone https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git -b perf/record_threads
>>> [2] https://lore.kernel.org/lkml/20180913125450.21342-1-jolsa@kernel.org/
>>>
>>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
>>> ---
>>>  tools/perf/util/session.c | 48 +++++++++++++++++++++++++++++++++++++++
>>>  tools/perf/util/session.h |  1 +
>>>  2 files changed, 49 insertions(+)
>>>
>>> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
>>> index 6afc670fdf0c..0752eec19813 100644
>>> --- a/tools/perf/util/session.c
>>> +++ b/tools/perf/util/session.c
>>> @@ -2212,6 +2212,17 @@ reader__process_events(struct reader *rd, struct perf_session *session,
>>>  		goto more;
>>>  
>>>  out:
>>> +	if (rd->unmap_file) {
>>> +		int i;
>>> +
>>> +		for (i = 0; i < NUM_MMAPS; i++) {
>>> +			if (mmaps[i]) {
>>> +				munmap(mmaps[i], mmap_size);
>>> +				mmaps[i] = NULL;
>>
>> Okay so where is the mmap? Would make more sense to put that
>> into the same patch as who adds the mmap. Or is the mmap
>> code already in the perf source? In that case it should
>> probably be some common helper with the existing users.
> 
> That mmap is already in the code. Agree, this part of the patch
> can be applied prior the whole patch set.

I take it back. Single trace file can't be unmapped yet since it also
contains not only compressed records but also other records backing
the data for aggregated analysis.

Alexei

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 00/15] Introduce threaded trace streaming for basic perf record operation
  2020-10-12  8:50 [PATCH v1 00/15] Introduce threaded trace streaming for basic perf record operation Alexey Budankov
                   ` (14 preceding siblings ...)
  2020-10-12  9:13 ` [PATCH v1 15/15] perf record: introduce --threads command line option Alexey Budankov
@ 2020-10-13 16:20 ` Arnaldo Carvalho de Melo
  2020-10-14 12:15   ` Alexey Budankov
  2020-10-14 17:27 ` Ingo Molnar
  16 siblings, 1 reply; 45+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-10-13 16:20 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Andi Kleen,
	Adrian Hunter, Peter Zijlstra, Ingo Molnar, linux-kernel

Em Mon, Oct 12, 2020 at 11:50:29AM +0300, Alexey Budankov escreveu:
> 
> Patch set provides threaded trace streaming for base perf record
> operation. Provided streaming mode (--threads) mitigates profiling
> data losses and resolves scalability issues of serial and asynchronous
> (--aio) trace streaming modes on multicore server systems. The patch
> set is based on the prototype [1], [2] and the most closely relates
> to mode 3) "mode that creates thread for every monitored memory map".
> 
> The threaded mode executes one-to-one mapping of trace streaming threads
> to mapped data buffers and streaming into per-CPU trace files located
> at data directory. The data buffers and threads are affined to NUMA
> nodes and monitored CPUs according to system topology. --cpu option
> can be used to specify exact CPUs to be monitored.
> 
> Basic analysis of data directories is provided for perf report mode.
> Raw dump (-D) and aggregated reports are available for data directories,
> still with no memory consumption optimizations. However data directories
> collected with --compression-level option enabled can be analyzed with
> little less memory because trace files are unmaped from tool process
> memory after loading collected data.
> 
> Provided streaming mode is available with Zstd compression/decompression
> (--compression-level) and handling of external commands (--control).
> AUX area tracing, related and derived modes like --snapshot or
> --aux-sample are not enabled. --switch-output, --switch-output-event, 
> --switch-max-files and --timestamp-filename options are not enabled.

Would be interesting to spell out what are the difficulties to have
those options working with this threaded mode, as I expect that once
this is all reviewed and tested we should switch to it by default, no?

- Arnaldo

> Threaded trace streaming is not enabled for pipe mode. Asynchronous
> (--aio) trace streaming and affinity (--affinity) modes are mutually
> exclusive to threaded streaming mode.
> 
> See testing results and validation examples below.
> 
> [1] git clone https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git -b perf/record_threads
> [2] https://lore.kernel.org/lkml/20180913125450.21342-1-jolsa@kernel.org/
> 
> ---
> Alexey Budankov (15):
>   perf session: introduce trace file path to be shown in raw trace dump
>   perf report: output trace file name in raw trace dump
>   perf data: open data directory in read access mode
>   perf session: move reader object definition to header file
>   perf session: introduce decompressor into trace reader object
>   perf session: load data directory into tool process memory
>   perf record: introduce trace file, compressor and stats in mmap object
>   perf record: write trace data into mmap trace files
>   perf record: introduce thread specific objects for trace streaming
>   perf record: manage thread specific data array
>   perf evlist: introduce evlist__ctlfd_update() to update ctl fd status
>   perf record: introduce thread local variable for trace streaming
>   perf record: stop threads in the end of trace streaming
>   perf record: start threads in the beginning of trace streaming
>   perf record: introduce --threads command line option
> 
>  tools/perf/Documentation/perf-record.txt |   7 +
>  tools/perf/builtin-record.c              | 514 +++++++++++++++++++++--
>  tools/perf/util/data.c                   |   4 +
>  tools/perf/util/evlist.c                 |  16 +
>  tools/perf/util/evlist.h                 |   1 +
>  tools/perf/util/mmap.c                   |   6 +
>  tools/perf/util/mmap.h                   |   5 +
>  tools/perf/util/ordered-events.h         |   1 +
>  tools/perf/util/record.h                 |   1 +
>  tools/perf/util/session.c                | 150 ++++---
>  tools/perf/util/session.h                |  28 ++
>  tools/perf/util/tool.h                   |   3 +-
>  12 files changed, 635 insertions(+), 101 deletions(-)
> 
> ---
> Testing results:
> 
>  $ perf test
>  1: vmlinux symtab matches kallsyms                                 : Skip
>  2: Detect openat syscall event                                     : Ok
>  3: Detect openat syscall event on all cpus                         : Ok
>  4: Read samples using the mmap interface                           : Ok
>  5: Test data source output                                         : Ok
>  6: Parse event definition strings                                  : Ok
>  7: Simple expression parser                                        : Ok
>  8: PERF_RECORD_* events & perf_sample fields                       : Ok
>  9: Parse perf pmu format                                           : Ok
> 10: PMU events                                                      :
> 10.1: PMU event table sanity                                        : Ok
> 10.2: PMU event map aliases                                         : Ok
> 10.3: Parsing of PMU event table metrics                            : Skip (some metrics failed)
> 10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
> 11: DSO data read                                                   : Ok
> 12: DSO data cache                                                  : Ok
> 13: DSO data reopen                                                 : FAILED!
> 14: Roundtrip evsel->name                                           : Ok
> 15: Parse sched tracepoints fields                                  : Ok
> 16: syscalls:sys_enter_openat event fields                          : Ok
> 17: Setup struct perf_event_attr                                    : Ok
> 18: Match and link multiple hists                                   : Ok
> 19: 'import perf' in python                                         : FAILED!
> 20: Breakpoint overflow signal handler                              : Ok
> 21: Breakpoint overflow sampling                                    : Ok
> 22: Breakpoint accounting                                           : Ok
> 23: Watchpoint                                                      :
> 23.1: Read Only Watchpoint                                          : Skip
> 23.2: Write Only Watchpoint                                         : Ok
> 23.3: Read / Write Watchpoint                                       : Ok
> 23.4: Modify Watchpoint                                             : Ok
> 24: Number of exit events of a simple workload                      : Ok
> 25: Software clock events period values                             : Ok
> 26: Object code reading                                             : Ok
> 27: Sample parsing                                                  : Ok
> 28: Use a dummy software event to keep tracking                     : Ok
> 29: Parse with no sample_id_all bit set                             : Ok
> 30: Filter hist entries                                             : Ok
> 31: Lookup mmap thread                                              : Ok
> 32: Share thread maps                                               : Ok
> 33: Sort output of hist entries                                     : Ok
> 34: Cumulate child hist entries                                     : Ok
> 35: Track with sched_switch                                         : Ok
> 36: Filter fds with revents mask in a fdarray                       : Ok
> 37: Add fd to a fdarray, making it autogrow                         : Ok
> 38: kmod_path__parse                                                : Ok
> 39: Thread map                                                      : Ok
> 40: LLVM search and compile                                         :
> 40.1: Basic BPF llvm compile                                        : Skip
> 40.2: kbuild searching                                              : Skip
> 40.3: Compile source for BPF prologue generation                    : Skip
> 40.4: Compile source for BPF relocation                             : Skip
> 41: Session topology                                                : Ok
> 42: BPF filter                                                      :
> 42.1: Basic BPF filtering                                           : Skip
> 42.2: BPF pinning                                                   : Skip
> 42.3: BPF prologue generation                                       : Skip
> 42.4: BPF relocation checker                                        : Skip
> 43: Synthesize thread map                                           : Ok
> 44: Remove thread map                                               : Ok
> 45: Synthesize cpu map                                              : Ok
> 46: Synthesize stat config                                          : Ok
> 47: Synthesize stat                                                 : Ok
> 48: Synthesize stat round                                           : Ok
> 49: Synthesize attr update                                          : Ok
> 50: Event times                                                     : Ok
> 51: Read backward ring buffer                                       : Ok
> 52: Print cpu map                                                   : Ok
> 53: Merge cpu map                                                   : Ok
> 54: Probe SDT events                                                : Ok
> 55: is_printable_array                                              : Ok
> 56: Print bitmap                                                    : Ok
> 57: perf hooks                                                      : Ok
> 58: builtin clang support                                           : Skip (not compiled in)
> 59: unit_number__scnprintf                                          : Ok
> 60: mem2node                                                        : Ok
> 61: time utils                                                      : Ok
> 62: Test jit_write_elf                                              : Ok
> 63: Test libpfm4 support                                            : Skip (not compiled in)
> 64: Test api io                                                     : Ok
> 65: maps__merge_in                                                  : Ok
> 66: Demangle Java                                                   : Ok
> 67: Parse and process metrics                                       : Ok
> 68: PE file support                                                 : Ok
> 69: Event expansion for cgroups                                     : Ok
> 70: x86 rdpmc                                                       : Ok
> 71: Convert perf time to TSC                                        : Ok
> 72: DWARF unwind                                                    : Ok
> 73: x86 instruction decoder - new instructions                      : Ok
> 74: Intel PT packet decoder                                         : Ok
> 75: x86 bp modify                                                   : Ok
> 76: Check open filename arg using perf trace + vfs_getname          : FAILED!
> 77: Add vfs_getname probe to get syscall args filenames             : FAILED!
> 78: probe libc's inet_pton & backtrace it with ping                 : Ok
> 79: Use vfs_getname probe to get syscall args filenames             : FAILED!
> 80: Zstd perf.data compression/decompression                        : Ok
> 81: Check Arm CoreSight trace data recording and synthesized samples: Skip
> 
> Validation:
> 
>   $ tools/perf/perf record --threads --compression-level=3 -e cycles,instructions -- ../../bench/matrix/linux/matrix.gcc.g.O3
>   Addr of buf1 = 0x7f8af24ae010
>   Offs of buf1 = 0x7f8af24ae180
>   Addr of buf2 = 0x7f8af04ad010
>   Offs of buf2 = 0x7f8af04ad1c0
>   Addr of buf3 = 0x7f8aee4ac010
>   Offs of buf3 = 0x7f8aee4ac100
>   Addr of buf4 = 0x7f8aec4ab010
>   Offs of buf4 = 0x7f8aec4ab140
>   Threads #: 8 Pthreads
>   Matrix size: 2048
>   Using multiply kernel: multiply1
>   Execution time = 24.579 seconds
>   [ perf record: Woken up 972 times to write data ]
>   [ perf record: Captured and wrote 11.280 MB perf.data (403 samples), compressed (original 66.670 MB, ratio is 5.926) ]
> 
>   $ perf report --header --stats
>   # ========
>   # captured on    : Wed Oct  7 19:07:50 2020
>   # header version : 1
>   # data offset    : 504
>   # data size      : 30056
>   # feat offset    : 30560
>   # hostname : nntvtune39
>   # os release : 5.8.8-100.fc31.x86_64
>   # perf version : 4.13.rc5.gdf19b1347b82
>   # arch : x86_64
>   # nrcpus online : 8
>   # nrcpus avail : 8
>   # cpudesc : Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
>   # cpuid : GenuineIntel,6,94,3
>   # total memory : 16119484 kB
>   # cmdline : /root/abudanko/kernel/acme/tools/perf/perf record --threads --compression-level=3 -e cycles,instructions -- ../../bench  /matrix/linux/matrix.gcc.g.O3 
>   # event : name = cycles, , id = { 54562, 54563, 54564, 54565, 54566, 54567, 54568, 54569 }, size = 120, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|ID|>
>   # event : name = instructions, , id = { 54570, 54571, 54572, 54573, 54574, 54575, 54576, 54577 }, size = 120, config = 0x1, { sample_period, sample_freq } = 4000, sample_ty>
>   # sibling sockets : 0-7
>   # sibling dies    : 0-7
>   # sibling threads : 0,4
>   # sibling threads : 1,5
>   # sibling threads : 2,6
>   # sibling threads : 3,7
>   # CPU 0: Core ID 0, Die ID 0, Socket ID 0
>   # CPU 1: Core ID 1, Die ID 0, Socket ID 0
>   # CPU 2: Core ID 2, Die ID 0, Socket ID 0
>   # CPU 3: Core ID 3, Die ID 0, Socket ID 0
>   # CPU 4: Core ID 0, Die ID 0, Socket ID 0
>   # CPU 5: Core ID 1, Die ID 0, Socket ID 0
>   # CPU 6: Core ID 2, Die ID 0, Socket ID 0
>   # CPU 7: Core ID 3, Die ID 0, Socket ID 0
>   # node0 meminfo  : total = 16119484 kB, free = 933088 kB
>   # node0 cpu list : 0-7
>   # pmu mappings: intel_pt = 8, software = 1, power = 19, uprobe = 7, uncore_imc = 11, cpu = 4, cstate_core = 17, uncore_cbox_2 = 14, breakpoint = 5, uncore_cbox_0 = 12, trac>
>   # CPU cache info:
>   #  L1 Data                 32K [0,4]
>   #  L1 Instruction          32K [0,4]
>   #  L1 Data                 32K [1,5]
>   #  L1 Instruction          32K [1,5]
>   #  L1 Data                 32K [1,5]
>   #  L1 Instruction          32K [1,5]
>   #  L1 Data                 32K [2,6]
>   #  L1 Instruction          32K [2,6]
>   #  L1 Data                 32K [3,7]
>   #  L1 Instruction          32K [3,7]
>   #  L2 Unified             256K [0,4]
>   #  L2 Unified             256K [1,5]
>   #  L2 Unified             256K [2,6]
>   #  L2 Unified             256K [3,7]
>   #  L3 Unified            8192K [0-7]
>   # time of first sample : 0.000000
>   # time of last sample : 0.000000
>   # sample duration :      0.000 ms
>   # memory nodes (nr 1, block size 0x8000000):
>   #    0 [16G]: 0-23,32-135
>   # directory data version : 1
>   # bpf_prog_info 835: bpf_prog_6deef7357e7b4530 addr 0xffffffffc07c5138 size 66
>   # bpf_prog_info 836: bpf_prog_6deef7357e7b4530 addr 0xffffffffc0804558 size 66
>   # bpf_prog_info 837: bpf_prog_84efc2eecc454ca6 addr 0xffffffffc080804c size 373
>   # bpf_prog_info 838: bpf_prog_6deef7357e7b4530 addr 0xffffffffc0817f50 size 66
>   # bpf_prog_info 839: bpf_prog_6deef7357e7b4530 addr 0xffffffffc082aee8 size 66
>   # bpf_prog_info 840: bpf_prog_5a2b06eab81b8f51 addr 0xffffffffc082c0fc size 1132
>   # bpf_prog_info 841: bpf_prog_8a642ed2819e9acc addr 0xffffffffc0aac380 size 62
>   # bpf_prog_info 842: bpf_prog_8a642ed2819e9acc addr 0xffffffffc0d4c970 size 62
>   # bpf_prog_info 843: bpf_prog_6deef7357e7b4530 addr 0xffffffffc0d4e8bc size 66
>   # bpf_prog_info 844: bpf_prog_6deef7357e7b4530 addr 0xffffffffc0d5050c size 66
>   # bpf_prog_info 845: bpf_prog_3ff695c0afc2579c addr 0xffffffffc0d56098 size 1552
>   # bpf_prog_info 846: bpf_prog_6deef7357e7b4530 addr 0xffffffffc0d58364 size 66
>   # bpf_prog_info 847: bpf_prog_6deef7357e7b4530 addr 0xffffffffc0d6ec00 size 66
>   # compressed : Zstd, level = 3, ratio = 6
>   # cpu pmu capabilities: branches=32, max_precise=3, pmu_name=skylake
>   # missing features: TRACING_DATA BUILD_ID BRANCH_STACK GROUP_DESC AUXTRACE STAT CLOCKID CLOCK_DATA 
>   # ========
>   #
>   Aggregated stats:
>            TOTAL events:    1457107
>             MMAP events:         98
>             LOST events:          0
>             COMM events:          2
>             EXIT events:          6
>         THROTTLE events:          0
>       UNTHROTTLE events:          0
>             FORK events:          8
>             READ events:          0
>           SAMPLE events:    1456388
>            MMAP2 events:          7
>              AUX events:          0
>     ITRACE_START events:          0
>     LOST_SAMPLES events:          0
>           SWITCH events:          0
>  SWITCH_CPU_WIDE events:          0
>       NAMESPACES events:          0
>          KSYMBOL events:         13
>        BPF_EVENT events:         13
>           CGROUP events:        166
>        TEXT_POKE events:          0
>             ATTR events:          0
>       EVENT_TYPE events:          0
>     TRACING_DATA events:          0
>         BUILD_ID events:          0
>   FINISHED_ROUND events:          0
>         ID_INDEX events:          0
>    AUXTRACE_INFO events:          0
>         AUXTRACE events:          0
>   AUXTRACE_ERROR events:          0
>       THREAD_MAP events:          1
>          CPU_MAP events:          1
>      STAT_CONFIG events:          0
>             STAT events:          0
>       STAT_ROUND events:          0
>     EVENT_UPDATE events:          0
>        TIME_CONV events:          1
>          FEATURE events:          0
>       COMPRESSED events:        403
> 
>   $ tools/perf/perf record --threads --compression-level=3 -g --call-graph dwarf,1024 -e cycles,instructions -- ../../bench/matrix/linux/matrix.gcc.g.O3
>   Addr of buf1 = 0x7f15e8b89010
>   Offs of buf1 = 0x7f15e8b89180
>   Addr of buf2 = 0x7f15e6b88010
>   Offs of buf2 = 0x7f15e6b881c0
>   Addr of buf3 = 0x7f15e4b87010
>   Offs of buf3 = 0x7f15e4b87100
>   Addr of buf4 = 0x7f15e2b86010
>   Offs of buf4 = 0x7f15e2b86140
>   Threads #: 8 Pthreads
>   Matrix size: 2048
>   Using multiply kernel: multiply1
>   Execution time = 24.281 seconds
>   [ perf record: Woken up 6923 times to write data ]
>   [ perf record: Captured and wrote 34.933 MB perf.data (10414 samples), compressed (original 1728.795 MB, ratio is 49.530) ]
> 
>   $ perf report --stdio
>   # To display the perf.data header info, please use --header/--header-only options.
>   #
>   #
>   # Total Lost Samples: 0
>   #
>   # Samples: 708K of event 'cycles'
>   # Event count (approx.): 277718834163
>   #
>   # Children      Self  Command          Shared Object             Symbol                                                  
>   # ........  ........  ...............  ........................  ........................................................
>   #
>     99.97%     0.00%  matrix.gcc.g.O3  libc-2.30.so              [.] __GI___clone (inlined)
>             |
>             ---__GI___clone (inlined)
>                |          
>                 --99.97%--start_thread
>                           |          
>                            --99.97%--ThreadFunction
>                                      multiply1
>                                      |          
>                                       --0.96%--asm_sysvec_apic_timer_interrupt
>                                                 |          
>                                                  --0.70%--subflow_syn_recv_sock
>                                                            |          
>                                                             --0.60%--__sysvec_apic_timer_interrupt
>                                                                       |          
>                                                                        --0.57%--hrtimer_interrupt
> 
>     99.97%     0.00%  matrix.gcc.g.O3  libpthread-2.30.so        [.] start_thread
>             |
>             ---start_thread
>                |          
>                 --99.97%--ThreadFunction
>                           |          
>                            --99.97%--multiply1
>                                      |          
>                                       --0.96%--asm_sysvec_apic_timer_interrupt
>                                                 |          
>                                                  --0.70%--subflow_syn_recv_sock
>                                                            |          
>                                                             --0.60%--__sysvec_apic_timer_interrupt
>                                                                       |          
>                                                                        --0.57%--hrtimer_interrupt
> 
>     99.97%     0.00%  matrix.gcc.g.O3  matrix.gcc.g.O3           [.] ThreadFunction
>             |
>             ---ThreadFunction
>                |          
>                 --99.97%--multiply1
>                           |          
>                            --0.96%--asm_sysvec_apic_timer_interrupt
>                                      |          
>                                       --0.70%--subflow_syn_recv_sock
>                                                 |          
>                                                  --0.60%--__sysvec_apic_timer_interrupt
>                                                            |          
>                                                             --0.57%--hrtimer_interrupt
> 
>     99.97%    98.68%  matrix.gcc.g.O3  matrix.gcc.g.O3           [.] multiply1
>             |          
>             |--98.68%--__GI___clone (inlined)
>             |          start_thread
>             |          ThreadFunction
>             |          multiply1
>             |          
>              --1.28%--multiply1
>                        |          
>                         --0.96%--asm_sysvec_apic_timer_interrupt
>                                   |          
>                                    --0.70%--subflow_syn_recv_sock
>                                              |          
>                                               --0.60%--__sysvec_apic_timer_interrupt
>                                                         |          
>                                                          --0.57%--hrtimer_interrupt
> 
>      0.97%     0.05%  matrix.gcc.g.O3  [kernel.kallsyms]         [k] asm_sysvec_apic_timer_interrupt
>             |          
>              --0.91%--asm_sysvec_apic_timer_interrupt
>                        |          
>                         --0.70%--subflow_syn_recv_sock
> 
> 
> -- 
> 2.24.1
> 

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 03/15] perf data: open data directory in read access mode
  2020-10-12 16:52     ` Alexey Budankov
@ 2020-10-13 16:22       ` Arnaldo Carvalho de Melo
  2020-10-14 10:39         ` Alexey Budankov
  0 siblings, 1 reply; 45+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-10-13 16:22 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Andi Kleen, Jiri Olsa, Namhyung Kim, Alexander Shishkin,
	Adrian Hunter, Peter Zijlstra, Ingo Molnar, linux-kernel

Em Mon, Oct 12, 2020 at 07:52:33PM +0300, Alexey Budankov escreveu:
> 
> On 12.10.2020 19:03, Andi Kleen wrote:
> > On Mon, Oct 12, 2020 at 11:55:07AM +0300, Alexey Budankov wrote:
> >>
> >> Open files located at data directory in case of read access mode.
> > 
> > Need some rationale. Is this a bug fix? Did the case not matter
> > before?
> 
> This is not a bug fix. The case didn't matter before.

In such cases can you please write a paragraph explaining why it didn't
matter before and now it is required?

- Arnaldo
 
 > 
> >>
> >> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
> >> ---
> >>  tools/perf/util/data.c | 4 ++++
> >>  1 file changed, 4 insertions(+)
> >>
> >> diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
> >> index c47aa34fdc0a..6ad61ac6ba67 100644
> >> --- a/tools/perf/util/data.c
> >> +++ b/tools/perf/util/data.c
> >> @@ -321,6 +321,10 @@ static int open_dir(struct perf_data *data)
> >>  		return -1;
> >>  
> >>  	ret = open_file(data);
> >> +	if (!ret && perf_data__is_dir(data)) {
> >> +		if (perf_data__is_read(data))
> >> +			ret = perf_data__open_dir(data);
> >> +	}
> >>  
> >>  	/* Cleanup whatever we managed to create so far. */
> >>  	if (ret && perf_data__is_write(data))
> >> -- 
> >> 2.24.1
> 
> Alexei
> 

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 02/15] perf report: output trace file name in raw trace dump
  2020-10-12  8:54 ` [PATCH v1 02/15] perf report: output trace file name " Alexey Budankov
  2020-10-12 16:01   ` Andi Kleen
@ 2020-10-13 19:54   ` Jiri Olsa
  2020-10-13 21:23     ` Alexey Budankov
  1 sibling, 1 reply; 45+ messages in thread
From: Jiri Olsa @ 2020-10-13 19:54 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Namhyung Kim, Alexander Shishkin,
	Andi Kleen, Adrian Hunter, Peter Zijlstra, Ingo Molnar,
	linux-kernel

On Mon, Oct 12, 2020 at 11:54:24AM +0300, Alexey Budankov wrote:
> 
> Output path of a trace file into raw dump (-D) <file_offset>@<path/file>.
> Print offset of PERF_RECORD_COMPRESSED record instead of zero for
> decompressed records:
>   0x2226a@perf.data [0x30]: event: 9
> or
>   0x15cc36@perf.data/data.7 [0x30]: event: 9
> 
> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>

hi,
I'm getting:

  CC       builtin-inject.o
builtin-inject.c: In function ‘cmd_inject’:
builtin-inject.c:850:18: error: initialization of ‘int (*)(struct perf_session *, union perf_event *, u64,  const char *)’ {aka ‘int (*)(struct perf_session *, union perf_event *, long unsigned int,  const char *)’} from incompatible pointer type ‘int (*)(struct perf_session *, union perf_event *, u64)’ {aka ‘int (*)(struct perf_session *, union perf_event *, long unsigned int)’} [-Werror=incompatible-pointer-types]
  850 |    .compressed = perf_event__repipe_op4_synth,
      |                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
builtin-inject.c:850:18: note: (near initialization for ‘inject.tool.compressed’)

it's probably recent build id changes 

jirka


> ---
>  tools/perf/util/session.c | 75 +++++++++++++++++++++++----------------
>  tools/perf/util/tool.h    |  3 +-
>  2 files changed, 46 insertions(+), 32 deletions(-)
> 
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 4478ddae485b..6f09d506b2f6 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -36,7 +36,8 @@
>  
>  #ifdef HAVE_ZSTD_SUPPORT
>  static int perf_session__process_compressed_event(struct perf_session *session,
> -						  union perf_event *event, u64 file_offset)
> +						  union perf_event *event, u64 file_offset,
> +						  const char *file_path)
>  {
>  	void *src;
>  	size_t decomp_size, src_size;
> @@ -58,6 +59,7 @@ static int perf_session__process_compressed_event(struct perf_session *session,
>  	}
>  
>  	decomp->file_pos = file_offset;
> +	decomp->file_path = file_path;
>  	decomp->mmap_len = mmap_len;
>  	decomp->head = 0;
>  
> @@ -98,7 +100,8 @@ static int perf_session__process_compressed_event(struct perf_session *session,
>  static int perf_session__deliver_event(struct perf_session *session,
>  				       union perf_event *event,
>  				       struct perf_tool *tool,
> -				       u64 file_offset);
> +				       u64 file_offset,
> +				       const char *file_path);
>  
>  static int perf_session__open(struct perf_session *session)
>  {
> @@ -180,7 +183,8 @@ static int ordered_events__deliver_event(struct ordered_events *oe,
>  						    ordered_events);
>  
>  	return perf_session__deliver_event(session, event->event,
> -					   session->tool, event->file_offset);
> +					   session->tool, event->file_offset,
> +					   event->file_path);
>  }
>  
>  struct perf_session *perf_session__new(struct perf_data *data,
> @@ -452,7 +456,8 @@ static int process_stat_round_stub(struct perf_session *perf_session __maybe_unu
>  
>  static int perf_session__process_compressed_event_stub(struct perf_session *session __maybe_unused,
>  						       union perf_event *event __maybe_unused,
> -						       u64 file_offset __maybe_unused)
> +						       u64 file_offset __maybe_unused,
> +						       const char *file_path __maybe_unused)
>  {
>         dump_printf(": unhandled!\n");
>         return 0;
> @@ -1227,13 +1232,14 @@ static void sample_read__printf(struct perf_sample *sample, u64 read_format)
>  }
>  
>  static void dump_event(struct evlist *evlist, union perf_event *event,
> -		       u64 file_offset, struct perf_sample *sample)
> +		       u64 file_offset, struct perf_sample *sample,
> +		       const char *file_path)
>  {
>  	if (!dump_trace)
>  		return;
>  
> -	printf("\n%#" PRIx64 " [%#x]: event: %d\n",
> -	       file_offset, event->header.size, event->header.type);
> +	printf("\n%#" PRIx64 "@%s [%#x]: event: %d\n",
> +	       file_offset, file_path, event->header.size, event->header.type);
>  
>  	trace_event(event);
>  	if (event->header.type == PERF_RECORD_SAMPLE && evlist->trace_event_sample_raw)
> @@ -1424,12 +1430,13 @@ static int machines__deliver_event(struct machines *machines,
>  				   struct evlist *evlist,
>  				   union perf_event *event,
>  				   struct perf_sample *sample,
> -				   struct perf_tool *tool, u64 file_offset)
> +				   struct perf_tool *tool, u64 file_offset,
> +				   const char *file_path)
>  {
>  	struct evsel *evsel;
>  	struct machine *machine;
>  
> -	dump_event(evlist, event, file_offset, sample);
> +	dump_event(evlist, event, file_offset, sample, file_path);
>  
>  	evsel = perf_evlist__id2evsel(evlist, sample->id);
>  
> @@ -1506,7 +1513,8 @@ static int machines__deliver_event(struct machines *machines,
>  static int perf_session__deliver_event(struct perf_session *session,
>  				       union perf_event *event,
>  				       struct perf_tool *tool,
> -				       u64 file_offset)
> +				       u64 file_offset,
> +				       const char *file_path)
>  {
>  	struct perf_sample sample;
>  	int ret;
> @@ -1524,7 +1532,7 @@ static int perf_session__deliver_event(struct perf_session *session,
>  		return 0;
>  
>  	ret = machines__deliver_event(&session->machines, session->evlist,
> -				      event, &sample, tool, file_offset);
> +				      event, &sample, tool, file_offset, file_path);
>  
>  	if (dump_trace && sample.aux_sample.size)
>  		auxtrace__dump_auxtrace_sample(session, &sample);
> @@ -1534,7 +1542,8 @@ static int perf_session__deliver_event(struct perf_session *session,
>  
>  static s64 perf_session__process_user_event(struct perf_session *session,
>  					    union perf_event *event,
> -					    u64 file_offset)
> +					    u64 file_offset,
> +					    const char *file_path)
>  {
>  	struct ordered_events *oe = &session->ordered_events;
>  	struct perf_tool *tool = session->tool;
> @@ -1544,7 +1553,7 @@ static s64 perf_session__process_user_event(struct perf_session *session,
>  
>  	if (event->header.type != PERF_RECORD_COMPRESSED ||
>  	    tool->compressed == perf_session__process_compressed_event_stub)
> -		dump_event(session->evlist, event, file_offset, &sample);
> +		dump_event(session->evlist, event, file_offset, &sample, file_path);
>  
>  	/* These events are processed right away */
>  	switch (event->header.type) {
> @@ -1603,9 +1612,9 @@ static s64 perf_session__process_user_event(struct perf_session *session,
>  	case PERF_RECORD_HEADER_FEATURE:
>  		return tool->feature(session, event);
>  	case PERF_RECORD_COMPRESSED:
> -		err = tool->compressed(session, event, file_offset);
> +		err = tool->compressed(session, event, file_offset, file_path);
>  		if (err)
> -			dump_event(session->evlist, event, file_offset, &sample);
> +			dump_event(session->evlist, event, file_offset, &sample, file_path);
>  		return err;
>  	default:
>  		return -EINVAL;
> @@ -1622,9 +1631,9 @@ int perf_session__deliver_synth_event(struct perf_session *session,
>  	events_stats__inc(&evlist->stats, event->header.type);
>  
>  	if (event->header.type >= PERF_RECORD_USER_TYPE_START)
> -		return perf_session__process_user_event(session, event, 0);
> +		return perf_session__process_user_event(session, event, 0, NULL);
>  
> -	return machines__deliver_event(&session->machines, evlist, event, sample, tool, 0);
> +	return machines__deliver_event(&session->machines, evlist, event, sample, tool, 0, NULL);
>  }
>  
>  static void event_swap(union perf_event *event, bool sample_id_all)
> @@ -1720,7 +1729,8 @@ int perf_session__peek_events(struct perf_session *session, u64 offset,
>  }
>  
>  static s64 perf_session__process_event(struct perf_session *session,
> -				       union perf_event *event, u64 file_offset)
> +				       union perf_event *event, u64 file_offset,
> +				       const char *file_path)
>  {
>  	struct evlist *evlist = session->evlist;
>  	struct perf_tool *tool = session->tool;
> @@ -1735,7 +1745,7 @@ static s64 perf_session__process_event(struct perf_session *session,
>  	events_stats__inc(&evlist->stats, event->header.type);
>  
>  	if (event->header.type >= PERF_RECORD_USER_TYPE_START)
> -		return perf_session__process_user_event(session, event, file_offset);
> +		return perf_session__process_user_event(session, event, file_offset, file_path);
>  
>  	if (tool->ordered_events) {
>  		u64 timestamp = -1ULL;
> @@ -1749,7 +1759,7 @@ static s64 perf_session__process_event(struct perf_session *session,
>  			return ret;
>  	}
>  
> -	return perf_session__deliver_event(session, event, tool, file_offset);
> +	return perf_session__deliver_event(session, event, tool, file_offset, file_path);
>  }
>  
>  void perf_event_header__bswap(struct perf_event_header *hdr)
> @@ -1987,7 +1997,8 @@ static int __perf_session__process_pipe_events(struct perf_session *session)
>  		}
>  	}
>  
> -	if ((skip = perf_session__process_event(session, event, head)) < 0) {
> +	skip = perf_session__process_event(session, event, head, "pipe");
> +	if (skip < 0) {
>  		pr_err("%#" PRIx64 " [%#x]: failed to process type: %d\n",
>  		       head, event->header.size, event->header.type);
>  		err = -EINVAL;
> @@ -2068,7 +2079,7 @@ fetch_decomp_event(u64 head, size_t mmap_size, char *buf, bool needs_swap)
>  static int __perf_session__process_decomp_events(struct perf_session *session)
>  {
>  	s64 skip;
> -	u64 size, file_pos = 0;
> +	u64 size;
>  	struct decomp *decomp = session->decomp_last;
>  
>  	if (!decomp)
> @@ -2082,9 +2093,9 @@ static int __perf_session__process_decomp_events(struct perf_session *session)
>  			break;
>  
>  		size = event->header.size;
> -
> -		if (size < sizeof(struct perf_event_header) ||
> -		    (skip = perf_session__process_event(session, event, file_pos)) < 0) {
> +		skip = perf_session__process_event(session, event, decomp->file_pos,
> +						   decomp->file_path);
> +		if (size < sizeof(struct perf_event_header) || skip < 0) {
>  			pr_err("%#" PRIx64 " [%#x]: failed to process type: %d\n",
>  				decomp->file_pos + decomp->head, event->header.size, event->header.type);
>  			return -EINVAL;
> @@ -2115,7 +2126,8 @@ struct reader;
>  
>  typedef s64 (*reader_cb_t)(struct perf_session *session,
>  			   union perf_event *event,
> -			   u64 file_offset);
> +			   u64 file_offset,
> +			   const char *file_path);
>  
>  struct reader {
>  	int		 fd;
> @@ -2198,9 +2210,9 @@ reader__process_events(struct reader *rd, struct perf_session *session,
>  	skip = -EINVAL;
>  
>  	if (size < sizeof(struct perf_event_header) ||
> -	    (skip = rd->process(session, event, file_pos)) < 0) {
> -		pr_err("%#" PRIx64 " [%#x]: failed to process type: %d [%s]\n",
> -		       file_offset + head, event->header.size,
> +	    (skip = rd->process(session, event, file_pos, rd->path)) < 0) {
> +		pr_err("%#" PRIx64 " [%s] [%#x]: failed to process type: %d [%s]\n",
> +		       file_offset + head, rd->path, event->header.size,
>  		       event->header.type, strerror(-skip));
>  		err = skip;
>  		goto out;
> @@ -2230,9 +2242,10 @@ reader__process_events(struct reader *rd, struct perf_session *session,
>  
>  static s64 process_simple(struct perf_session *session,
>  			  union perf_event *event,
> -			  u64 file_offset)
> +			  u64 file_offset,
> +			  const char *file_path)
>  {
> -	return perf_session__process_event(session, event, file_offset);
> +	return perf_session__process_event(session, event, file_offset, file_path);
>  }
>  
>  static int __perf_session__process_events(struct perf_session *session)
> diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
> index bbbc0dcd461f..c966531d3eca 100644
> --- a/tools/perf/util/tool.h
> +++ b/tools/perf/util/tool.h
> @@ -28,7 +28,8 @@ typedef int (*event_attr_op)(struct perf_tool *tool,
>  
>  typedef int (*event_op2)(struct perf_session *session, union perf_event *event);
>  typedef s64 (*event_op3)(struct perf_session *session, union perf_event *event);
> -typedef int (*event_op4)(struct perf_session *session, union perf_event *event, u64 data);
> +typedef int (*event_op4)(struct perf_session *session, union perf_event *event, u64 data,
> +			 const char *str);
>  
>  typedef int (*event_oe)(struct perf_tool *tool, union perf_event *event,
>  			struct ordered_events *oe);
> -- 
> 2.24.1
> 


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 02/15] perf report: output trace file name in raw trace dump
  2020-10-13 19:54   ` Jiri Olsa
@ 2020-10-13 21:23     ` Alexey Budankov
  0 siblings, 0 replies; 45+ messages in thread
From: Alexey Budankov @ 2020-10-13 21:23 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Namhyung Kim, Alexander Shishkin,
	Andi Kleen, Adrian Hunter, Peter Zijlstra, Ingo Molnar,
	linux-kernel

Hi,

On 13.10.2020 22:54, Jiri Olsa wrote:
> On Mon, Oct 12, 2020 at 11:54:24AM +0300, Alexey Budankov wrote:
>>
>> Output path of a trace file into raw dump (-D) <file_offset>@<path/file>.
>> Print offset of PERF_RECORD_COMPRESSED record instead of zero for
>> decompressed records:
>>   0x2226a@perf.data [0x30]: event: 9
>> or
>>   0x15cc36@perf.data/data.7 [0x30]: event: 9
>>
>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
> 
> hi,
> I'm getting:
> 
>   CC       builtin-inject.o
> builtin-inject.c: In function ‘cmd_inject’:
> builtin-inject.c:850:18: error: initialization of ‘int (*)(struct perf_session *, union perf_event *, u64,  const char *)’ {aka ‘int (*)(struct perf_session *, union perf_event *, long unsigned int,  const char *)’} from incompatible pointer type ‘int (*)(struct perf_session *, union perf_event *, u64)’ {aka ‘int (*)(struct perf_session *, union perf_event *, long unsigned int)’} [-Werror=incompatible-pointer-types]
>   850 |    .compressed = perf_event__repipe_op4_synth,
>       |                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
> builtin-inject.c:850:18: note: (near initialization for ‘inject.tool.compressed’)
> 
> it's probably recent build id changes 

Looks like that's it. Fix is in v2 and follows:

diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index f3f965157d69..35c005b8da7f 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -106,7 +106,8 @@ static int perf_event__repipe_op2_synth(struct perf_session *session,
 
 static int perf_event__repipe_op4_synth(struct perf_session *session,
 					union perf_event *event,
-					u64 data __maybe_unused)
+					u64 data __maybe_unused,
+					const char *str __maybe_unused)
 {
 	return perf_event__repipe_synth(session->tool, event);
 }

Thanks!
Alexei

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 03/15] perf data: open data directory in read access mode
  2020-10-13 16:22       ` Arnaldo Carvalho de Melo
@ 2020-10-14 10:39         ` Alexey Budankov
  0 siblings, 0 replies; 45+ messages in thread
From: Alexey Budankov @ 2020-10-14 10:39 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Andi Kleen, Jiri Olsa, Namhyung Kim, Alexander Shishkin,
	Adrian Hunter, Peter Zijlstra, Ingo Molnar, linux-kernel


On 13.10.2020 19:22, Arnaldo Carvalho de Melo wrote:
> Em Mon, Oct 12, 2020 at 07:52:33PM +0300, Alexey Budankov escreveu:
>>
>> On 12.10.2020 19:03, Andi Kleen wrote:
>>> On Mon, Oct 12, 2020 at 11:55:07AM +0300, Alexey Budankov wrote:
>>>>
>>>> Open files located at data directory in case of read access mode.
>>>
>>> Need some rationale. Is this a bug fix? Did the case not matter
>>> before?
>>
>> This is not a bug fix. The case didn't matter before.
> 
> In such cases can you please write a paragraph explaining why it didn't
> matter before and now it is required?

Sure. Will do. Already in v2.

Alexei

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 08/15] perf record: write trace data into mmap trace files
  2020-10-12  9:01 ` [PATCH v1 08/15] perf record: write trace data into mmap trace files Alexey Budankov
@ 2020-10-14 10:52   ` Namhyung Kim
  2020-10-14 12:09     ` Alexey Budankov
  0 siblings, 1 reply; 45+ messages in thread
From: Namhyung Kim @ 2020-10-14 10:52 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Alexander Shishkin,
	Andi Kleen, Adrian Hunter, Peter Zijlstra, Ingo Molnar,
	linux-kernel

Hi,

On Mon, Oct 12, 2020 at 6:01 PM Alexey Budankov
<alexey.budankov@linux.intel.com> wrote:
>
>
> Write trace data into per mmap trace files located
> at data directory. Streaming thread adjusts its affinity
> according to mask of the buffer being processed.
>
> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
> ---
[SNIP]
> @@ -1184,8 +1203,12 @@ static int record__mmap_read_evlist(struct record *rec, struct evlist *evlist,
>         /*
>          * Mark the round finished in case we wrote
>          * at least one event.
> +        *
> +        * No need for round events in directory mode,
> +        * because per-cpu maps and files have data
> +        * sorted by kernel.
>          */
> -       if (bytes_written != rec->bytes_written)
> +       if (!record__threads_enabled(rec) && bytes_written != rec->bytes_written)
>                 rc = record__write(rec, NULL, &finished_round_event, sizeof(finished_round_event));

This means it needs to keep all events in the ordered events queue
when perf report processes the data, right?

Thanks
Namhyung

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 08/15] perf record: write trace data into mmap trace files
  2020-10-14 10:52   ` Namhyung Kim
@ 2020-10-14 12:09     ` Alexey Budankov
  2020-10-21  7:34       ` Namhyung Kim
  0 siblings, 1 reply; 45+ messages in thread
From: Alexey Budankov @ 2020-10-14 12:09 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Alexander Shishkin,
	Andi Kleen, Adrian Hunter, Peter Zijlstra, Ingo Molnar,
	linux-kernel

Hi,

On 14.10.2020 13:52, Namhyung Kim wrote:
> Hi,
> 
> On Mon, Oct 12, 2020 at 6:01 PM Alexey Budankov
> <alexey.budankov@linux.intel.com> wrote:
>>
>>
>> Write trace data into per mmap trace files located
>> at data directory. Streaming thread adjusts its affinity
>> according to mask of the buffer being processed.
>>
>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
>> ---
> [SNIP]
>> @@ -1184,8 +1203,12 @@ static int record__mmap_read_evlist(struct record *rec, struct evlist *evlist,
>>         /*
>>          * Mark the round finished in case we wrote
>>          * at least one event.
>> +        *
>> +        * No need for round events in directory mode,
>> +        * because per-cpu maps and files have data
>> +        * sorted by kernel.
>>          */
>> -       if (bytes_written != rec->bytes_written)
>> +       if (!record__threads_enabled(rec) && bytes_written != rec->bytes_written)
>>                 rc = record__write(rec, NULL, &finished_round_event, sizeof(finished_round_event));
> 
> This means it needs to keep all events in the ordered events queue
> when perf report processes the data, right?

Looks so.

Alexei

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 00/15] Introduce threaded trace streaming for basic perf record operation
  2020-10-13 16:20 ` [PATCH v1 00/15] Introduce threaded trace streaming for basic perf record operation Arnaldo Carvalho de Melo
@ 2020-10-14 12:15   ` Alexey Budankov
  0 siblings, 0 replies; 45+ messages in thread
From: Alexey Budankov @ 2020-10-14 12:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Andi Kleen,
	Adrian Hunter, Peter Zijlstra, Ingo Molnar, linux-kernel


On 13.10.2020 19:20, Arnaldo Carvalho de Melo wrote:
> Em Mon, Oct 12, 2020 at 11:50:29AM +0300, Alexey Budankov escreveu:
>>
>> Patch set provides threaded trace streaming for base perf record
>> operation. Provided streaming mode (--threads) mitigates profiling
>> data losses and resolves scalability issues of serial and asynchronous
>> (--aio) trace streaming modes on multicore server systems. The patch
>> set is based on the prototype [1], [2] and the most closely relates
>> to mode 3) "mode that creates thread for every monitored memory map".
>>
>> The threaded mode executes one-to-one mapping of trace streaming threads
>> to mapped data buffers and streaming into per-CPU trace files located
>> at data directory. The data buffers and threads are affined to NUMA
>> nodes and monitored CPUs according to system topology. --cpu option
>> can be used to specify exact CPUs to be monitored.
>>
>> Basic analysis of data directories is provided for perf report mode.
>> Raw dump (-D) and aggregated reports are available for data directories,
>> still with no memory consumption optimizations. However data directories
>> collected with --compression-level option enabled can be analyzed with
>> little less memory because trace files are unmaped from tool process
>> memory after loading collected data.
>>
>> Provided streaming mode is available with Zstd compression/decompression
>> (--compression-level) and handling of external commands (--control).
>> AUX area tracing, related and derived modes like --snapshot or
>> --aux-sample are not enabled. --switch-output, --switch-output-event, 
>> --switch-max-files and --timestamp-filename options are not enabled.
> 
> Would be interesting to spell out what are the difficulties to have
> those options working with this threaded mode, as I expect that once
> this is all reviewed and tested we should switch to it by default, no?

At the moment I am not sure about this as the default mode. It all depends
on specifics of HW configuration and workload to be monitored and analyzed.
On middle and small sized systems --aio could still fit better from HW/OS
resource consumption perspective.

Initial intent to enable AUX area tracing faced the need to define some
(optimal?) way to store index data at data directory, thus left aside of
this first step to bring threaded trace streaming into Perf tool.
--switch-output-* and --timestamp-filename use cases are not yet clear
for data directories and thus look like features of the second order.

Addressing all that issues in a single patch set looks too much. Proper
way to have it all in is step-by-step. Also I should say that it is aside
of the scope of current Intel VTune specific needs.

Alexei

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 00/15] Introduce threaded trace streaming for basic perf record operation
  2020-10-12  8:50 [PATCH v1 00/15] Introduce threaded trace streaming for basic perf record operation Alexey Budankov
                   ` (15 preceding siblings ...)
  2020-10-13 16:20 ` [PATCH v1 00/15] Introduce threaded trace streaming for basic perf record operation Arnaldo Carvalho de Melo
@ 2020-10-14 17:27 ` Ingo Molnar
  2020-10-15 10:35   ` Alexey Budankov
  16 siblings, 1 reply; 45+ messages in thread
From: Ingo Molnar @ 2020-10-14 17:27 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim,
	Alexander Shishkin, Andi Kleen, Adrian Hunter, Peter Zijlstra,
	Ingo Molnar, linux-kernel


* Alexey Budankov <alexey.budankov@linux.intel.com> wrote:

> 
> Patch set provides threaded trace streaming for base perf record
> operation. Provided streaming mode (--threads) mitigates profiling
> data losses and resolves scalability issues of serial and asynchronous
> (--aio) trace streaming modes on multicore server systems. The patch
> set is based on the prototype [1], [2] and the most closely relates
> to mode 3) "mode that creates thread for every monitored memory map".
> 
> The threaded mode executes one-to-one mapping of trace streaming threads
> to mapped data buffers and streaming into per-CPU trace files located
> at data directory. The data buffers and threads are affined to NUMA
> nodes and monitored CPUs according to system topology. --cpu option
> can be used to specify exact CPUs to be monitored.

Yay! This should really be the default trace capture model everywhere 
possible.

Can we do this for perf top too? It's really struggling with lots of cores.

If on a 64-core system I run just a moderately higher frequency 'perf top' 
of 1 kHz:

  perf top -e cycles -F 1000

perf stays stuck forever in 'Collecting samples...', and I also get a lot 
of:

  [548112.871089] Uhhuh. NMI received for unknown reason 31 on CPU 25.
  [548112.871089] Do you have a strange power saving mode enabled?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 00/15] Introduce threaded trace streaming for basic perf record operation
  2020-10-14 17:27 ` Ingo Molnar
@ 2020-10-15 10:35   ` Alexey Budankov
  0 siblings, 0 replies; 45+ messages in thread
From: Alexey Budankov @ 2020-10-15 10:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim,
	Alexander Shishkin, Andi Kleen, Adrian Hunter, Peter Zijlstra,
	Ingo Molnar, linux-kernel


On 14.10.2020 20:27, Ingo Molnar wrote:
> 
> * Alexey Budankov <alexey.budankov@linux.intel.com> wrote:
> 
>>
>> Patch set provides threaded trace streaming for base perf record
>> operation. Provided streaming mode (--threads) mitigates profiling
>> data losses and resolves scalability issues of serial and asynchronous
>> (--aio) trace streaming modes on multicore server systems. The patch
>> set is based on the prototype [1], [2] and the most closely relates
>> to mode 3) "mode that creates thread for every monitored memory map".
>>
>> The threaded mode executes one-to-one mapping of trace streaming threads
>> to mapped data buffers and streaming into per-CPU trace files located
>> at data directory. The data buffers and threads are affined to NUMA
>> nodes and monitored CPUs according to system topology. --cpu option
>> can be used to specify exact CPUs to be monitored.
> 
> Yay! This should really be the default trace capture model everywhere 
> possible.
> 
> Can we do this for perf top too? It's really struggling with lots of cores.
> 
> If on a 64-core system I run just a moderately higher frequency 'perf top' 
> of 1 kHz:
> 
>   perf top -e cycles -F 1000
> 
> perf stays stuck forever in 'Collecting samples...', and I also get a lot 
> of:
> 
>   [548112.871089] Uhhuh. NMI received for unknown reason 31 on CPU 25.
>   [548112.871089] Do you have a strange power saving mode enabled?

Yes, we can. I would only prefer to do it in a separate patch set since
for me this patch set is already complex enough as a single change.
Is it ok?

I would also appreciate if you could clarify, advise or guide on the impact
of this perf top advancement or may be even provide some feedback on this feature adoption to help better justify the effort for my management. 

Gratefully,
Alexei

> 
> Thanks,
> 
> 	Ingo
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 02/15] perf report: output trace file name in raw trace dump
  2020-10-12 17:06     ` Alexey Budankov
@ 2020-10-20 16:31       ` Alexey Budankov
  2020-10-20 17:10         ` Alexey Budankov
  0 siblings, 1 reply; 45+ messages in thread
From: Alexey Budankov @ 2020-10-20 16:31 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim,
	Alexander Shishkin, Adrian Hunter, Peter Zijlstra, Ingo Molnar,
	linux-kernel


On 12.10.2020 20:06, Alexey Budankov wrote:
> 
> On 12.10.2020 19:01, Andi Kleen wrote:
>> On Mon, Oct 12, 2020 at 11:54:24AM +0300, Alexey Budankov wrote:
>>> @@ -180,7 +183,8 @@ static int ordered_events__deliver_event(struct ordered_events *oe,
>>>  						    ordered_events);
>>>  
>>>  	return perf_session__deliver_event(session, event->event,
>>> -					   session->tool, event->file_offset);
>>> +					   session->tool, event->file_offset,
>>> +					   event->file_path);
>>
>> Wouldn't it be better to pass "event" around now, which would contain at least
>> four of the arguments.
>>
>> These functions are getting entirely too many arguments.
> 
> Well, either approach is possible, and even shrink of two arguments kept at session object.
> However changing function signature more than posted can cause bigger adjustments all over
> the code. So this needs more evaluation prior implementation. 

After brief evaluation it still doesn't look easy. The simplest thing
I could imagine is to probably combine file_path and file_offset at a
struct object on stack and then pass the object by the reference along
function calls. I expect it will roughly cause the same amount of changes
in the code and saves one argument (GP register). It is not that much
but still. However I don't see issues with passing even 6 args on stack.

Alexei

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 02/15] perf report: output trace file name in raw trace dump
  2020-10-20 16:31       ` Alexey Budankov
@ 2020-10-20 17:10         ` Alexey Budankov
  2020-10-20 20:29           ` Andi Kleen
  0 siblings, 1 reply; 45+ messages in thread
From: Alexey Budankov @ 2020-10-20 17:10 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim,
	Alexander Shishkin, Adrian Hunter, Peter Zijlstra, Ingo Molnar,
	linux-kernel


On 20.10.2020 19:31, Alexey Budankov wrote:
> 
> On 12.10.2020 20:06, Alexey Budankov wrote:
>>
>> On 12.10.2020 19:01, Andi Kleen wrote:
>>> On Mon, Oct 12, 2020 at 11:54:24AM +0300, Alexey Budankov wrote:
>>>> @@ -180,7 +183,8 @@ static int ordered_events__deliver_event(struct ordered_events *oe,
>>>>  						    ordered_events);
>>>>  
>>>>  	return perf_session__deliver_event(session, event->event,
>>>> -					   session->tool, event->file_offset);
>>>> +					   session->tool, event->file_offset,
>>>> +					   event->file_path);
>>>
>>> Wouldn't it be better to pass "event" around now, which would contain at least
>>> four of the arguments.
>>>
>>> These functions are getting entirely too many arguments.
>>
>> Well, either approach is possible, and even shrink of two arguments kept at session object.
>> However changing function signature more than posted can cause bigger adjustments all over
>> the code. So this needs more evaluation prior implementation. 
> 
> After brief evaluation it still doesn't look easy. The simplest thing
> I could imagine is to probably combine file_path and file_offset at a
> struct object on stack and then pass the object by the reference along
> function calls. I expect it will roughly cause the same amount of changes
> in the code and saves one argument (GP register). It is not that much
> but still. However I don't see issues with passing even 6 args on stack.

Sorry - "passing 6 args to a function call"


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 02/15] perf report: output trace file name in raw trace dump
  2020-10-20 17:10         ` Alexey Budankov
@ 2020-10-20 20:29           ` Andi Kleen
  0 siblings, 0 replies; 45+ messages in thread
From: Andi Kleen @ 2020-10-20 20:29 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim,
	Alexander Shishkin, Adrian Hunter, Peter Zijlstra, Ingo Molnar,
	linux-kernel

> > After brief evaluation it still doesn't look easy. The simplest thing
> > I could imagine is to probably combine file_path and file_offset at a
> > struct object on stack and then pass the object by the reference along
> > function calls. I expect it will roughly cause the same amount of changes
> > in the code and saves one argument (GP register). It is not that much
> > but still. However I don't see issues with passing even 6 args on stack.
> 
> Sorry - "passing 6 args to a function call"

Ah it wasn't about code efficiency, just maintainability. Function calls
with too many arguments are generally hard to maintain, and typically
at some point have to be refactored anyways because they tend to grow
even more over time.

But if it's too difficult, ok.

-Andi

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 06/15] perf session: load data directory into tool process memory
  2020-10-13 11:30       ` Alexey Budankov
@ 2020-10-21  6:54         ` Namhyung Kim
  2020-10-21 10:25           ` Alexey Budankov
  0 siblings, 1 reply; 45+ messages in thread
From: Namhyung Kim @ 2020-10-21  6:54 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Andi Kleen, Arnaldo Carvalho de Melo, Jiri Olsa,
	Alexander Shishkin, Adrian Hunter, Peter Zijlstra, Ingo Molnar,
	linux-kernel

Hi,

On Tue, Oct 13, 2020 at 8:30 PM Alexey Budankov
<alexey.budankov@linux.intel.com> wrote:
> On 12.10.2020 19:49, Alexey Budankov wrote:
> > On 12.10.2020 19:09, Andi Kleen wrote:
> >> On Mon, Oct 12, 2020 at 11:58:58AM +0300, Alexey Budankov wrote:
> >>> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> >>> index 6afc670fdf0c..0752eec19813 100644
> >>> --- a/tools/perf/util/session.c
> >>> +++ b/tools/perf/util/session.c
> >>> @@ -2212,6 +2212,17 @@ reader__process_events(struct reader *rd, struct perf_session *session,
> >>>             goto more;
> >>>
> >>>  out:
> >>> +   if (rd->unmap_file) {
> >>> +           int i;
> >>> +
> >>> +           for (i = 0; i < NUM_MMAPS; i++) {
> >>> +                   if (mmaps[i]) {
> >>> +                           munmap(mmaps[i], mmap_size);
> >>> +                           mmaps[i] = NULL;
> >>
> >> Okay so where is the mmap? Would make more sense to put that
> >> into the same patch as who adds the mmap. Or is the mmap
> >> code already in the perf source? In that case it should
> >> probably be some common helper with the existing users.
> >
> > That mmap is already in the code. Agree, this part of the patch
> > can be applied prior the whole patch set.
>
> I take it back. Single trace file can't be unmapped yet since it also
> contains not only compressed records but also other records backing
> the data for aggregated analysis.

Are you talking about the auxtrace?

I thought everything will be compressed when it's enabled.
But if it's only for the auxtrace, maybe we can unmap them
with checking it as it's not the common case?

Thanks
Namhyung

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 08/15] perf record: write trace data into mmap trace files
  2020-10-14 12:09     ` Alexey Budankov
@ 2020-10-21  7:34       ` Namhyung Kim
  2020-10-21 10:24         ` Alexey Budankov
  0 siblings, 1 reply; 45+ messages in thread
From: Namhyung Kim @ 2020-10-21  7:34 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Alexander Shishkin,
	Andi Kleen, Adrian Hunter, Peter Zijlstra, Ingo Molnar,
	linux-kernel

On Wed, Oct 14, 2020 at 9:09 PM Alexey Budankov
<alexey.budankov@linux.intel.com> wrote:
>
> Hi,
>
> On 14.10.2020 13:52, Namhyung Kim wrote:
> > Hi,
> >
> > On Mon, Oct 12, 2020 at 6:01 PM Alexey Budankov
> > <alexey.budankov@linux.intel.com> wrote:
> >>
> >>
> >> Write trace data into per mmap trace files located
> >> at data directory. Streaming thread adjusts its affinity
> >> according to mask of the buffer being processed.
> >>
> >> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
> >> ---
> > [SNIP]
> >> @@ -1184,8 +1203,12 @@ static int record__mmap_read_evlist(struct record *rec, struct evlist *evlist,
> >>         /*
> >>          * Mark the round finished in case we wrote
> >>          * at least one event.
> >> +        *
> >> +        * No need for round events in directory mode,
> >> +        * because per-cpu maps and files have data
> >> +        * sorted by kernel.
> >>          */
> >> -       if (bytes_written != rec->bytes_written)
> >> +       if (!record__threads_enabled(rec) && bytes_written != rec->bytes_written)
> >>                 rc = record__write(rec, NULL, &finished_round_event, sizeof(finished_round_event));
> >
> > This means it needs to keep all events in the ordered events queue
> > when perf report processes the data, right?
>
> Looks so.

Maybe it's not related to this directly. But we need to think about
how to make perf report faster and more efficient as well.

In my previous attempt, I separated samples from other events
to be in different mmaps so they were saved to different files
(or in a separate part of the data file).

And perf report processes the meta events (FORK/MMAP/...)
first to construct the system image and then processes samples
with multi-threads.

Once it has the image, it could bypass the ordered events queue
entirely.

Thanks

Namhyung

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 08/15] perf record: write trace data into mmap trace files
  2020-10-21  7:34       ` Namhyung Kim
@ 2020-10-21 10:24         ` Alexey Budankov
  2020-10-21 10:51           ` Namhyung Kim
  0 siblings, 1 reply; 45+ messages in thread
From: Alexey Budankov @ 2020-10-21 10:24 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Alexander Shishkin,
	Andi Kleen, Adrian Hunter, Peter Zijlstra, Ingo Molnar,
	linux-kernel


On 21.10.2020 10:34, Namhyung Kim wrote:
> On Wed, Oct 14, 2020 at 9:09 PM Alexey Budankov
> <alexey.budankov@linux.intel.com> wrote:
>>
>> Hi,
>>
>> On 14.10.2020 13:52, Namhyung Kim wrote:
>>> Hi,
>>>
>>> On Mon, Oct 12, 2020 at 6:01 PM Alexey Budankov
>>> <alexey.budankov@linux.intel.com> wrote:
>>>>
>>>>
>>>> Write trace data into per mmap trace files located
>>>> at data directory. Streaming thread adjusts its affinity
>>>> according to mask of the buffer being processed.
>>>>
>>>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
>>>> ---
>>> [SNIP]
>>>> @@ -1184,8 +1203,12 @@ static int record__mmap_read_evlist(struct record *rec, struct evlist *evlist,
>>>>         /*
>>>>          * Mark the round finished in case we wrote
>>>>          * at least one event.
>>>> +        *
>>>> +        * No need for round events in directory mode,
>>>> +        * because per-cpu maps and files have data
>>>> +        * sorted by kernel.
>>>>          */
>>>> -       if (bytes_written != rec->bytes_written)
>>>> +       if (!record__threads_enabled(rec) && bytes_written != rec->bytes_written)
>>>>                 rc = record__write(rec, NULL, &finished_round_event, sizeof(finished_round_event));
>>>
>>> This means it needs to keep all events in the ordered events queue
>>> when perf report processes the data, right?
>>
>> Looks so.
> 
> Maybe it's not related to this directly. But we need to think about
> how to make perf report faster and more efficient as well.

Makes sense. Agreed.

> 
> In my previous attempt, I separated samples from other events
> to be in different mmaps so they were saved to different files
> (or in a separate part of the data file).
> 
> And perf report processes the meta events (FORK/MMAP/...)
> first to construct the system image and then processes samples
> with multi-threads.

Looks like separation to global, per-process events and per-thread
ones. Alternative algorithm could possibly be multi-passing of trace
data. First pass is to capture global events and build process state
overtime progress picture. Second pass is to capture and map per-thread
samples and/or other events into process state according to samples
and events time.

> 
> Once it has the image, it could bypass the ordered events queue
> entirely.
> 
> Thanks
> 
> Namhyung
> 

Thanks,
Alexei

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 06/15] perf session: load data directory into tool process memory
  2020-10-21  6:54         ` Namhyung Kim
@ 2020-10-21 10:25           ` Alexey Budankov
  2020-10-21 10:57             ` Namhyung Kim
  0 siblings, 1 reply; 45+ messages in thread
From: Alexey Budankov @ 2020-10-21 10:25 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Andi Kleen, Arnaldo Carvalho de Melo, Jiri Olsa,
	Alexander Shishkin, Adrian Hunter, Peter Zijlstra, Ingo Molnar,
	linux-kernel


On 21.10.2020 9:54, Namhyung Kim wrote:
> Hi,
> 
> On Tue, Oct 13, 2020 at 8:30 PM Alexey Budankov
> <alexey.budankov@linux.intel.com> wrote:
>> On 12.10.2020 19:49, Alexey Budankov wrote:
>>> On 12.10.2020 19:09, Andi Kleen wrote:
>>>> On Mon, Oct 12, 2020 at 11:58:58AM +0300, Alexey Budankov wrote:
>>>>> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
>>>>> index 6afc670fdf0c..0752eec19813 100644
>>>>> --- a/tools/perf/util/session.c
>>>>> +++ b/tools/perf/util/session.c
>>>>> @@ -2212,6 +2212,17 @@ reader__process_events(struct reader *rd, struct perf_session *session,
>>>>>             goto more;
>>>>>
>>>>>  out:
>>>>> +   if (rd->unmap_file) {
>>>>> +           int i;
>>>>> +
>>>>> +           for (i = 0; i < NUM_MMAPS; i++) {
>>>>> +                   if (mmaps[i]) {
>>>>> +                           munmap(mmaps[i], mmap_size);
>>>>> +                           mmaps[i] = NULL;
>>>>
>>>> Okay so where is the mmap? Would make more sense to put that
>>>> into the same patch as who adds the mmap. Or is the mmap
>>>> code already in the perf source? In that case it should
>>>> probably be some common helper with the existing users.
>>>
>>> That mmap is already in the code. Agree, this part of the patch
>>> can be applied prior the whole patch set.
>>
>> I take it back. Single trace file can't be unmapped yet since it also
>> contains not only compressed records but also other records backing
>> the data for aggregated analysis.
> 
> Are you talking about the auxtrace?
> 
> I thought everything will be compressed when it's enabled.
> But if it's only for the auxtrace, maybe we can unmap them
> with checking it as it's not the common case?

I am about primary trace file. Together with PERF_RECORD_COMPRESSED
records it contains the other records with uncompressed data (MMAP,
FORk, COMM etc.) so it is still unsafe to unmap the trace file after
loading.

Alexei

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 08/15] perf record: write trace data into mmap trace files
  2020-10-21 10:24         ` Alexey Budankov
@ 2020-10-21 10:51           ` Namhyung Kim
  0 siblings, 0 replies; 45+ messages in thread
From: Namhyung Kim @ 2020-10-21 10:51 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Alexander Shishkin,
	Andi Kleen, Adrian Hunter, Peter Zijlstra, Ingo Molnar,
	linux-kernel

On Wed, Oct 21, 2020 at 7:25 PM Alexey Budankov
<alexey.budankov@linux.intel.com> wrote:
>
>
> On 21.10.2020 10:34, Namhyung Kim wrote:
> > On Wed, Oct 14, 2020 at 9:09 PM Alexey Budankov
> > <alexey.budankov@linux.intel.com> wrote:
> >>
> >> Hi,
> >>
> >> On 14.10.2020 13:52, Namhyung Kim wrote:
> >>> Hi,
> >>>
> >>> On Mon, Oct 12, 2020 at 6:01 PM Alexey Budankov
> >>> <alexey.budankov@linux.intel.com> wrote:
> >>>>
> >>>>
> >>>> Write trace data into per mmap trace files located
> >>>> at data directory. Streaming thread adjusts its affinity
> >>>> according to mask of the buffer being processed.
> >>>>
> >>>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
> >>>> ---
> >>> [SNIP]
> >>>> @@ -1184,8 +1203,12 @@ static int record__mmap_read_evlist(struct record *rec, struct evlist *evlist,
> >>>>         /*
> >>>>          * Mark the round finished in case we wrote
> >>>>          * at least one event.
> >>>> +        *
> >>>> +        * No need for round events in directory mode,
> >>>> +        * because per-cpu maps and files have data
> >>>> +        * sorted by kernel.
> >>>>          */
> >>>> -       if (bytes_written != rec->bytes_written)
> >>>> +       if (!record__threads_enabled(rec) && bytes_written != rec->bytes_written)
> >>>>                 rc = record__write(rec, NULL, &finished_round_event, sizeof(finished_round_event));
> >>>
> >>> This means it needs to keep all events in the ordered events queue
> >>> when perf report processes the data, right?
> >>
> >> Looks so.
> >
> > Maybe it's not related to this directly. But we need to think about
> > how to make perf report faster and more efficient as well.
>
> Makes sense. Agreed.
>
> >
> > In my previous attempt, I separated samples from other events
> > to be in different mmaps so they were saved to different files
> > (or in a separate part of the data file).
> >
> > And perf report processes the meta events (FORK/MMAP/...)
> > first to construct the system image and then processes samples
> > with multi-threads.
>
> Looks like separation to global, per-process events and per-thread
> ones. Alternative algorithm could possibly be multi-passing of trace
> data. First pass is to capture global events and build process state
> overtime progress picture. Second pass is to capture and map per-thread
> samples and/or other events into process state according to samples
> and events time.

Yep, it seems basically the same approach.  But it'd be better if
we could do it in a single pass (with some modification in the data
collection).

Thanks
Namhyung

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 06/15] perf session: load data directory into tool process memory
  2020-10-21 10:25           ` Alexey Budankov
@ 2020-10-21 10:57             ` Namhyung Kim
  2020-10-21 13:01               ` Alexey Budankov
  0 siblings, 1 reply; 45+ messages in thread
From: Namhyung Kim @ 2020-10-21 10:57 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Andi Kleen, Arnaldo Carvalho de Melo, Jiri Olsa,
	Alexander Shishkin, Adrian Hunter, Peter Zijlstra, Ingo Molnar,
	linux-kernel

On Wed, Oct 21, 2020 at 7:25 PM Alexey Budankov
<alexey.budankov@linux.intel.com> wrote:
>
>
> On 21.10.2020 9:54, Namhyung Kim wrote:
> > Hi,
> >
> > On Tue, Oct 13, 2020 at 8:30 PM Alexey Budankov
> > <alexey.budankov@linux.intel.com> wrote:
> >> On 12.10.2020 19:49, Alexey Budankov wrote:
> >>> On 12.10.2020 19:09, Andi Kleen wrote:
> >>>> On Mon, Oct 12, 2020 at 11:58:58AM +0300, Alexey Budankov wrote:
> >>>>> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> >>>>> index 6afc670fdf0c..0752eec19813 100644
> >>>>> --- a/tools/perf/util/session.c
> >>>>> +++ b/tools/perf/util/session.c
> >>>>> @@ -2212,6 +2212,17 @@ reader__process_events(struct reader *rd, struct perf_session *session,
> >>>>>             goto more;
> >>>>>
> >>>>>  out:
> >>>>> +   if (rd->unmap_file) {
> >>>>> +           int i;
> >>>>> +
> >>>>> +           for (i = 0; i < NUM_MMAPS; i++) {
> >>>>> +                   if (mmaps[i]) {
> >>>>> +                           munmap(mmaps[i], mmap_size);
> >>>>> +                           mmaps[i] = NULL;
> >>>>
> >>>> Okay so where is the mmap? Would make more sense to put that
> >>>> into the same patch as who adds the mmap. Or is the mmap
> >>>> code already in the perf source? In that case it should
> >>>> probably be some common helper with the existing users.
> >>>
> >>> That mmap is already in the code. Agree, this part of the patch
> >>> can be applied prior the whole patch set.
> >>
> >> I take it back. Single trace file can't be unmapped yet since it also
> >> contains not only compressed records but also other records backing
> >> the data for aggregated analysis.
> >
> > Are you talking about the auxtrace?
> >
> > I thought everything will be compressed when it's enabled.
> > But if it's only for the auxtrace, maybe we can unmap them
> > with checking it as it's not the common case?
>
> I am about primary trace file. Together with PERF_RECORD_COMPRESSED
> records it contains the other records with uncompressed data (MMAP,
> FORk, COMM etc.) so it is still unsafe to unmap the trace file after
> loading.

Oh, I think all the events coming from the kernel share the mmap
buffer so they will be compressed with SAMPLEs.  Note that
synthesized events have 0 timestamp which means that they
are not queued in the ordered events and processed immediately.

IIUC, here we unmap the original mmap buffer and the actual data
in MMAP/FORK/... is kept in the decomp data buffer, no?

Thanks
Namhyung

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 06/15] perf session: load data directory into tool process memory
  2020-10-21 10:57             ` Namhyung Kim
@ 2020-10-21 13:01               ` Alexey Budankov
  0 siblings, 0 replies; 45+ messages in thread
From: Alexey Budankov @ 2020-10-21 13:01 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Andi Kleen, Arnaldo Carvalho de Melo, Jiri Olsa,
	Alexander Shishkin, Adrian Hunter, Peter Zijlstra, Ingo Molnar,
	linux-kernel


On 21.10.2020 13:57, Namhyung Kim wrote:
> On Wed, Oct 21, 2020 at 7:25 PM Alexey Budankov
> <alexey.budankov@linux.intel.com> wrote:
>>
>>
>> On 21.10.2020 9:54, Namhyung Kim wrote:
>>> Hi,
>>>
>>> On Tue, Oct 13, 2020 at 8:30 PM Alexey Budankov
>>> <alexey.budankov@linux.intel.com> wrote:
>>>> On 12.10.2020 19:49, Alexey Budankov wrote:
>>>>> On 12.10.2020 19:09, Andi Kleen wrote:
>>>>>> On Mon, Oct 12, 2020 at 11:58:58AM +0300, Alexey Budankov wrote:
>>>>>>> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
>>>>>>> index 6afc670fdf0c..0752eec19813 100644
>>>>>>> --- a/tools/perf/util/session.c
>>>>>>> +++ b/tools/perf/util/session.c
>>>>>>> @@ -2212,6 +2212,17 @@ reader__process_events(struct reader *rd, struct perf_session *session,
>>>>>>>             goto more;
>>>>>>>
>>>>>>>  out:
>>>>>>> +   if (rd->unmap_file) {
>>>>>>> +           int i;
>>>>>>> +
>>>>>>> +           for (i = 0; i < NUM_MMAPS; i++) {
>>>>>>> +                   if (mmaps[i]) {
>>>>>>> +                           munmap(mmaps[i], mmap_size);
>>>>>>> +                           mmaps[i] = NULL;
>>>>>>
>>>>>> Okay so where is the mmap? Would make more sense to put that
>>>>>> into the same patch as who adds the mmap. Or is the mmap
>>>>>> code already in the perf source? In that case it should
>>>>>> probably be some common helper with the existing users.
>>>>>
>>>>> That mmap is already in the code. Agree, this part of the patch
>>>>> can be applied prior the whole patch set.
>>>>
>>>> I take it back. Single trace file can't be unmapped yet since it also
>>>> contains not only compressed records but also other records backing
>>>> the data for aggregated analysis.
>>>
>>> Are you talking about the auxtrace?
>>>
>>> I thought everything will be compressed when it's enabled.
>>> But if it's only for the auxtrace, maybe we can unmap them
>>> with checking it as it's not the common case?
>>
>> I am about primary trace file. Together with PERF_RECORD_COMPRESSED
>> records it contains the other records with uncompressed data (MMAP,
>> FORk, COMM etc.) so it is still unsafe to unmap the trace file after
>> loading.
> 
> Oh, I think all the events coming from the kernel share the mmap
> buffer so they will be compressed with SAMPLEs.  Note that
> synthesized events have 0 timestamp which means that they
> are not queued in the ordered events and processed immediately.

Yes, they are and it will work as expected with this patch changes.

> 
> IIUC, here we unmap the original mmap buffer and the actual data
> in MMAP/FORK/... is kept in the decomp data buffer, no?

Correct. It unmaps trace file from memory after COMPRESSED records
are decompressed into separately allocated decomp objects.

Alexei

> 
> Thanks
> Namhyung
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2020-10-21 13:02 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-12  8:50 [PATCH v1 00/15] Introduce threaded trace streaming for basic perf record operation Alexey Budankov
2020-10-12  8:53 ` [PATCH v1 01/15] perf session: introduce trace file path to be shown in raw trace dump Alexey Budankov
2020-10-12  8:54 ` [PATCH v1 02/15] perf report: output trace file name " Alexey Budankov
2020-10-12 16:01   ` Andi Kleen
2020-10-12 17:06     ` Alexey Budankov
2020-10-20 16:31       ` Alexey Budankov
2020-10-20 17:10         ` Alexey Budankov
2020-10-20 20:29           ` Andi Kleen
2020-10-13 19:54   ` Jiri Olsa
2020-10-13 21:23     ` Alexey Budankov
2020-10-12  8:55 ` [PATCH v1 03/15] perf data: open data directory in read access mode Alexey Budankov
2020-10-12 16:03   ` Andi Kleen
2020-10-12 16:52     ` Alexey Budankov
2020-10-13 16:22       ` Arnaldo Carvalho de Melo
2020-10-14 10:39         ` Alexey Budankov
2020-10-12  8:56 ` [PATCH v1 04/15] perf session: move reader object definition to header file Alexey Budankov
2020-10-12  8:57 ` [PATCH v1 05/15] perf session: introduce decompressor into trace reader object Alexey Budankov
2020-10-12 16:05   ` Andi Kleen
2020-10-12 16:50     ` Alexey Budankov
2020-10-12  8:58 ` [PATCH v1 06/15] perf session: load data directory into tool process memory Alexey Budankov
2020-10-12 16:09   ` Andi Kleen
2020-10-12 16:49     ` Alexey Budankov
2020-10-13 11:30       ` Alexey Budankov
2020-10-21  6:54         ` Namhyung Kim
2020-10-21 10:25           ` Alexey Budankov
2020-10-21 10:57             ` Namhyung Kim
2020-10-21 13:01               ` Alexey Budankov
2020-10-12  8:59 ` [PATCH v1 07/15] perf record: introduce trace file, compressor and stats in mmap object Alexey Budankov
2020-10-12  9:01 ` [PATCH v1 08/15] perf record: write trace data into mmap trace files Alexey Budankov
2020-10-14 10:52   ` Namhyung Kim
2020-10-14 12:09     ` Alexey Budankov
2020-10-21  7:34       ` Namhyung Kim
2020-10-21 10:24         ` Alexey Budankov
2020-10-21 10:51           ` Namhyung Kim
2020-10-12  9:02 ` [PATCH v1 09/15] perf record: introduce thread specific objects for trace streaming Alexey Budankov
2020-10-12  9:03 ` [PATCH v1 10/15] perf record: manage thread specific data array Alexey Budankov
2020-10-12  9:05 ` [PATCH v1 11/15] perf evlist: introduce evlist__ctlfd_update() to update ctl fd status Alexey Budankov
2020-10-12  9:05 ` [PATCH v1 12/15] perf record: introduce thread local variable for trace streaming Alexey Budankov
2020-10-12  9:10 ` [PATCH v1 13/15] perf record: stop threads in the end of " Alexey Budankov
2020-10-12  9:11 ` [PATCH v1 14/15] perf record: start threads in the beginning " Alexey Budankov
2020-10-12  9:13 ` [PATCH v1 15/15] perf record: introduce --threads command line option Alexey Budankov
2020-10-13 16:20 ` [PATCH v1 00/15] Introduce threaded trace streaming for basic perf record operation Arnaldo Carvalho de Melo
2020-10-14 12:15   ` Alexey Budankov
2020-10-14 17:27 ` Ingo Molnar
2020-10-15 10:35   ` Alexey Budankov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).