linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/4] perf: enable compression of record mode trace to save storage space
@ 2019-02-11 20:17 Alexey Budankov
  2019-02-11 20:21 ` [PATCH v2 1/4] feature: realize libzstd check, LIBZSTD_DIR and NO_LIBZSTD defines Alexey Budankov
                   ` (4 more replies)
  0 siblings, 5 replies; 59+ messages in thread
From: Alexey Budankov @ 2019-02-11 20:17 UTC
  To: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra
  Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


This is a rebase onto the tip of Arnaldo's perf/core repository.

The patch set implements runtime trace compression for record mode and
trace file decompression for report mode. The Zstandard API [1] is used
to compress and decompress the data that comes from the perf_events
kernel data buffers.

The implemented -z,--compression_level=n option provides ~3-5x average
trace file size reduction on a variety of tested workloads. This saves
storage space on larger server systems, where trace file size can easily
reach several tens or even hundreds of GiBs, especially when profiling
with dwarf-based stacks and tracing context switches.

The --mmap-flush option can be used to avoid compressing every single
byte of data and to increase the compression ratio while lowering tool
runtime overhead.

  $ tools/perf/perf record -z 1 -e cycles -- matrix.gcc
  $ tools/perf/perf record -z 1 --mmap-flush 1024 -e cycles -- matrix.gcc
  $ tools/perf/perf record -z 1 --mmap-flush 1024 --aio -e cycles -- matrix.gcc

The compression functionality can be disabled from the make command
line using the NO_LIBZSTD define, and the location of the Zstandard
sources can be overridden using the LIBZSTD_DIR define:

  $ make -C tools/perf NO_LIBZSTD=1 clean all
  $ make -C tools/perf LIBZSTD_DIR=/path/to/zstd-1.3.7 clean all

---
Alexey Budankov (4):
  feature: realize libzstd check, LIBZSTD_DIR and NO_LIBZSTD defines
  perf record: implement -z=<level> and --mmap-flush=<thres> options
  perf record: enable runtime trace compression
  perf report: support record trace file decompression

 tools/build/Makefile.feature             |   6 +-
 tools/build/feature/Makefile             |   6 +-
 tools/build/feature/test-all.c           |   5 +
 tools/build/feature/test-libzstd.c       |  12 +
 tools/perf/Documentation/perf-record.txt |   9 +
 tools/perf/Makefile.config               |  20 ++
 tools/perf/Makefile.perf                 |   3 +
 tools/perf/builtin-record.c              | 167 +++++++++++---
 tools/perf/builtin-report.c              |   5 +-
 tools/perf/perf.h                        |   2 +
 tools/perf/util/env.h                    |  10 +
 tools/perf/util/event.c                  |   1 +
 tools/perf/util/event.h                  |   7 +
 tools/perf/util/evlist.c                 |   6 +-
 tools/perf/util/evlist.h                 |   3 +-
 tools/perf/util/header.c                 |  45 +++-
 tools/perf/util/header.h                 |   1 +
 tools/perf/util/mmap.c                   | 265 +++++++++++++---------
 tools/perf/util/mmap.h                   |  31 ++-
 tools/perf/util/session.c                | 271 ++++++++++++++++++++++-
 tools/perf/util/session.h                |  26 +++
 tools/perf/util/tool.h                   |   2 +
 22 files changed, 742 insertions(+), 161 deletions(-)
 create mode 100644 tools/build/feature/test-libzstd.c

---
Changes in v2:
- moved compression/decompression code to session layer
- enabled allocation of aio data buffers for compression
- enabled trace compression for serial trace streaming

---
[1] https://github.com/facebook/zstd


* [PATCH v2 0/4] perf: enable compression of record mode trace to save storage space
@ 2019-01-28  7:02 Alexey Budankov
  2019-01-28  7:11 ` [PATCH v2 4/4] perf report: support record trace file decompression Alexey Budankov
  0 siblings, 1 reply; 59+ messages in thread
From: Alexey Budankov @ 2019-01-28  7:02 UTC
  To: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra
  Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


The patch set implements runtime trace compression for record mode and
trace file decompression for report mode. The Zstandard API [1] is used
to compress and decompress the data that comes from the perf_events
kernel data buffers.

The implemented -z,--compression_level=n option provides ~3-5x average
trace file size reduction on a variety of tested workloads. This saves
storage space on larger server systems, where trace file size can easily
reach several tens or even hundreds of GiBs, especially when profiling
with stacks for later dwarf unwinding, tracing context switches, etc.

  $ tools/perf/perf record -z 1 -e cycles -- matrix.gcc

The --mmap-flush option can be used to avoid compressing every single
byte of data and to increase the compression ratio while lowering tool
runtime overhead.

The compression functionality can be disabled from the make command
line using the NO_LIBZSTD define, and the location of the Zstandard
sources can be overridden using the LIBZSTD_DIR define:

  $ make -C tools/perf NO_LIBZSTD=1 clean all
  $ make -C tools/perf LIBZSTD_DIR=/path/to/zstd-1.3.7 clean all

The patch set is for Arnaldo's perf/core repository.

---
Alexey Budankov (4):
  feature: realize libzstd check, LIBZSTD_DIR and NO_LIBZSTD defines
  perf record: implement -z=<level> and --mmap-flush=<thres> options
  perf record: enable runtime trace compression
  perf report: support record trace file decompression

 tools/build/Makefile.feature             |   6 +-
 tools/build/feature/Makefile             |   6 +-
 tools/build/feature/test-all.c           |   5 +
 tools/build/feature/test-libzstd.c       |  12 +
 tools/perf/Documentation/perf-record.txt |   9 +
 tools/perf/Makefile.config               |  20 ++
 tools/perf/Makefile.perf                 |   3 +
 tools/perf/builtin-record.c              | 167 +++++++++++---
 tools/perf/builtin-report.c              |   5 +-
 tools/perf/perf.h                        |   2 +
 tools/perf/util/env.h                    |  10 +
 tools/perf/util/event.c                  |   1 +
 tools/perf/util/event.h                  |   7 +
 tools/perf/util/evlist.c                 |   6 +-
 tools/perf/util/evlist.h                 |   2 +-
 tools/perf/util/header.c                 |  45 +++-
 tools/perf/util/header.h                 |   1 +
 tools/perf/util/mmap.c                   | 173 ++++++++++-----
 tools/perf/util/mmap.h                   |  31 ++-
 tools/perf/util/session.c                | 271 ++++++++++++++++++++++-
 tools/perf/util/session.h                |  26 +++
 tools/perf/util/tool.h                   |   2 +
 22 files changed, 695 insertions(+), 115 deletions(-)
 create mode 100644 tools/build/feature/test-libzstd.c

---
Changes in v2:
- moved compression/decompression code to session layer
- enabled allocation of aio data buffers for compression
- enabled trace compression for serial trace streaming

---
[1] https://github.com/facebook/zstd

---
Examples:

  $ make -C tools/perf NO_LIBZSTD=1 clean all
  $ make -C tools/perf LIBZSTD_DIR=/path/to/zstd-1.3.7 clean all

  $ tools/perf/perf record -z 1 -e cycles -- matrix.gcc
  Addr of buf1 = 0x7fc266d52010
  Offs of buf1 = 0x7fc266d52180
  Addr of buf2 = 0x7fc264d51010
  Offs of buf2 = 0x7fc264d511c0
  Addr of buf3 = 0x7fc262d50010
  Offs of buf3 = 0x7fc262d50100
  Addr of buf4 = 0x7fc260d4f010
  Offs of buf4 = 0x7fc260d4f140
  Threads #: 8 Pthreads
  Matrix size: 2048
  Using multiply kernel: multiply1
  Execution time = 31.471 seconds
  [ perf record: Woken up 120 times to write data ]
  [ perf record: Compressed 38.118 MB to 7.084 MB, ratio is 5.381 ]
  [ perf record: Captured and wrote 7.100 MB perf.data (999192 samples) ]

  $ tools/perf/perf report -D --header
  # ========
  # captured on    : Sat Jan 26 11:49:55 2019
  # header version : 1
  # data offset    : 296
  # data size      : 7444119
  # feat offset    : 7444415
  # hostname : nntvtune39
  # os release : 4.19.15-300.fc29.x86_64
  # perf version : 4.13.rc5.g3cfa299
  # arch : x86_64
  # nrcpus online : 8
  # nrcpus avail : 8
  # cpudesc : Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
  # cpuid : GenuineIntel,6,94,3
  # total memory : 16153184 kB
  # cmdline : /root/abudanko/kernel/acme/tools/perf/perf record -z 1 -e cycles -- ../../matrix/linux/matrix.gcc 
  # event : name = cycles, , id = { 2171, 2172, 2173, 2174, 2175, 2176, 2177, 2178 }, size = 112, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|PERIOD, read_format =>
  # CPU_TOPOLOGY info available, use -I to display
  # NUMA_TOPOLOGY info available, use -I to display
  # pmu mappings: intel_pt = 8, software = 1, power = 11, uprobe = 7, uncore_imc = 12, cpu = 4, cstate_core = 18, uncore_cbox_2 = 15, breakpoint = 5, uncore_cbox_0 = 13, tracepoint = 2>
  # CACHE info available, use -I to display
  # time of first sample : 230574.239204
  # time of last sample : 230605.735403
  # sample duration :  31496.200 ms
  # MEM_TOPOLOGY info available, use -I to display
  # compressed : Zstd, level = 1, ratio = 5
  # missing features: TRACING_DATA BRANCH_STACK GROUP_DESC AUXTRACE STAT CLOCKID 
  # ========
  #

  0x128 [0x20]: event: 79
  .
  . ... raw event: size 32 bytes
  .  0000:  4f 00 00 00 00 00 20 00 1f 00 00 00 00 00 00 00  O..... .........
  .  0010:  11 a6 ef 1f 00 00 00 00 e7 16 81 83 f5 ff ff ff  ................

  0 0x128 [0x20]: PERF_RECORD_TIME_CONV: unhandled!

  0x148 [0x50]: event: 1
  .
  . ... raw event: size 80 bytes
  .  0000:  01 00 00 00 01 00 50 00 ff ff ff ff 00 00 00 00  ......P.........
  .  0010:  00 00 00 89 ff ff ff ff 00 10 31 37 00 00 00 00  ..........17....
  .  0020:  00 00 00 89 ff ff ff ff 5b 6b 65 72 6e 65 6c 2e  ........[kernel.
  .  0030:  6b 61 6c 6c 73 79 6d 73 5d 5f 74 65 78 74 00 00  kallsyms]_text..
  .  0040:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

  0 0x148 [0x50]: PERF_RECORD_MMAP -1/0: [0xffffffff89000000(0x37311000) @ 0xffffffff89000000]: x [kernel.kallsyms]_text

  ...

  0x6375d [0x8]: event: 68
  .
  . ... raw event: size 8 bytes
  .  0000:  44 00 00 00 00 00 08 00                          D.......        

  0 0x6375d [0x8]: PERF_RECORD_FINISHED_ROUND

  0 [0x28]: event: 9
  .
  . ... raw event: size 40 bytes
  .  0000:  09 00 00 00 01 00 28 00 76 78 06 89 ff ff ff ff  ......(.vx......
  .  0010:  d4 1d 00 00 d4 1d 00 00 26 43 9f bf b4 d1 00 00  ........&C......
  .  0020:  01 00 00 00 00 00 00 00                          ........        

  230574239204134 0 [0x28]: PERF_RECORD_SAMPLE(IP, 0x1): 7636/7636: 0xffffffff89067876 period: 1 addr: 0
   ... thread: perf:7636
   ...... dso: /proc/kcore

  0 [0x30]: event: 3
  .
  . ... raw event: size 48 bytes
  .  0000:  03 00 00 00 00 20 30 00 d4 1d 00 00 d4 1d 00 00  ..... 0.........
  .  0010:  6d 61 74 72 69 78 2e 67 63 63 00 00 00 00 00 00  matrix.gcc......
  .  0020:  d4 1d 00 00 d4 1d 00 00 34 4a 9f bf b4 d1 00 00  ........4J......

  230574239205940 0 [0x30]: PERF_RECORD_COMM exec: matrix.gcc:7636/7636

  0 [0x28]: event: 9
  .
  . ... raw event: size 40 bytes
  .  0000:  09 00 00 00 01 00 28 00 76 78 06 89 ff ff ff ff  ......(.vx......
  .  0010:  d4 1d 00 00 d4 1d 00 00 1f af 9f bf b4 d1 00 00  ................
  .  0020:  3f 0c 00 00 00 00 00 00                          ?.......        

  230574239231775 0 [0x28]: PERF_RECORD_SAMPLE(IP, 0x1): 7636/7636: 0xffffffff89067876 period: 3135 addr: 0
   ... thread: matrix.gcc:7636
   ...... dso: /proc/kcore


  Aggregated stats:
             TOTAL events:    1001434
              MMAP events:        100
              LOST events:          0
              COMM events:          2
              EXIT events:          9
          THROTTLE events:          0
        UNTHROTTLE events:          0
              FORK events:          8
              READ events:          0
            SAMPLE events:     999192
             MMAP2 events:          7
               AUX events:          0
      ITRACE_START events:          0
      LOST_SAMPLES events:          0
            SWITCH events:          0
   SWITCH_CPU_WIDE events:          0
        NAMESPACES events:          0
           KSYMBOL events:          0
         BPF_EVENT events:          0
              ATTR events:          0
        EVENT_TYPE events:          0
      TRACING_DATA events:          0
          BUILD_ID events:          0
    FINISHED_ROUND events:        319
          ID_INDEX events:          0
     AUXTRACE_INFO events:          0
          AUXTRACE events:          0
    AUXTRACE_ERROR events:          0
        THREAD_MAP events:          1
           CPU_MAP events:          1
       STAT_CONFIG events:          0
              STAT events:          0
        STAT_ROUND events:          0
      EVENT_UPDATE events:          0
         TIME_CONV events:          1
           FEATURE events:          0
        COMPRESSED events:       1794
  cycles stats:
             TOTAL events:     999192
              MMAP events:          0
              LOST events:          0
              COMM events:          0
              EXIT events:          0
          THROTTLE events:          0
        UNTHROTTLE events:          0
              FORK events:          0
              READ events:          0
            SAMPLE events:     999192
             MMAP2 events:          0
               AUX events:          0
      ITRACE_START events:          0
      LOST_SAMPLES events:          0
            SWITCH events:          0
   SWITCH_CPU_WIDE events:          0
        NAMESPACES events:          0
           KSYMBOL events:          0
         BPF_EVENT events:          0
              ATTR events:          0
        EVENT_TYPE events:          0
      TRACING_DATA events:          0
          BUILD_ID events:          0
    FINISHED_ROUND events:          0
          ID_INDEX events:          0
     AUXTRACE_INFO events:          0
          AUXTRACE events:          0
    AUXTRACE_ERROR events:          0
        THREAD_MAP events:          0
           CPU_MAP events:          0
       STAT_CONFIG events:          0
              STAT events:          0
        STAT_ROUND events:          0
      EVENT_UPDATE events:          0
         TIME_CONV events:          0
           FEATURE events:          0
        COMPRESSED events:          0
---

Thread overview: 59+ messages
2019-02-11 20:17 [PATCH v2 0/4] perf: enable compression of record mode trace to save storage space Alexey Budankov
2019-02-11 20:21 ` [PATCH v2 1/4] feature: realize libzstd check, LIBZSTD_DIR and NO_LIBZSTD defines Alexey Budankov
2019-02-11 20:22 ` [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options Alexey Budankov
2019-02-12 13:08   ` Jiri Olsa
2019-02-20 14:13     ` Alexey Budankov
2019-02-12 13:08   ` Jiri Olsa
2019-02-20 14:13     ` Alexey Budankov
2019-02-12 13:08   ` Jiri Olsa
2019-02-20 15:24     ` Alexey Budankov
2019-02-21  9:49       ` Jiri Olsa
2019-02-21 11:24         ` Alexey Budankov
2019-02-25 15:27           ` Alexey Budankov
2019-02-12 13:08   ` Jiri Olsa
2019-02-20 14:15     ` Alexey Budankov
2019-02-21  9:47       ` Jiri Olsa
2019-02-21 11:23         ` Alexey Budankov
2019-02-25 15:26           ` Alexey Budankov
2019-02-12 13:08   ` Jiri Olsa
2019-02-20 14:13     ` Alexey Budankov
2019-02-12 13:09   ` Jiri Olsa
2019-02-20 15:19     ` Alexey Budankov
2019-02-12 13:09   ` Jiri Olsa
2019-02-20 14:25     ` Alexey Budankov
2019-02-12 13:09   ` Jiri Olsa
2019-02-20 14:14     ` Alexey Budankov
2019-02-25 15:30       ` Alexey Budankov
2019-02-11 20:23 ` [PATCH v2 3/4] perf record: enable runtime trace compression Alexey Budankov
2019-02-12 13:08   ` Jiri Olsa
2019-02-20 15:13     ` Alexey Budankov
2019-02-21  9:43       ` Jiri Olsa
2019-02-21 11:30         ` Alexey Budankov
2019-02-12 13:08   ` Jiri Olsa
2019-02-20 14:53     ` Alexey Budankov
2019-02-21  9:43       ` Jiri Olsa
2019-02-21 11:18         ` Alexey Budankov
2019-02-12 13:09   ` Jiri Olsa
2019-02-20 15:09     ` Alexey Budankov
2019-02-25 15:27       ` Alexey Budankov
2019-02-12 13:09   ` Jiri Olsa
2019-02-20 15:11     ` Alexey Budankov
2019-02-12 13:09   ` Jiri Olsa
2019-02-20 15:03     ` Alexey Budankov
2019-02-12 13:09   ` Jiri Olsa
2019-02-20 15:06     ` Alexey Budankov
2019-02-11 20:25 ` [PATCH v2 4/4] perf report: support record trace file decompression Alexey Budankov
2019-02-12 13:08   ` Jiri Olsa
2019-02-20 15:19     ` Alexey Budankov
2019-02-25 15:28       ` Alexey Budankov
2019-02-12 13:09   ` Jiri Olsa
2019-02-20 14:48     ` Alexey Budankov
     [not found]       ` <0132ec08-e28b-4102-5053-8f8e21e7fd44@linux.intel.com>
2019-02-27 10:56         ` Alexey Budankov
2019-02-27 11:17           ` Jiri Olsa
2019-02-12 13:09   ` Jiri Olsa
2019-02-20 14:46     ` Alexey Budankov
2019-02-12 13:09   ` Jiri Olsa
2019-02-20 14:44     ` Alexey Budankov
2019-02-12 12:27 ` [PATCH v2 0/4] perf: enable compression of record mode trace to save storage space Arnaldo Carvalho de Melo
2019-02-12 14:06   ` Alexey Budankov
  -- strict thread matches above, loose matches on Subject: below --
2019-01-28  7:02 Alexey Budankov
2019-01-28  7:11 ` [PATCH v2 4/4] perf report: support record trace file decompression Alexey Budankov
