* [PATCH v1 0/4] perf: enable compression of record mode trace to save storage space
@ 2018-12-24 13:21 Alexey Budankov
  2018-12-24 13:35 ` [PATCH v1 1/4] feature: build libzstd feature check, LIBZSTD_DIR and NO_LIBZSTD defines Alexey Budankov
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Alexey Budankov @ 2018-12-24 13:21 UTC
  To: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra
  Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


This patch set implements runtime trace compression in perf record mode,
together with trace file decompression in perf report mode. The Zstandard
library API [1] is used to compress and decompress the data that comes from
the perf_events kernel data buffers.

The new -z,--compression-level=n option reduces trace file size by ~3-5x on
average on the tested workloads. This saves significant storage space on
larger server systems, where trace file size can easily reach several tens
or even hundreds of GiB, especially when profiling with stacks for later
DWARF unwinding, with context-switch tracing, etc.

The option takes effect only together with asynchronous trace writing
(--aio), because compression needs auxiliary memory buffers to operate on
and the buffers allocated for asynchronous trace writing serve that purpose.
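
A minimal sketch of that dependency, assuming only what the patches in this
series show: levels above 22 are reset to 0, and the COMPRESSED header
feature is kept only when both a compression level and AIO control blocks
are set. The helper name effective_comp_level() is hypothetical and is not
part of the series.

```c
/* Sketch only: effective_comp_level() is a hypothetical helper. */
#define COMP_LEVEL_MAX 22	/* "best compression: 22" in the option help */

static unsigned int effective_comp_level(unsigned int comp_level, int nr_cblocks)
{
	/* Out-of-range levels are reset to 0, i.e. no compression. */
	if (comp_level > COMP_LEVEL_MAX)
		return 0;
	/* Without AIO control blocks there are no buffers to compress into. */
	if (nr_cblocks <= 0)
		return 0;
	return comp_level;
}
```

For example, effective_comp_level(1, 0) yields 0: -z 1 without --aio leaves
the trace uncompressed.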

The new --mmap-flush option sets the minimal number of bytes that must
accumulate in an mmaped kernel buffer before the data is compressed and
written to the trace file. This avoids compressing tiny chunks of data,
which increases the compression ratio and at the same time lowers tool
runtime overhead.
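
The threshold logic can be sketched as follows. The clamping mirrors what
the option parser in patch 2/4 does (0 falls back to the default, the value
is capped at a quarter of the mmap buffer length). The helper names and the
should_flush() comparison are assumptions made for illustration; the actual
check lives in code not quoted here.

```c
#include <stddef.h>

#define MMAP_FLUSH_DEFAULT 1	/* same default as in the series */

/* Hypothetical helper mirroring the clamping done at option parse time. */
static size_t clamp_mmap_flush(size_t requested, size_t mmap_len)
{
	if (requested == 0)
		requested = MMAP_FLUSH_DEFAULT;
	/* At most a quarter of the mmap kernel buffer. */
	if (requested > mmap_len / 4)
		requested = mmap_len / 4;
	return requested;
}

/* Assumed semantics: push data only once 'flush' bytes have accumulated. */
static int should_flush(size_t bytes_pending, size_t flush)
{
	return bytes_pending >= flush;
}
```

With --mmap-flush 0x1000, as in the example below, nothing would be pushed
from a buffer until at least 4096 bytes are pending in it.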

The feature can be disabled at build time from the make command line with
the NO_LIBZSTD define, and the location of the Zstandard sources can be
overridden with the LIBZSTD_DIR define.

The patch set is for Arnaldo's perf/core repository.

Examples:

  $ make -C tools/perf NO_LIBZSTD=1 clean all
  $ make -C tools/perf LIBZSTD_DIR=/path/to/zstd-1.3.7 clean all

  $ tools/perf/perf record -F 42000 --aio -z 1 --mmap-flush 0x1000 -e cycles -- matrix.gcc
  Addr of buf1 = 0x7fc1bf183010
  Offs of buf1 = 0x7fc1bf183180
  Addr of buf2 = 0x7fc1bd182010
  Offs of buf2 = 0x7fc1bd1821c0
  Addr of buf3 = 0x7fc1bb181010
  Offs of buf3 = 0x7fc1bb181100
  Addr of buf4 = 0x7fc1b9180010
  Offs of buf4 = 0x7fc1b9180140
  Threads #: 8 Pthreads
  Matrix size: 2048
  Using multiply kernel: multiply1
  Execution time = 25.499 seconds
  [ perf record: Woken up 1157 times to write data ]
  [ perf record: Compressed 316.684 MB to 58.034 MB, ratio is 5.457 ]
  [ perf record: Captured and wrote 58.059 MB perf.data ]

  $ tools/perf/perf report -D --header
  # ========
  # captured on    : Mon Dec 24 13:19:52 2018
  # header version : 1
  # data offset    : 296
  # data size      : 60878779
  # feat offset    : 60879075
  # hostname : nntvtune39
  # os release : 4.19.9-300.fc29.x86_64
  # perf version : 4.13.rc5.gdbb7997
  # arch : x86_64
  # nrcpus online : 8
  # nrcpus avail : 8
  # cpudesc : Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
  # cpuid : GenuineIntel,6,94,3
  # total memory : 16153380 kB
  # cmdline : /root/abudanko/kernel/acme/tools/perf/perf record -F 42000 --aio -z 1 --mmap-flush 0x1000 -e cycles -- ../../matrix/linux/matrix.gcc 
  # event : name = cycles, , id = { 2315, 2316, 2317, 2318, 2319, 2320, 2321, 2322 }, size = 112, { sample_period, sample_freq } = 42000, sample_type = IP|TID|TIME|PERIOD, read_form>
  # CPU_TOPOLOGY info available, use -I to display
  # NUMA_TOPOLOGY info available, use -I to display
  # pmu mappings: intel_pt = 8, software = 1, power = 11, uprobe = 7, uncore_imc = 12, cpu = 4, cstate_core = 18, uncore_cbox_2 = 15, breakpoint = 5, uncore_cbox_0 = 13, tracepoint >
  # CACHE info available, use -I to display
  # time of first sample : 0.000000
  # time of last sample : 0.000000
  # sample duration :      0.000 ms
  # MEM_TOPOLOGY info available, use -I to display
  # compressed : Zstd, level = 1, ratio = 5
  # missing features: TRACING_DATA BUILD_ID BRANCH_STACK GROUP_DESC AUXTRACE STAT CLOCKID 
  # ========
  #
  
  0x128 [0x20]: event: 79
  .
  . ... raw event: size 32 bytes
  .  0000:  4f 00 00 00 00 00 20 00 1f 00 00 00 00 00 00 00  O..... .........
  .  0010:  11 a6 ef 1f 00 00 00 00 f8 fe 7c b5 f5 ff ff ff  ..........|.....
  
  0 0x128 [0x20]: PERF_RECORD_TIME_CONV: unhandled!
  
  0x148 [0x50]: event: 1
  .
  . ... raw event: size 80 bytes
  .  0000:  01 00 00 00 01 00 50 00 ff ff ff ff 00 00 00 00  ......P.........
  .  0010:  00 00 00 a8 ff ff ff ff 00 80 33 18 00 00 00 00  ..........3.....
  .  0020:  00 00 00 a8 ff ff ff ff 5b 6b 65 72 6e 65 6c 2e  ........[kernel.
  .  0030:  6b 61 6c 6c 73 79 6d 73 5d 5f 74 65 78 74 00 00  kallsyms]_text..
  .  0040:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  
  0 0x148 [0x50]: PERF_RECORD_MMAP -1/0: [0xffffffffa8000000(0x18338000) @ 0xffffffffa8000000]: x [kernel.kallsyms]_text

  ...
  0x62d8e [0x8]: event: 68
  .
  . ... raw event: size 8 bytes
  .  0000:  44 00 00 00 00 00 08 00                          D.......        
  
  0 0x62d8e [0x8]: PERF_RECORD_FINISHED_ROUND
  
  0 [0x28]: event: 9
  .
  . ... raw event: size 40 bytes
  .  0000:  09 00 00 00 01 00 28 00 76 78 06 a8 ff ff ff ff  ......(.vx......
  .  0010:  94 29 00 00 94 29 00 00 82 02 f4 af 33 3e 01 00  .)...)......3>..
  .  0020:  01 00 00 00 00 00 00 00                          ........        
  
  349866692969090 0 [0x28]: PERF_RECORD_SAMPLE(IP, 0x1): 10644/10644: 0xffffffffa8067876 period: 1 addr: 0
   ... thread: perf:10644
   ...... dso: vmlinux
  
  0 [0x30]: event: 3
  .
  . ... raw event: size 48 bytes
  .  0000:  03 00 00 00 00 20 30 00 94 29 00 00 94 29 00 00  ..... 0..)...)..
  .  0010:  6d 61 74 72 69 78 2e 67 63 63 00 00 00 00 00 00  matrix.gcc......
  .  0020:  94 29 00 00 94 29 00 00 36 09 f4 af 33 3e 01 00  .)...)..6...3>..
  ...
  349892217639276 0 [0x28]: PERF_RECORD_SAMPLE(IP, 0x1): 10644/10644: 0xffffffffa8288030 period: 45296 addr: 0
   ... thread: matrix.gcc:10644
   ...... dso: vmlinux

  0 [0x30]: event: 4
  .
  . ... raw event: size 48 bytes
  .  0000:  04 00 00 00 00 00 30 00 94 29 00 00 94 29 00 00  ......0..)...)..
  .  0010:  94 29 00 00 94 29 00 00 09 b6 57 a1 39 3e 01 00  .)...)....W.9>..
  .  0020:  94 29 00 00 94 29 00 00 60 b5 57 a1 39 3e 01 00  .)...)..`.W.9>..
  
  349892217648480 0 [0x30]: PERF_RECORD_EXIT(10644:10644):(10644:10644)
  
  Aggregated stats:
             TOTAL events:    8248327
              MMAP events:        100
              LOST events:          0
              COMM events:          2
              EXIT events:          9
          THROTTLE events:     163377
        UNTHROTTLE events:     163377
              FORK events:          8
              READ events:          0
            SAMPLE events:    7909529
             MMAP2 events:          9
               AUX events:          0
      ITRACE_START events:          0
      LOST_SAMPLES events:          0
            SWITCH events:          0
   SWITCH_CPU_WIDE events:          0
        NAMESPACES events:          0
              ATTR events:          0
        EVENT_TYPE events:          0
      TRACING_DATA events:          0
          BUILD_ID events:          0
    FINISHED_ROUND events:       1562
          ID_INDEX events:          0
     AUXTRACE_INFO events:          0
          AUXTRACE events:          0
    AUXTRACE_ERROR events:          0
        THREAD_MAP events:          1
           CPU_MAP events:          1
       STAT_CONFIG events:          0
              STAT events:          0
        STAT_ROUND events:          0
      EVENT_UPDATE events:          0
         TIME_CONV events:          1
           FEATURE events:          0
        COMPRESSED events:      10351
  cycles stats:
             TOTAL events:    7909529
              MMAP events:          0
              LOST events:          0
              COMM events:          0
              EXIT events:          0
          THROTTLE events:          0
        UNTHROTTLE events:          0
              FORK events:          0
              READ events:          0
            SAMPLE events:    7909529
             MMAP2 events:          0
               AUX events:          0
      ITRACE_START events:          0
      LOST_SAMPLES events:          0
            SWITCH events:          0
   SWITCH_CPU_WIDE events:          0
        NAMESPACES events:          0
              ATTR events:          0
        EVENT_TYPE events:          0
      TRACING_DATA events:          0
          BUILD_ID events:          0
    FINISHED_ROUND events:          0
          ID_INDEX events:          0
     AUXTRACE_INFO events:          0
          AUXTRACE events:          0
    AUXTRACE_ERROR events:          0
        THREAD_MAP events:          0
           CPU_MAP events:          0
       STAT_CONFIG events:          0
              STAT events:          0
        STAT_ROUND events:          0
      EVENT_UPDATE events:          0
         TIME_CONV events:          0
           FEATURE events:          0
        COMPRESSED events:          0
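
The "ratio is 5.457" line printed by perf record and the "ratio = 5" value
in the header of the compressed trace are the same quantity: the header
stores it as an integer after adding 0.5 and truncating. A small sketch of
that arithmetic follows; the helper names are hypothetical, and double is
used here where the patch uses float.

```c
#include <stdint.h>

/* Hypothetical helper: ratio as printed by perf record. */
static double comp_ratio(uint64_t bytes_transferred, uint64_t bytes_compressed)
{
	/* The series reports a ratio only when both counters are non-zero. */
	if (!bytes_transferred || !bytes_compressed)
		return 0.0;
	return (double)bytes_transferred / (double)bytes_compressed;
}

/* Integer ratio as stored in the header: add 0.5, then truncate. */
static uint32_t comp_ratio_header(double ratio)
{
	return (uint32_t)(ratio + 0.5);
}
```

For the run above, 316.684 MB / 58.034 MB gives about 5.457, and
5.457 + 0.5 truncates to 5.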
  
Dump of a trace recorded without compression:

  $ tools/perf/perf report -D --header
  # ========
  # captured on    : Fri Dec 21 16:44:00 2018
  # header version : 1
  ..
  # time of first sample : 102911.898905
  # time of last sample : 102940.879058
  # sample duration :  28980.153 ms
  # MEM_TOPOLOGY info available, use -I to display
  # missing features: TRACING_DATA BRANCH_STACK GROUP_DESC AUXTRACE STAT CLOCKID COMPRESSED 
  # ========
  #

---
Alexey Budankov (4):
  feature: build libzstd feature check, LIBZSTD_DIR and NO_LIBZSTD defines
  perf record: introduce z, mmap-flush options and PERF_RECORD_COMPRESSED record
  perf record: enable runtime trace compression
  perf report: support record trace file decompression

 tools/build/Makefile.feature             |   6 +-
 tools/build/feature/Makefile             |   6 +-
 tools/build/feature/test-all.c           |   5 +
 tools/build/feature/test-libzstd.c       |  12 ++
 tools/perf/Documentation/perf-record.txt |  11 ++
 tools/perf/Makefile.config               |  20 +++
 tools/perf/Makefile.perf                 |   3 +
 tools/perf/builtin-record.c              | 213 +++++++++++++++++++++--
 tools/perf/builtin-report.c              | 151 +++++++++++++++-
 tools/perf/perf.h                        |   2 +
 tools/perf/util/env.h                    |  10 ++
 tools/perf/util/event.c                  |   1 +
 tools/perf/util/event.h                  |   7 +
 tools/perf/util/evlist.c                 |   6 +-
 tools/perf/util/evlist.h                 |   2 +-
 tools/perf/util/header.c                 |  47 ++++-
 tools/perf/util/header.h                 |   1 +
 tools/perf/util/machine.c                |   4 +
 tools/perf/util/mmap.c                   |  17 +-
 tools/perf/util/mmap.h                   |   5 +-
 tools/perf/util/session.c                |  59 ++++++-
 tools/perf/util/session.h                |  16 ++
 tools/perf/util/tool.h                   |   2 +
 23 files changed, 572 insertions(+), 34 deletions(-)
 create mode 100644 tools/build/feature/test-libzstd.c

---
[1] https://github.com/facebook/zstd


* [PATCH v1 1/4] feature: build libzstd feature check, LIBZSTD_DIR and NO_LIBZSTD defines
  2018-12-24 13:21 [PATCH v1 0/4] perf: enable compression of record mode trace to save storage space Alexey Budankov
@ 2018-12-24 13:35 ` Alexey Budankov
  2018-12-24 13:45 ` [PATCH v1 2/4] perf record: introduce z, mmap-flush options and PERF_RECORD_COMPRESSED record Alexey Budankov
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Alexey Budankov @ 2018-12-24 13:35 UTC
  To: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra
  Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


Implement the libzstd build feature check and the LIBZSTD_DIR and NO_LIBZSTD
defines, which make it possible to override the Zstandard library source
location and to disable the feature from the command line:

  $ make -C tools/perf LIBZSTD_DIR=/root/abudanko/zstd-1.3.7 clean all
  $ make -C tools/perf NO_LIBZSTD=1 clean all

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 tools/build/Makefile.feature       |  6 ++++--
 tools/build/feature/Makefile       |  6 +++++-
 tools/build/feature/test-all.c     |  5 +++++
 tools/build/feature/test-libzstd.c | 12 ++++++++++++
 tools/perf/Makefile.config         | 20 ++++++++++++++++++++
 tools/perf/Makefile.perf           |  3 +++
 6 files changed, 49 insertions(+), 3 deletions(-)
 create mode 100644 tools/build/feature/test-libzstd.c
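
A sketch of how tool code could be gated on the HAVE_ZSTD_SUPPORT define
that this patch adds to CFLAGS. The helper comp_level_supported() is
hypothetical and not part of the series; ZSTD_maxCLevel() is the libzstd
call that reports the highest supported level.

```c
/* Sketch only: comp_level_supported() is a hypothetical helper. */
#ifdef HAVE_ZSTD_SUPPORT
#include <zstd.h>

static int comp_level_supported(unsigned int level)
{
	/* Level 0 means "no compression" in this series. */
	return level >= 1 && level <= (unsigned int)ZSTD_maxCLevel();
}
#else
static int comp_level_supported(unsigned int level)
{
	(void)level;
	return 0;	/* built with NO_LIBZSTD=1 or without libzstd */
}
#endif
```

Built with NO_LIBZSTD=1, every level is reported as unsupported, so callers
fall back to writing an uncompressed trace.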

diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature
index 5467c6bf9ceb..25088c8f05b2 100644
--- a/tools/build/Makefile.feature
+++ b/tools/build/Makefile.feature
@@ -71,7 +71,8 @@ FEATURE_TESTS_BASIC :=                  \
         sdt				\
         setns				\
         libopencsd			\
-        libaio
+        libaio				\
+        libzstd
 
 # FEATURE_TESTS_BASIC + FEATURE_TESTS_EXTRA is the complete list
 # of all feature tests
@@ -118,7 +119,8 @@ FEATURE_DISPLAY ?=              \
          lzma                   \
          get_cpuid              \
          bpf			\
-         libaio
+         libaio			\
+         libzstd
 
 # Set FEATURE_CHECK_(C|LD)FLAGS-all for all FEATURE_TESTS features.
 # If in the future we need per-feature checks/flags for features not
diff --git a/tools/build/feature/Makefile b/tools/build/feature/Makefile
index 7ceb4441b627..4b8244ee65ce 100644
--- a/tools/build/feature/Makefile
+++ b/tools/build/feature/Makefile
@@ -62,7 +62,8 @@ FILES=                                          \
          test-clang.bin				\
          test-llvm.bin				\
          test-llvm-version.bin			\
-         test-libaio.bin
+         test-libaio.bin			\
+         test-libzstd.bin
 
 FILES := $(addprefix $(OUTPUT),$(FILES))
 
@@ -301,6 +302,9 @@ $(OUTPUT)test-clang.bin:
 $(OUTPUT)test-libaio.bin:
 	$(BUILD) -lrt
 
+$(OUTPUT)test-libzstd.bin:
+	$(BUILD) -lzstd
+
 ###############################
 
 clean:
diff --git a/tools/build/feature/test-all.c b/tools/build/feature/test-all.c
index 20cdaa4fc112..5af329b6ffef 100644
--- a/tools/build/feature/test-all.c
+++ b/tools/build/feature/test-all.c
@@ -178,6 +178,10 @@
 # include "test-libaio.c"
 #undef main
 
+#define main main_test_libzstd
+# include "test-libzstd.c"
+#undef main
+
 int main(int argc, char *argv[])
 {
 	main_test_libpython();
@@ -219,6 +223,7 @@ int main(int argc, char *argv[])
 	main_test_setns();
 	main_test_libopencsd();
 	main_test_libaio();
+	main_test_libzstd();
 
 	return 0;
 }
diff --git a/tools/build/feature/test-libzstd.c b/tools/build/feature/test-libzstd.c
new file mode 100644
index 000000000000..55268c01b84d
--- /dev/null
+++ b/tools/build/feature/test-libzstd.c
@@ -0,0 +1,12 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <zstd.h>
+
+int main(void)
+{
+	ZSTD_CStream	*cstream;
+
+	cstream = ZSTD_createCStream();
+	ZSTD_freeCStream(cstream);
+
+	return 0;
+}
diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index b441c88cafa1..1dccd776a4aa 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -145,6 +145,13 @@ endif
 FEATURE_CHECK_CFLAGS-libbabeltrace := $(LIBBABELTRACE_CFLAGS)
 FEATURE_CHECK_LDFLAGS-libbabeltrace := $(LIBBABELTRACE_LDFLAGS) -lbabeltrace-ctf
 
+ifdef LIBZSTD_DIR
+  LIBZSTD_CFLAGS  := -I$(LIBZSTD_DIR)/lib
+  LIBZSTD_LDFLAGS := -L$(LIBZSTD_DIR)/lib
+endif
+FEATURE_CHECK_CFLAGS-libzstd := $(LIBZSTD_CFLAGS)
+FEATURE_CHECK_LDFLAGS-libzstd := $(LIBZSTD_LDFLAGS)
+
 FEATURE_CHECK_CFLAGS-bpf = -I. -I$(srctree)/tools/include -I$(srctree)/tools/arch/$(SRCARCH)/include/uapi -I$(srctree)/tools/include/uapi
 # include ARCH specific config
 -include $(src-perf)/arch/$(SRCARCH)/Makefile
@@ -770,6 +777,19 @@ ifndef NO_LZMA
   endif
 endif
 
+ifndef NO_LIBZSTD
+  ifeq ($(feature-libzstd), 1)
+    CFLAGS += -DHAVE_ZSTD_SUPPORT
+    CFLAGS += $(LIBZSTD_CFLAGS)
+    LDFLAGS += $(LIBZSTD_LDFLAGS)
+    EXTLIBS += -lzstd
+    $(call detected,CONFIG_ZSTD)
+  else
+    msg := $(warning No libzstd found, disables trace compression, please install libzstd-dev[el] and/or set LIBZSTD_DIR);
+    NO_LIBZSTD := 1
+  endif
+endif
+
 ifndef NO_BACKTRACE
   ifeq ($(feature-backtrace), 1)
     CFLAGS += -DHAVE_BACKTRACE_SUPPORT
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index bd23e3f30895..dcac562e1d00 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -108,6 +108,9 @@ include ../scripts/utilities.mak
 # streaming for record mode. Currently Posix AIO trace streaming is
 # supported only when linking with glibc.
 #
+# Define NO_LIBZSTD if you do not want support of Zstandard based runtime
+# trace compression in record mode.
+#
 
 # As per kernel Makefile, avoid funny character set dependencies
 unexport LC_ALL


* [PATCH v1 2/4] perf record: introduce z, mmap-flush options and PERF_RECORD_COMPRESSED record
  2018-12-24 13:21 [PATCH v1 0/4] perf: enable compression of record mode trace to save storage space Alexey Budankov
  2018-12-24 13:35 ` [PATCH v1 1/4] feature: build libzstd feature check, LIBZSTD_DIR and NO_LIBZSTD defines Alexey Budankov
@ 2018-12-24 13:45 ` Alexey Budankov
  2019-01-09 16:58   ` Jiri Olsa
  2018-12-24 13:46 ` [PATCH v1 3/4] perf record: enable runtime trace compression Alexey Budankov
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Alexey Budankov @ 2018-12-24 13:45 UTC
  To: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra
  Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


Introduce the -z,--compression-level=n and --mmap-flush options and the
PERF_RECORD_COMPRESSED event record, which contains compressed parts of
mmap kernel buffer data.
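
The record is a regular perf event header followed by an opaque payload. A
minimal sketch of building one is shown below, using a local mirror of the
header layout (32-bit type, 16-bit misc, 16-bit size). The helper name
make_compressed_record() is hypothetical, and how the series splits larger
data across several records is not shown in this patch.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of the perf_event_header layout (type, misc, size). */
struct event_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;	/* total record size, header included */
};

#define RECORD_COMPRESSED 81	/* PERF_RECORD_COMPRESSED in this patch */

/*
 * Hypothetical helper: wrap len bytes of already compressed data into a
 * record written to dst. Returns the record size, or 0 if it does not fit.
 */
static size_t make_compressed_record(void *dst, size_t dst_len,
				     const void *payload, size_t len)
{
	struct event_header hdr;
	size_t total = sizeof(hdr) + len;

	/* header.size is 16 bits wide, so one record is at most 65535 bytes. */
	if (total > UINT16_MAX || total > dst_len)
		return 0;

	hdr.type = RECORD_COMPRESSED;
	hdr.misc = 0;
	hdr.size = (uint16_t)total;
	memcpy(dst, &hdr, sizeof(hdr));
	memcpy((char *)dst + sizeof(hdr), payload, len);
	return total;
}
```

Because the size field is 16 bits wide, a single record cannot carry more
than 65535 bytes including the 8-byte header.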

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 tools/perf/Documentation/perf-record.txt | 11 +++
 tools/perf/builtin-record.c              | 97 ++++++++++++++++++++----
 tools/perf/perf.h                        |  2 +
 tools/perf/util/env.h                    | 10 +++
 tools/perf/util/event.c                  |  1 +
 tools/perf/util/event.h                  |  7 ++
 tools/perf/util/evlist.c                 |  6 +-
 tools/perf/util/evlist.h                 |  2 +-
 tools/perf/util/header.c                 | 47 +++++++++++-
 tools/perf/util/header.h                 |  1 +
 tools/perf/util/mmap.c                   |  4 +-
 tools/perf/util/mmap.h                   |  3 +-
 12 files changed, 169 insertions(+), 22 deletions(-)
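
The HEADER_COMPRESSED feature section written by this patch consists of two
64-bit words: (comp_type << 32 | comp_level) followed by
(comp_ratio << 32 | comp_mmap_len). The packing can be sketched as below;
the helper names are hypothetical and only illustrate the bit layout.

```c
#include <stdint.h>

/* Pack two 32-bit values into one 64-bit word: hi goes to bits 63..32. */
static uint64_t pack_u32_pair(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Inverse of pack_u32_pair(). */
static void unpack_u32_pair(uint64_t word, uint32_t *hi, uint32_t *lo)
{
	*hi = (uint32_t)(word >> 32);
	*lo = (uint32_t)(word & 0xffffffffULL);
}
```

For the example trace in the cover letter, the first word would be
pack_u32_pair(PERF_COMP_ZSTD, 1), i.e. 0x0000000100000001.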

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index d232b13ea713..b849dfdefefe 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -440,6 +440,17 @@ Use <n> control blocks in asynchronous (Posix AIO) trace writing mode (default:
 Asynchronous mode is supported only when linking Perf tool with libc library
 providing implementation for Posix AIO API.
 
+-z::
+--compression-level=n::
+Produce compressed trace file to save storage space using specified level n (default: 0,
+best speed: 1, best compression: 22). Compression can be activated in asynchronous trace
+writing mode (--aio) only.
+
+--mmap-flush=n::
+Minimal number of bytes accumulated in mmap buffer that is flushed to trace file (default: 1).
+When compression mode (-z) is enabled it is recommended to set --mmap-flush to 4096 or more.
+Maximal allowed value is a quarter of mmap kernel buffer size.
+
 --all-kernel::
 Configure all used events to run in kernel space.
 
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 882285fb9f64..cb0b880281d7 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -81,6 +81,8 @@ struct record {
 	bool			timestamp_boundary;
 	struct switch_output	switch_output;
 	unsigned long long	samples;
+	u64			bytes_transferred;
+	u64			bytes_compressed;
 };
 
 static volatile int auxtrace_record__snapshot_started;
@@ -286,13 +288,17 @@ static int record__aio_parse(const struct option *opt,
 
 	if (unset) {
 		opts->nr_cblocks = 0;
-	} else {
-		if (str)
-			opts->nr_cblocks = strtol(str, NULL, 0);
-		if (!opts->nr_cblocks)
-			opts->nr_cblocks = nr_cblocks_default;
+		return 0;
 	}
 
+	if (str)
+		opts->nr_cblocks = strtol(str, NULL, 0);
+	if (!opts->nr_cblocks)
+		opts->nr_cblocks = nr_cblocks_default;
+
+	if (opts->nr_cblocks > nr_cblocks_max)
+		opts->nr_cblocks = nr_cblocks_max;
+
 	return 0;
 }
 #else /* HAVE_AIO_SUPPORT */
@@ -328,6 +334,30 @@ static int record__aio_enabled(struct record *rec)
 	return rec->opts.nr_cblocks > 0;
 }
 
+#define MMAP_FLUSH_DEFAULT 1
+
+static int record__mmap_flush_parse(const struct option *opt,
+				    const char *str,
+				    int unset)
+{
+	int mmap_len;
+	struct record_opts *opts = (struct record_opts *)opt->value;
+
+	if (unset)
+		return 0;
+
+	if (str)
+		opts->mmap_flush = strtol(str, NULL, 0);
+	if (!opts->mmap_flush)
+		opts->mmap_flush = MMAP_FLUSH_DEFAULT;
+
+	mmap_len = perf_evlist__mmap_size(opts->mmap_pages);
+	if (opts->mmap_flush > mmap_len / 4)
+		opts->mmap_flush = mmap_len / 4;
+
+	return 0;
+}
+
 static int process_synthesized_event(struct perf_tool *tool,
 				     union perf_event *event,
 				     struct perf_sample *sample __maybe_unused,
@@ -533,7 +563,8 @@ static int record__mmap_evlist(struct record *rec,
 
 	if (perf_evlist__mmap_ex(evlist, opts->mmap_pages,
 				 opts->auxtrace_mmap_pages,
-				 opts->auxtrace_snapshot_mode, opts->nr_cblocks) < 0) {
+				 opts->auxtrace_snapshot_mode,
+				 opts->nr_cblocks, opts->mmap_flush) < 0) {
 		if (errno == EPERM) {
 			pr_err("Permission error mapping pages.\n"
 			       "Consider increasing "
@@ -723,7 +754,7 @@ static struct perf_event_header finished_round_event = {
 };
 
 static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evlist,
-				    bool overwrite)
+				    bool overwrite, bool sync)
 {
 	u64 bytes_written = rec->bytes_written;
 	int i;
@@ -746,11 +777,18 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
 		off = record__aio_get_pos(trace_fd);
 
 	for (i = 0; i < evlist->nr_mmaps; i++) {
+		u64 flush = 0;
 		struct perf_mmap *map = &maps[i];
 
 		if (map->base) {
+			if (sync) {
+				flush = map->flush;
+				map->flush = MMAP_FLUSH_DEFAULT;
+			}
 			if (!record__aio_enabled(rec)) {
 				if (perf_mmap__push(map, rec, record__pushfn) != 0) {
+					if (sync)
+						map->flush = flush;
 					rc = -1;
 					goto out;
 				}
@@ -763,10 +801,14 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
 				idx = record__aio_sync(map, false);
 				if (perf_mmap__aio_push(map, rec, idx, record__aio_pushfn, &off) != 0) {
 					record__aio_set_pos(trace_fd, off);
+					if (sync)
+						map->flush = flush;
 					rc = -1;
 					goto out;
 				}
 			}
+			if (sync)
+				map->flush = flush;
 		}
 
 		if (map->auxtrace_mmap.base && !rec->opts.auxtrace_snapshot_mode &&
@@ -792,15 +834,15 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
 	return rc;
 }
 
-static int record__mmap_read_all(struct record *rec)
+static int record__mmap_read_all(struct record *rec, bool sync)
 {
 	int err;
 
-	err = record__mmap_read_evlist(rec, rec->evlist, false);
+	err = record__mmap_read_evlist(rec, rec->evlist, false, sync);
 	if (err)
 		return err;
 
-	return record__mmap_read_evlist(rec, rec->evlist, true);
+	return record__mmap_read_evlist(rec, rec->evlist, true, sync);
 }
 
 static void record__init_features(struct record *rec)
@@ -826,6 +868,9 @@ static void record__init_features(struct record *rec)
 	if (!(rec->opts.use_clockid && rec->opts.clockid_res_ns))
 		perf_header__clear_feat(&session->header, HEADER_CLOCKID);
 
+	if (!(rec->opts.comp_level && rec->opts.nr_cblocks))
+		perf_header__clear_feat(&session->header, HEADER_COMPRESSED);
+
 	perf_header__clear_feat(&session->header, HEADER_STAT);
 }
 
@@ -1130,6 +1175,10 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	fd = perf_data__fd(data);
 	rec->session = session;
 
+	session->header.env.comp_type = PERF_COMP_NONE;
+	rec->opts.comp_level = 0;
+	session->header.env.comp_level = rec->opts.comp_level;
+
 	record__init_features(rec);
 
 	if (rec->opts.use_clockid && rec->opts.clockid_res_ns)
@@ -1159,6 +1208,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		err = -1;
 		goto out_child;
 	}
+	session->header.env.comp_mmap_len = session->evlist->mmap_len;
 
 	err = bpf__apply_obj_config();
 	if (err) {
@@ -1294,7 +1344,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		if (trigger_is_hit(&switch_output_trigger) || done || draining)
 			perf_evlist__toggle_bkw_mmap(rec->evlist, BKW_MMAP_DATA_PENDING);
 
-		if (record__mmap_read_all(rec) < 0) {
+		if (record__mmap_read_all(rec, false) < 0) {
 			trigger_error(&auxtrace_snapshot_trigger);
 			trigger_error(&switch_output_trigger);
 			err = -1;
@@ -1395,8 +1445,16 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		record__synthesize_workload(rec, true);
 
 out_child:
+	record__mmap_read_all(rec, true);
 	record__aio_mmap_read_sync(rec);
 
+	if (!quiet && rec->bytes_transferred && rec->bytes_compressed) {
+		float ratio = (float)rec->bytes_transferred/(float)rec->bytes_compressed;
+		session->header.env.comp_ratio = ratio + 0.5;
+		fprintf(stderr,	"[ perf record: Compressed %.3f MB to %.3f MB, ratio is %.3f ]\n",
+			rec->bytes_transferred / 1024.0 / 1024.0, rec->bytes_compressed / 1024.0 / 1024.0, ratio);
+	}
+
 	if (forks) {
 		int exit_status;
 
@@ -1782,6 +1840,7 @@ static struct record record = {
 			.uses_mmap   = true,
 			.default_per_cpu = true,
 		},
+		.mmap_flush          = MMAP_FLUSH_DEFAULT,
 	},
 	.tool = {
 		.sample		= process_sample_event,
@@ -1945,7 +2004,12 @@ static struct option __record_options[] = {
 	OPT_CALLBACK_OPTARG(0, "aio", &record.opts,
 		     &nr_cblocks_default, "n", "Use <n> control blocks in asynchronous trace writing mode (default: 1, max: 4)",
 		     record__aio_parse),
+	OPT_UINTEGER('z', "compression-level", &record.opts.comp_level,
+		     "Produce compressed trace file (default: 0, best speed: 1, best compression: 22)"),
 #endif
+	OPT_CALLBACK(0, "mmap-flush", &record.opts, "num",
+		     "Minimal number of bytes in mmap buffer that is flushed to trace file (default: 1)",
+		     record__mmap_flush_parse),
 	OPT_END()
 };
 
@@ -2138,10 +2202,13 @@ int cmd_record(int argc, const char **argv)
 		goto out;
 	}
 
-	if (rec->opts.nr_cblocks > nr_cblocks_max)
-		rec->opts.nr_cblocks = nr_cblocks_max;
-	if (verbose > 0)
-		pr_info("nr_cblocks: %d\n", rec->opts.nr_cblocks);
+	pr_debug("nr_cblocks: %d\n", rec->opts.nr_cblocks);
+
+	if (rec->opts.comp_level > 22)
+		rec->opts.comp_level = 0;
+	pr_debug("Compression level: %d\n", rec->opts.comp_level);
+
+	pr_debug("mmap flush (B): %d\n", rec->opts.mmap_flush);
 
 	err = __cmd_record(&record, argc, argv);
 out:
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 388c6dd128b8..0352b5a5b9d5 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -83,6 +83,8 @@ struct record_opts {
 	clockid_t    clockid;
 	u64          clockid_res_ns;
 	int	     nr_cblocks;
+	unsigned int comp_level;
+	int	     mmap_flush;
 };
 
 struct option;
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index d01b8355f4ca..3d1ab2ccc128 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -64,6 +64,16 @@ struct perf_env {
 	struct memory_node	*memory_nodes;
 	unsigned long long	 memory_bsize;
 	u64                     clockid_res_ns;
+	u32			comp_type;
+	u32			comp_level;
+	u32			comp_ratio;
+	u32			comp_mmap_len;
+};
+
+enum perf_compress_type {
+	PERF_COMP_NONE = 0,
+	PERF_COMP_ZSTD,
+	PERF_COMP_EOF
 };
 
 extern struct perf_env perf_env;
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 937a5a4f71cc..20730ba2a08b 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -62,6 +62,7 @@ static const char *perf_event__names[] = {
 	[PERF_RECORD_EVENT_UPDATE]		= "EVENT_UPDATE",
 	[PERF_RECORD_TIME_CONV]			= "TIME_CONV",
 	[PERF_RECORD_HEADER_FEATURE]		= "FEATURE",
+	[PERF_RECORD_COMPRESSED]		= "COMPRESSED",
 };
 
 static const char *perf_ns__names[] = {
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index eb95f3384958..03960cfbe8d3 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -249,6 +249,7 @@ enum perf_user_event_type { /* above any possible kernel type */
 	PERF_RECORD_EVENT_UPDATE		= 78,
 	PERF_RECORD_TIME_CONV			= 79,
 	PERF_RECORD_HEADER_FEATURE		= 80,
+	PERF_RECORD_COMPRESSED			= 81,
 	PERF_RECORD_HEADER_MAX
 };
 
@@ -620,6 +621,11 @@ struct feature_event {
 	char				data[];
 };
 
+struct compressed_event {
+	struct perf_event_header	header;
+	char				data[];
+};
+
 union perf_event {
 	struct perf_event_header	header;
 	struct mmap_event		mmap;
@@ -651,6 +657,7 @@ union perf_event {
 	struct stat_round_event		stat_round;
 	struct time_conv_event		time_conv;
 	struct feature_event		feat;
+	struct compressed_event		pack;
 };
 
 void perf_event__print_totals(void);
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 8c902276d4b4..c82d4fd32dcf 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1022,7 +1022,7 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
  */
 int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 			 unsigned int auxtrace_pages,
-			 bool auxtrace_overwrite, int nr_cblocks)
+			 bool auxtrace_overwrite, int nr_cblocks, int flush)
 {
 	struct perf_evsel *evsel;
 	const struct cpu_map *cpus = evlist->cpus;
@@ -1032,7 +1032,7 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 	 * Its value is decided by evsel's write_backward.
 	 * So &mp should not be passed through const pointer.
 	 */
-	struct mmap_params mp = { .nr_cblocks = nr_cblocks };
+	struct mmap_params mp = { .nr_cblocks = nr_cblocks, .flush = flush };
 
 	if (!evlist->mmap)
 		evlist->mmap = perf_evlist__alloc_mmap(evlist, false);
@@ -1064,7 +1064,7 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 
 int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages)
 {
-	return perf_evlist__mmap_ex(evlist, pages, 0, false, 0);
+	return perf_evlist__mmap_ex(evlist, pages, 0, false, 0, 1);
 }
 
 int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target)
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 868294491194..33af704e55a2 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -162,7 +162,7 @@ unsigned long perf_event_mlock_kb_in_pages(void);
 
 int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 			 unsigned int auxtrace_pages,
-			 bool auxtrace_overwrite, int nr_cblocks);
+			 bool auxtrace_overwrite, int nr_cblocks, int flush);
 int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages);
 void perf_evlist__munmap(struct perf_evlist *evlist);
 
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index dec6d218c31c..37ab460b6f06 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -1463,6 +1463,23 @@ static int write_mem_topology(struct feat_fd *ff __maybe_unused,
 	return ret;
 }
 
+static int write_compressed(struct feat_fd *ff __maybe_unused,
+			    struct perf_evlist *evlist __maybe_unused)
+{
+	int ret;
+	u64 compression_info = ((u64)ff->ph->env.comp_type  << 32) |
+			             ff->ph->env.comp_level;
+
+	ret = do_write(ff, &compression_info, sizeof(compression_info));
+	if (ret)
+		return ret;
+
+	compression_info = ((u64)ff->ph->env.comp_ratio << 32) |
+	                         ff->ph->env.comp_mmap_len;
+
+	return do_write(ff, &compression_info, sizeof(compression_info));
+}
+
 static void print_hostname(struct feat_fd *ff, FILE *fp)
 {
 	fprintf(fp, "# hostname : %s\n", ff->ph->env.hostname);
@@ -1750,6 +1767,13 @@ static void print_cache(struct feat_fd *ff, FILE *fp __maybe_unused)
 	}
 }
 
+static void print_compressed(struct feat_fd *ff, FILE *fp)
+{
+	fprintf(fp, "# compressed : %s, level = %d, ratio = %d\n",
+		ff->ph->env.comp_type == PERF_COMP_ZSTD ? "Zstd" : "Unknown",
+		ff->ph->env.comp_level, ff->ph->env.comp_ratio);
+}
+
 static void print_pmu_mappings(struct feat_fd *ff, FILE *fp)
 {
 	const char *delimiter = "# pmu mappings: ";
@@ -2592,6 +2616,26 @@ static int process_clockid(struct feat_fd *ff,
 	return 0;
 }
 
+static int process_compressed(struct feat_fd *ff,
+			      void *data __maybe_unused)
+{
+	u64 compression_info;
+
+	if (do_read_u64(ff, &compression_info))
+		return -1;
+
+	ff->ph->env.comp_type  = (compression_info >> 32) & 0xffffffffULL;
+	ff->ph->env.comp_level = compression_info & 0xffffffffULL;
+
+	if (do_read_u64(ff, &compression_info))
+		return -1;
+
+	ff->ph->env.comp_ratio = (compression_info >> 32) & 0xffffffffULL;
+	ff->ph->env.comp_mmap_len = compression_info & 0xffffffffULL;
+
+	return 0;
+}
+
 struct feature_ops {
 	int (*write)(struct feat_fd *ff, struct perf_evlist *evlist);
 	void (*print)(struct feat_fd *ff, FILE *fp);
@@ -2651,7 +2695,8 @@ static const struct feature_ops feat_ops[HEADER_LAST_FEATURE] = {
 	FEAT_OPN(CACHE,		cache,		true),
 	FEAT_OPR(SAMPLE_TIME,	sample_time,	false),
 	FEAT_OPR(MEM_TOPOLOGY,	mem_topology,	true),
-	FEAT_OPR(CLOCKID,       clockid,        false)
+	FEAT_OPR(CLOCKID,       clockid,        false),
+	FEAT_OPR(COMPRESSED,	compressed,	false)
 };
 
 struct header_print_data {
diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
index 0d553ddca0a3..ee867075dc64 100644
--- a/tools/perf/util/header.h
+++ b/tools/perf/util/header.h
@@ -39,6 +39,7 @@ enum {
 	HEADER_SAMPLE_TIME,
 	HEADER_MEM_TOPOLOGY,
 	HEADER_CLOCKID,
+	HEADER_COMPRESSED,
 	HEADER_LAST_FEATURE,
 	HEADER_FEAT_BITS	= 256,
 };
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index 8fc39311a30d..5e71b0183e33 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -347,6 +347,8 @@ int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd, int c
 				&mp->auxtrace_mp, map->base, fd))
 		return -1;
 
+	map->flush = mp->flush;
+
 	return perf_mmap__aio_mmap(map, mp);
 }
 
@@ -395,7 +397,7 @@ static int __perf_mmap__read_init(struct perf_mmap *md)
 	md->start = md->overwrite ? head : old;
 	md->end = md->overwrite ? old : head;
 
-	if (md->start == md->end)
+	if ((md->end - md->start) < md->flush)
 		return -EAGAIN;
 
 	size = md->end - md->start;
diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index aeb6942fdb00..afbfb8b58d45 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -30,6 +30,7 @@ struct perf_mmap {
 	bool		 overwrite;
 	struct auxtrace_mmap auxtrace_mmap;
 	char		 event_copy[PERF_SAMPLE_MAX_SIZE] __aligned(8);
+	u64		 flush;
 #ifdef HAVE_AIO_SUPPORT
 	struct {
 		void		 **data;
@@ -69,7 +70,7 @@ enum bkw_mmap_state {
 };
 
 struct mmap_params {
-	int			    prot, mask, nr_cblocks;
+	int			    prot, mask, nr_cblocks, flush;
 	struct auxtrace_mmap_params auxtrace_mp;
 };


* [PATCH v1 3/4] perf record: enable runtime trace compression
  2018-12-24 13:21 [PATCH v1 0/4] perf: enable compression of record mode trace to save storage space Alexey Budankov
  2018-12-24 13:35 ` [PATCH v1 1/4] feature: build libzstd feature check, LIBZSTD_DIR and NO_LIBZSTD defines Alexey Budankov
  2018-12-24 13:45 ` [PATCH v1 2/4] perf record: introduce z, mmap-flush options and PERF_RECORD_COMPRESSED record Alexey Budankov
@ 2018-12-24 13:46 ` Alexey Budankov
  2018-12-24 14:00 ` [PATCH v1 4/4] perf report: support record trace file decompression Alexey Budankov
  2019-01-09 17:28 ` [PATCH v1 0/4] perf: enable compression of record mode trace to save storage space Jiri Olsa
  4 siblings, 0 replies; 11+ messages in thread
From: Alexey Budankov @ 2018-12-24 13:46 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra
  Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


Compression is implemented using the Zstandard API and uses the AIO
buffers as working memory, so memcpy() is replaced by the compression
call. If that call fails, copying falls back to memcpy(). The data is
split and packed into PERF_RECORD_COMPRESSED records of at most 64KB
each. The --mmap-flush option value can be used to avoid compressing
every single byte of data as it arrives and to increase the compression
ratio.
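
The record framing described above can be sketched in isolation. This is
only an illustration under stated assumptions, not the code from the
patch: struct rec_header, SKETCH_RECORD_COMPRESSED and pack_records()
are hypothetical names, and the payload is copied verbatim where the
real code would place compressor output.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an event header with a 16-bit size field. */
struct rec_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;	/* header + payload, so at most 65535 */
};

/* Illustrative record type value, not the one perf uses. */
#define SKETCH_RECORD_COMPRESSED 1

/* Largest payload one record can carry: 2^16 - 1 minus the header. */
#define MAX_PAYLOAD ((size_t)UINT16_MAX - sizeof(struct rec_header))

/*
 * Split src into records of at most 64KB each and write them to dst.
 * Returns the number of bytes written, or 0 if dst is too small.
 * The payload is a plain copy here; a compressor would fill it instead.
 */
static size_t pack_records(void *dst, size_t dst_size,
			   const void *src, size_t src_size)
{
	unsigned char *out = dst;
	const unsigned char *in = src;
	size_t written = 0;

	while (src_size > 0) {
		size_t chunk = src_size < MAX_PAYLOAD ? src_size : MAX_PAYLOAD;
		struct rec_header hdr = {
			.type = SKETCH_RECORD_COMPRESSED,
			.misc = 0,
			.size = (uint16_t)(sizeof(struct rec_header) + chunk),
		};

		if (dst_size - written < sizeof(hdr) + chunk)
			return 0;

		memcpy(out + written, &hdr, sizeof(hdr));
		memcpy(out + written + sizeof(hdr), in, chunk);
		written += sizeof(hdr) + chunk;
		in += chunk;
		src_size -= chunk;
	}

	return written;
}
```

Because the size field covers the header as well, a full record carries
slightly less than 64KB of payload, and any larger input is spread over
several records.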

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 tools/perf/builtin-record.c | 122 ++++++++++++++++++++++++++++++++++--
 tools/perf/util/mmap.c      |  13 ++--
 tools/perf/util/mmap.h      |   2 +
 3 files changed, 127 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index cb0b880281d7..0ef1878967f8 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -53,6 +53,9 @@
 #include <sys/mman.h>
 #include <sys/wait.h>
 #include <linux/time64.h>
+#ifdef HAVE_ZSTD_SUPPORT
+#include <zstd.h>
+#endif
 
 struct switch_output {
 	bool		 enabled;
@@ -83,6 +86,9 @@ struct record {
 	unsigned long long	samples;
 	u64			bytes_transferred;
 	u64			bytes_compressed;
+#ifdef HAVE_ZSTD_SUPPORT
+	ZSTD_CStream		*zstd_cstream;
+#endif
 };
 
 static volatile int auxtrace_record__snapshot_started;
@@ -358,6 +364,109 @@ static int record__mmap_flush_parse(const struct option *opt,
 	return 0;
 }
 
+#ifdef HAVE_ZSTD_SUPPORT
+static int record__zstd_init(struct record *rec)
+{
+	size_t ret;
+
+	if (rec->opts.comp_level == 0)
+		return 0;
+
+	rec->zstd_cstream = ZSTD_createCStream();
+	if (rec->zstd_cstream == NULL) {
+		pr_err("Couldn't create compression stream, trace compression is disabled\n");
+		return -1;
+	}
+
+	ret = ZSTD_initCStream(rec->zstd_cstream, rec->opts.comp_level);
+	if (ZSTD_isError(ret)) {
+		pr_err("Failed to initialize compression stream: %s\n", ZSTD_getErrorName(ret));
+		return -1;
+	}
+
+	return 0;
+}
+
+static int record__zstd_fini(struct record *rec)
+{
+	if (rec->zstd_cstream) {
+		ZSTD_freeCStream(rec->zstd_cstream);
+		rec->zstd_cstream = NULL;
+	}
+
+	return 0;
+}
+
+static size_t record__zstd_compress(void *to,  void *dst, size_t dst_size,
+			            void *src, size_t src_size)
+{
+	void *dst_head = dst;
+	struct record *rec = to;
+	size_t ret, size, compressed = 0;
+	struct compressed_event *event = NULL;
+	/* maximum size of record payload (2^16 - 1 - header) */
+	const size_t max_data_size = (1 << 8 * sizeof(event->header.size)) -
+				      1 - sizeof(struct compressed_event);
+	ZSTD_inBuffer input = { src, src_size, 0 };
+	ZSTD_outBuffer output;
+
+	if (rec->opts.comp_level == 0) {
+		memcpy(dst_head, src, src_size);
+		return src_size;
+	}
+
+	while (input.pos < input.size) {
+		event = dst;
+
+		event->header.type = PERF_RECORD_COMPRESSED;
+		event->header.size = size = sizeof(struct compressed_event);
+		compressed += size;
+		dst += size;
+		dst_size -= size;
+
+		output = (ZSTD_outBuffer){ dst, (dst_size > max_data_size) ?
+						max_data_size : dst_size, 0 };
+		ret = ZSTD_compressStream(rec->zstd_cstream, &output, &input);
+		ZSTD_flushStream(rec->zstd_cstream, &output);
+		if (ZSTD_isError(ret)) {
+			pr_err("failed to compress %ld bytes: %s\n",
+				(long)src_size, ZSTD_getErrorName(ret));
+			memcpy(dst_head, src, src_size);
+			return src_size;
+		}
+		size = output.pos;
+
+		event->header.size += size;
+		compressed += size;
+		dst += size;
+		dst_size -= size;
+	}
+
+	rec->bytes_transferred += src_size;
+	rec->bytes_compressed += compressed;
+
+	return compressed;
+}
+#else /* !HAVE_ZSTD_SUPPORT */
+static int record__zstd_init(struct record *rec __maybe_unused)
+{
+	return -1;
+}
+
+static int record__zstd_fini(struct record *rec __maybe_unused)
+{
+	return 0;
+}
+
+static size_t record__zstd_compress(void *to __maybe_unused,
+			            void *dst, size_t dst_size __maybe_unused,
+			            void *src, size_t src_size)
+{
+	memcpy(dst, src, src_size);
+	return src_size;
+}
+#endif
+
 static int process_synthesized_event(struct perf_tool *tool,
 				     union perf_event *event,
 				     struct perf_sample *sample __maybe_unused,
@@ -799,7 +908,8 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
 				 * becomes available after previous aio write request.
 				 */
 				idx = record__aio_sync(map, false);
-				if (perf_mmap__aio_push(map, rec, idx, record__aio_pushfn, &off) != 0) {
+				if (perf_mmap__aio_push(map, rec, idx,
+					record__zstd_compress, record__aio_pushfn, &off) != 0) {
 					record__aio_set_pos(trace_fd, off);
 					if (sync)
 						map->flush = flush;
@@ -1175,8 +1285,12 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	fd = perf_data__fd(data);
 	rec->session = session;
 
-	session->header.env.comp_type = PERF_COMP_NONE;
-	rec->opts.comp_level = 0;
+	if (record__zstd_init(rec) == 0) {
+		session->header.env.comp_type = PERF_COMP_ZSTD;
+	} else {
+		session->header.env.comp_type = PERF_COMP_NONE;
+		rec->opts.comp_level = 0;
+	}
 	session->header.env.comp_level = rec->opts.comp_level;
 
 	record__init_features(rec);
@@ -1447,7 +1561,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 out_child:
 	record__mmap_read_all(rec, true);
 	record__aio_mmap_read_sync(rec);
-
+	record__zstd_fini(rec);
 	if (!quiet && rec->bytes_transferred && rec->bytes_compressed) {
 		float ratio = (float)rec->bytes_transferred/(float)rec->bytes_compressed;
 		session->header.env.comp_ratio = ratio + 0.5;
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index 5e71b0183e33..58a71ca77df5 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -218,14 +218,16 @@ static void perf_mmap__aio_munmap(struct perf_mmap *map)
 }
 
 int perf_mmap__aio_push(struct perf_mmap *md, void *to, int idx,
+			size_t compress(void *to, void *dst, size_t dst_size, void *src, size_t src_size),
 			int push(void *to, struct aiocb *cblock, void *buf, size_t size, off_t off),
 			off_t *off)
 {
 	u64 head = perf_mmap__read_head(md);
 	unsigned char *data = md->base + page_size;
-	unsigned long size, size0 = 0;
+	size_t size, size0 = 0, size1 = 0;
 	void *buf;
 	int rc = 0;
+	size_t mmap_len = perf_mmap__mmap_len(md);
 
 	rc = perf_mmap__read_init(md);
 	if (rc < 0)
@@ -254,14 +256,13 @@ int perf_mmap__aio_push(struct perf_mmap *md, void *to, int idx,
 		buf = &data[md->start & md->mask];
 		size = md->mask + 1 - (md->start & md->mask);
 		md->start += size;
-		memcpy(md->aio.data[idx], buf, size);
-		size0 = size;
+		size0 = compress(to, md->aio.data[idx], mmap_len, buf, size);
 	}
 
 	buf = &data[md->start & md->mask];
 	size = md->end - md->start;
 	md->start += size;
-	memcpy(md->aio.data[idx] + size0, buf, size);
+	size1 = compress(to, md->aio.data[idx] + size0, mmap_len - size0, buf, size);
 
 	/*
 	 * Increment md->refcount to guard md->data[idx] buffer
@@ -277,9 +278,9 @@ int perf_mmap__aio_push(struct perf_mmap *md, void *to, int idx,
 	md->prev = head;
 	perf_mmap__consume(md);
 
-	rc = push(to, &md->aio.cblocks[idx], md->aio.data[idx], size0 + size, *off);
+	rc = push(to, &md->aio.cblocks[idx], md->aio.data[idx], size0 + size1, *off);
 	if (!rc) {
-		*off += size0 + size;
+		*off += size0 + size1;
 	} else {
 		/*
 		 * Decrement md->refcount back if aio write
diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index afbfb8b58d45..0b3b8b46410a 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -100,10 +100,12 @@ int perf_mmap__push(struct perf_mmap *md, void *to,
 		    int push(struct perf_mmap *map, void *to, void *buf, size_t size));
 #ifdef HAVE_AIO_SUPPORT
 int perf_mmap__aio_push(struct perf_mmap *md, void *to, int idx,
+			size_t compress(void *to, void *dst, size_t dst_size, void *src, size_t src_size),
 			int push(void *to, struct aiocb *cblock, void *buf, size_t size, off_t off),
 			off_t *off);
 #else
 static inline int perf_mmap__aio_push(struct perf_mmap *md __maybe_unused, void *to __maybe_unused, int idx __maybe_unused,
+	size_t compress(void *to, void *dst, size_t dst_size, void *src, size_t src_size) __maybe_unused,
 	int push(void *to, struct aiocb *cblock, void *buf, size_t size, off_t off) __maybe_unused,
 	off_t *off __maybe_unused)
 {


* [PATCH v1 4/4] perf report: support record trace file decompression
  2018-12-24 13:21 [PATCH v1 0/4] perf: enable compression of record mode trace to save storage space Alexey Budankov
                   ` (2 preceding siblings ...)
  2018-12-24 13:46 ` [PATCH v1 3/4] perf record: enable runtime trace compression Alexey Budankov
@ 2018-12-24 14:00 ` Alexey Budankov
  2019-01-09 17:28 ` [PATCH v1 0/4] perf: enable compression of record mode trace to save storage space Jiri Olsa
  4 siblings, 0 replies; 11+ messages in thread
From: Alexey Budankov @ 2018-12-24 14:00 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra
  Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


PERF_RECORD_COMPRESSED records are decompressed from the trace file into
a linked list of mmaped memory regions using the Zstandard API. Each
region is then processed to fetch the uncompressed events. When dumping
a raw trace, e.g. with perf report -D, the file offsets of events that
come from compressed records are set to zero.
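
An event can straddle two compressed records, so the unconsumed tail of
the previous region has to be carried into the next one before new data
is appended. A minimal sketch of that carry-over follows, with
assumptions stated up front: struct region, region_start() and
region_consume() are hypothetical names, the buffer is a small fixed
array rather than an mmaped area, and each record starts with a one-byte
total length purely for brevity (perf events carry their size in the
event header).

```c
#include <stddef.h>
#include <string.h>

#define REGION_CAP 64

/* Hypothetical decompression region with a read cursor. */
struct region {
	size_t head;			/* offset of first unconsumed byte */
	size_t size;			/* number of valid bytes in data */
	unsigned char data[REGION_CAP];
};

/*
 * Start a new region: move the unconsumed tail of prev (if any) to the
 * front, then append the fresh bytes. Returns 0 on success, -1 if the
 * result would not fit.
 */
static int region_start(struct region *cur, const struct region *prev,
			const void *fresh, size_t fresh_len)
{
	size_t rem = prev ? prev->size - prev->head : 0;

	if (rem + fresh_len > REGION_CAP)
		return -1;

	cur->head = 0;
	cur->size = 0;
	if (rem) {
		memcpy(cur->data, prev->data + prev->head, rem);
		cur->size = rem;
	}
	memcpy(cur->data + cur->size, fresh, fresh_len);
	cur->size += fresh_len;

	return 0;
}

/*
 * Consume whole records and stop at the first partial one, which is
 * left in place for the next region. Returns the number consumed.
 */
static int region_consume(struct region *r)
{
	int n = 0;

	while (r->head < r->size) {
		size_t len = r->data[r->head];

		if (len == 0 || r->head + len > r->size)
			break;
		r->head += len;
		n++;
	}

	return n;
}
```

With this layout the consumer never sees a truncated record: whatever is
left between head and size simply becomes the prefix of the next region.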

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 tools/perf/builtin-report.c | 151 +++++++++++++++++++++++++++++++++++-
 tools/perf/util/machine.c   |   4 +
 tools/perf/util/session.c   |  59 +++++++++++++-
 tools/perf/util/session.h   |  16 ++++
 tools/perf/util/tool.h      |   2 +
 5 files changed, 230 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 4958095be4fc..1c45e674743d 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -52,7 +52,10 @@
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <unistd.h>
-#include <linux/mman.h>
+#include <sys/mman.h>
+#ifdef HAVE_ZSTD_SUPPORT
+#include <zstd.h>
+#endif
 
 struct report {
 	struct perf_tool	tool;
@@ -118,6 +121,94 @@ static int report__config(const char *var, const char *value, void *cb)
 	return 0;
 }
 
+#ifdef HAVE_ZSTD_SUPPORT
+static int report__zstd_init(struct perf_session *session)
+{
+	size_t ret;
+
+	session->zstd_dstream = ZSTD_createDStream();
+	if (session->zstd_dstream == NULL)
+	{
+		pr_err("Couldn't create decompression stream, trace decompression is disabled\n");
+		return -1;
+	}
+
+	ret = ZSTD_initDStream(session->zstd_dstream);
+	if (ZSTD_isError(ret))
+	{
+		pr_err("Failed to initialize decompression stream: %s\n", ZSTD_getErrorName(ret));
+		return -1;
+	}
+
+	return 0;
+}
+
+static int report__zstd_fini(struct perf_session *session)
+{
+	struct decomp *next = session->decomp, *decomp;
+	size_t decomp_len = session->header.env.comp_mmap_len;
+
+	if (session->zstd_dstream) {
+		ZSTD_freeDStream(session->zstd_dstream);
+		session->zstd_dstream = NULL;
+	}
+
+	do {
+		decomp = next;
+		if (decomp == NULL)
+			break;
+		next = decomp->next;
+		munmap(decomp, decomp_len + sizeof(struct decomp));
+	} while (1);
+
+	return 0;
+}
+
+static size_t report__zstd_decompress(struct perf_session *session,
+		                      void *src, size_t src_size,
+		                      void *dst, size_t dst_size)
+{
+	size_t ret;
+	ZSTD_inBuffer input = { src, src_size, 0 };
+	ZSTD_outBuffer output = { dst, dst_size, 0 };
+
+	if (session->zstd_dstream == NULL)
+		return 0;
+
+	while (input.pos < input.size) {
+		ret = ZSTD_decompressStream(session->zstd_dstream, &output, &input);
+		if (ZSTD_isError(ret))
+		{
+			pr_err("failed to decompress (B): %zu -> %zu : %s\n",
+				src_size, output.size, ZSTD_getErrorName(ret));
+			break;
+		}
+		output.dst  = dst + output.pos;
+		output.size = dst_size - output.pos;
+	}
+
+	return output.pos;
+}
+
+#else /* !HAVE_ZSTD_SUPPORT */
+static int report__zstd_init(struct perf_session *session __maybe_unused)
+{
+	return -1;
+}
+
+static int report__zstd_fini(struct perf_session *session __maybe_unused)
+{
+	return 0;
+}
+
+static size_t report__zstd_decompress(struct perf_session *session __maybe_unused,
+				      void *src __maybe_unused, size_t src_size __maybe_unused,
+				      void *dst __maybe_unused, size_t dst_size __maybe_unused)
+{
+	return 0;
+}
+#endif
+
 static int hist_iter__report_callback(struct hist_entry_iter *iter,
 				      struct addr_location *al, bool single,
 				      void *arg)
@@ -225,6 +316,57 @@ static int process_feature_event(struct perf_session *session,
 	return 0;
 }
 
+static int process_compressed_event(struct perf_session *session,
+		                    union perf_event *event, u64 file_offset)
+{
+	void *src;
+	size_t decomp_size, src_size;
+	u64 decomp_last_rem = 0;
+	size_t decomp_len = session->header.env.comp_mmap_len;
+	struct decomp *decomp, *decomp_last = session->decomp_last;
+
+	decomp = mmap(NULL, sizeof(struct decomp) + decomp_len, PROT_READ|PROT_WRITE,
+		      MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
+	if (decomp == MAP_FAILED) {
+		pr_err("Couldn't allocate memory for decompression\n");
+		return -1;
+	}
+
+	decomp->file_pos = file_offset;
+	decomp->head = 0;
+
+	if (decomp_last) {
+		decomp_last_rem = decomp_last->size - decomp_last->head;
+		memcpy(decomp->data, &(decomp_last->data[decomp_last->head]), decomp_last_rem);
+		decomp->size = decomp_last_rem;
+	}
+
+	src = (void*)event + sizeof(struct compressed_event);
+	src_size = event->pack.header.size - sizeof(struct compressed_event);
+
+	decomp_size = report__zstd_decompress(session, src, src_size,
+				&(decomp->data[decomp_last_rem]), decomp_len - decomp_last_rem);
+	if (!decomp_size) {
+		munmap(decomp, sizeof(struct decomp) + decomp_len);
+		pr_err("Couldn't decompress data\n");
+		return -1;
+	}
+
+	decomp->size += decomp_size;
+
+	if (session->decomp == NULL) {
+		session->decomp = decomp;
+		session->decomp_last = decomp;
+	} else {
+		session->decomp_last->next = decomp;
+		session->decomp_last = decomp;
+	}
+
+	pr_debug("decomp (B): %zu to %zu\n", src_size, decomp_size);
+
+	return 0;
+}
+
 static int process_sample_event(struct perf_tool *tool,
 				union perf_event *event,
 				struct perf_sample *sample,
@@ -983,6 +1125,7 @@ int cmd_report(int argc, const char **argv)
 			.auxtrace	 = perf_event__process_auxtrace,
 			.event_update	 = perf_event__process_event_update,
 			.feature	 = process_feature_event,
+			.compressed	 = process_compressed_event,
 			.ordered_events	 = true,
 			.ordering_requires_timestamps = true,
 		},
@@ -1205,6 +1348,10 @@ int cmd_report(int argc, const char **argv)
 
 	report.session = session;
 
+	if (session->header.env.comp_type == PERF_COMP_ZSTD &&
+	    session->header.env.comp_level)
+		report__zstd_init(session);
+
 	has_br_stack = perf_header__has_feat(&session->header,
 					     HEADER_BRANCH_STACK);
 
@@ -1409,6 +1556,8 @@ int cmd_report(int argc, const char **argv)
 error:
 	zfree(&report.ptime_range);
 
+	report__zstd_fini(session);
+
 	perf_session__delete(session);
 	return ret;
 }
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 6fcb3bce0442..66d1ed7e7a80 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -972,6 +972,10 @@ int machine__map_x86_64_entry_trampolines(struct machine *machine,
 			continue;
 
 		dest_map = map_groups__find(kmaps, map->pgoff);
+		if (!dest_map) {
+			pr_debug("dest_map for %" PRIx64 " is NULL\n", map->pgoff);
+			continue;
+		}
 		if (dest_map != map)
 			map->pgoff = dest_map->map_ip(dest_map, map->pgoff);
 		found = true;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 78a067777144..be717ebcdb85 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -296,6 +296,13 @@ static int process_event_op2_stub(struct perf_session *session __maybe_unused,
 	return 0;
 }
 
+static int process_event_op4_stub(struct perf_session *session __maybe_unused,
+				  union perf_event *event __maybe_unused,
+				  u64 data __maybe_unused)
+{
+	dump_printf(": unhandled!\n");
+	return 0;
+}
 
 static
 int process_event_thread_map_stub(struct perf_session *session __maybe_unused,
@@ -418,6 +425,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
 		tool->time_conv = process_event_op2_stub;
 	if (tool->feature == NULL)
 		tool->feature = process_event_op2_stub;
+	if (tool->compressed == NULL)
+		tool->compressed = process_event_op4_stub;
 }
 
 static void swap_sample_id_all(union perf_event *event, void *data)
@@ -1345,7 +1354,8 @@ static s64 perf_session__process_user_event(struct perf_session *session,
 	int fd = perf_data__fd(session->data);
 	int err;
 
-	dump_event(session->evlist, event, file_offset, &sample);
+	if (event->header.type != PERF_RECORD_COMPRESSED)
+		dump_event(session->evlist, event, file_offset, &sample);
 
 	/* These events are processed right away */
 	switch (event->header.type) {
@@ -1398,6 +1408,11 @@ static s64 perf_session__process_user_event(struct perf_session *session,
 		return tool->time_conv(session, event);
 	case PERF_RECORD_HEADER_FEATURE:
 		return tool->feature(session, event);
+	case PERF_RECORD_COMPRESSED:
+		err = tool->compressed(session, event, file_offset);
+		if (err)
+			dump_event(session->evlist, event, file_offset, &sample);
+		return 0;
 	default:
 		return -EINVAL;
 	}
@@ -1673,6 +1688,8 @@ static int perf_session__flush_thread_stacks(struct perf_session *session)
 
 volatile int session_done;
 
+static int __perf_session__process_decomp_events(struct perf_session *session);
+
 static int __perf_session__process_pipe_events(struct perf_session *session)
 {
 	struct ordered_events *oe = &session->ordered_events;
@@ -1753,6 +1770,10 @@ static int __perf_session__process_pipe_events(struct perf_session *session)
 	if (skip > 0)
 		head += skip;
 
+	err = __perf_session__process_decomp_events(session);
+	if (err)
+		goto out_err;
+
 	if (!session_done())
 		goto more;
 done:
@@ -1801,6 +1822,38 @@ fetch_mmaped_event(struct perf_session *session,
 	return event;
 }
 
+static int __perf_session__process_decomp_events(struct perf_session *session)
+{
+	s64 skip;
+	u64 size, file_pos = 0;
+	union perf_event *event;
+	struct decomp *decomp = session->decomp_last;
+
+	if (!decomp)
+		return 0;
+
+	while (decomp->head < decomp->size && !session_done()) {
+		event = fetch_mmaped_event(session, decomp->head, decomp->size, decomp->data);
+		if (!event)
+			break;
+
+		size = event->header.size;
+		if (size < sizeof(struct perf_event_header) ||
+		    (skip = perf_session__process_event(session, event, file_pos)) < 0) {
+			pr_err("%#" PRIx64 " [%#x]: failed to process type: %d\n",
+				decomp->file_pos + decomp->head, event->header.size, event->header.type);
+			return -EINVAL;
+		}
+
+		if (skip)
+			size += skip;
+
+		decomp->head += size;
+	}
+
+	return 0;
+}
+
 /*
  * On 64bit we can mmap the data file in one go. No need for tiny mmap
  * slices. On 32bit we use 32MB.
@@ -1904,6 +1957,10 @@ static int __perf_session__process_events(struct perf_session *session,
 	head += size;
 	file_pos += size;
 
+	err = __perf_session__process_decomp_events(session);
+	if (err)
+		goto out_err;
+
 	ui_progress__update(&prog, size);
 
 	if (session_done())
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index d96eccd7d27f..8ecda50efc6b 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -11,6 +11,9 @@
 #include <linux/kernel.h>
 #include <linux/rbtree.h>
 #include <linux/perf_event.h>
+#ifdef HAVE_ZSTD_SUPPORT
+#include <zstd.h>
+#endif
 
 struct ip_callchain;
 struct symbol;
@@ -35,6 +38,19 @@ struct perf_session {
 	struct ordered_events	ordered_events;
 	struct perf_data	*data;
 	struct perf_tool	*tool;
+	struct decomp		*decomp;
+	struct decomp		*decomp_last;
+#ifdef HAVE_ZSTD_SUPPORT
+	ZSTD_DStream		*zstd_dstream;
+#endif
+};
+
+struct decomp {
+	struct decomp *next;
+	u64 file_pos;
+	u64 head;
+	size_t size;
+	char data[];
 };
 
 struct perf_tool;
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 56e4ca54020a..65ec84dfc5eb 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -28,6 +28,7 @@ typedef int (*event_attr_op)(struct perf_tool *tool,
 
 typedef int (*event_op2)(struct perf_session *session, union perf_event *event);
 typedef s64 (*event_op3)(struct perf_session *session, union perf_event *event);
+typedef int (*event_op4)(struct perf_session *session, union perf_event *event, u64 data);
 
 typedef int (*event_oe)(struct perf_tool *tool, union perf_event *event,
 			struct ordered_events *oe);
@@ -69,6 +70,7 @@ struct perf_tool {
 			stat,
 			stat_round,
 			feature;
+	event_op4	compressed;
 	event_op3	auxtrace;
 	bool		ordered_events;
 	bool		ordering_requires_timestamps;


* Re: [PATCH v1 2/4] perf record: introduce z, mmap-flush options and PERF_RECORD_COMPRESSED record
  2018-12-24 13:45 ` [PATCH v1 2/4] perf record: introduce z, mmap-flush options and PERF_RECORD_COMPRESSED record Alexey Budankov
@ 2019-01-09 16:58   ` Jiri Olsa
  2019-01-14  8:46     ` Alexey Budankov
  0 siblings, 1 reply; 11+ messages in thread
From: Jiri Olsa @ 2019-01-09 16:58 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel

On Mon, Dec 24, 2018 at 04:45:21PM +0300, Alexey Budankov wrote:
> 
> Introduce --compression_level=n, --mmap-flush options and PERF_RECORD_COMPRESSED 
> event record that contains compressed parts of mmap kernel buffer data.
> 
> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
> ---
>  tools/perf/Documentation/perf-record.txt | 11 +++
>  tools/perf/builtin-record.c              | 97 ++++++++++++++++++++----
>  tools/perf/perf.h                        |  2 +
>  tools/perf/util/env.h                    | 10 +++
>  tools/perf/util/event.c                  |  1 +
>  tools/perf/util/event.h                  |  7 ++
>  tools/perf/util/evlist.c                 |  6 +-
>  tools/perf/util/evlist.h                 |  2 +-
>  tools/perf/util/header.c                 | 47 +++++++++++-
>  tools/perf/util/header.h                 |  1 +
>  tools/perf/util/mmap.c                   |  4 +-
>  tools/perf/util/mmap.h                   |  3 +-
>  12 files changed, 169 insertions(+), 22 deletions(-)

I'm also getting a similar error here (as with the affinity patchset):

[jolsa@krava perf]$ git am /tmp/comp
Applying: feature: build libzstd feature check, LIBZSTD_DIR and NO_LIBZSTD defines
Applying: perf record: introduce z, mmap-flush options and PERF_RECORD_COMPRESSED record
error: corrupt patch at line 526
Patch failed at 0002 perf record: introduce z, mmap-flush options and PERF_RECORD_COMPRESSED record

again the raw patch applies correctly:

[jolsa@krava perf]$ patch -p3 < /tmp/comp2
patching file Documentation/perf-record.txt
patching file builtin-record.c
patching file perf.h
patching file util/env.h
patching file util/event.c
patching file util/event.h
patching file util/evlist.c
patching file util/evlist.h
patching file util/header.c
patching file util/header.h
patching file util/mmap.c
patching file util/mmap.h


jirka


* Re: [PATCH v1 0/4] perf: enable compression of record mode trace to save storage space
  2018-12-24 13:21 [PATCH v1 0/4] perf: enable compression of record mode trace to save storage space Alexey Budankov
                   ` (3 preceding siblings ...)
  2018-12-24 14:00 ` [PATCH v1 4/4] perf report: support record trace file decompression Alexey Budankov
@ 2019-01-09 17:28 ` Jiri Olsa
  2019-01-14  8:43   ` Alexey Budankov
  4 siblings, 1 reply; 11+ messages in thread
From: Jiri Olsa @ 2019-01-09 17:28 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel

On Mon, Dec 24, 2018 at 04:21:33PM +0300, Alexey Budankov wrote:
> 
> The patch set implements runtime record trace compression accompanied by 
> trace file decompression implemented in the tool report mode. Zstandard 
> library API [1] is used for compression/decompression of data that come 
> from perf_events kernel data buffers.
> 
> Realized -z,--compression_level=n option provides ~3-5x avg. trace file 
> size reduction on the tested workloads what significantly saves user's 
> storage space on larger server systems where trace file size can easily 
> reach several tens or even hundreds of GiBs, especially when profiling 
> with stacks for later dwarf unwinding, context-switches tracing and etc.
> 
> The option is effective jointly with asynchronous trace writing because 
> compression requires auxiliary memory buffers to operate on and memory 
> buffers for asynchronous trace writing serve that purpose.

I don't like that it's for aio only, I can't really see why it's
a problem for normal data.. can't we just have one layer before and
stream the data to the compress function instead of the file (or aio
buffers).. and that compress function would spit out 64K size COMPRESSED
events, which would go to file (or aio buffers)

the report side would process them (decompress) on the session layer
before the tool callbacks are called
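
A minimal sketch of such a layering, only to illustrate the idea: all
names here (sink_fn, xform_fn, struct out_layer, layer_push() and the
two helpers) are hypothetical and none of them exist in perf. The
transform stage stands in for the compressor and the sink for either the
file writer or the aio queue.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical output sink: a file writer or an aio queue could fit. */
typedef int (*sink_fn)(void *ctx, const void *buf, size_t len);

/* Hypothetical transform; returns bytes produced, 0 on failure. */
typedef size_t (*xform_fn)(void *dst, size_t dst_size,
			   const void *src, size_t src_size);

struct out_layer {
	xform_fn	xform;		/* NULL means pass through */
	sink_fn		sink;
	void		*sink_ctx;
	unsigned char	scratch[4096];
};

/* Run data through the optional transform, then hand it to the sink. */
static int layer_push(struct out_layer *l, const void *buf, size_t len)
{
	size_t n;

	if (!l->xform)
		return l->sink(l->sink_ctx, buf, len);

	n = l->xform(l->scratch, sizeof(l->scratch), buf, len);
	if (n == 0)
		return -1;

	return l->sink(l->sink_ctx, l->scratch, n);
}

/* Example transform: a plain copy standing in for a compressor. */
static size_t copy_xform(void *dst, size_t dst_size,
			 const void *src, size_t src_size)
{
	if (src_size > dst_size)
		return 0;
	memcpy(dst, src, src_size);
	return src_size;
}

/* Example sink: only counts the bytes it receives. */
static int count_sink(void *ctx, const void *buf, size_t len)
{
	(void)buf;
	*(size_t *)ctx += len;
	return 0;
}
```

The caller pushes data the same way whether or not a transform is set,
so the choice of sink stays independent of compression.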

jirka


* Re: [PATCH v1 0/4] perf: enable compression of record mode trace to save storage space
  2019-01-09 17:28 ` [PATCH v1 0/4] perf: enable compression of record mode trace to save storage space Jiri Olsa
@ 2019-01-14  8:43   ` Alexey Budankov
  2019-01-14 11:03     ` Jiri Olsa
  0 siblings, 1 reply; 11+ messages in thread
From: Alexey Budankov @ 2019-01-14  8:43 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel

Hi,
On 09.01.2019 20:28, Jiri Olsa wrote:
> On Mon, Dec 24, 2018 at 04:21:33PM +0300, Alexey Budankov wrote:
>>
>> The patch set implements runtime record trace compression accompanied by 
>> trace file decompression implemented in the tool report mode. Zstandard 
>> library API [1] is used for compression/decompression of data that come 
>> from perf_events kernel data buffers.
>>
>> Realized -z,--compression_level=n option provides ~3-5x avg. trace file 
>> size reduction on the tested workloads what significantly saves user's 
>> storage space on larger server systems where trace file size can easily 
>> reach several tens or even hundreds of GiBs, especially when profiling 
>> with stacks for later dwarf unwinding, context-switches tracing and etc.
>>
>> The option is effective jointly with asynchronous trace writing because 
>> compression requires auxiliary memory buffers to operate on and memory 
>> buffers for asynchronous trace writing serve that purpose.
> 
> I don't like that it's for aio only, I can't really see why it's

For serial streaming of CPU-bound code under full system utilization,
compression can induce more runtime overhead and increase data loss,
because the amount of code on the performance-critical path grows. The
size of the written data shrinks, of course, but still. Instead of user
space feeding kernel buffer content straight to a syscall, the data is
first copied into user space memory, with some math done on it in the
middle.

> a problem for normal data.. can't we just have one layer before and
> stream the data to the compress function instead of the file (or aio
> buffers).. and that compress function would spit out 64K size COMPRESSED
> events, which would go to file (or aio buffers)

It is already almost like that. Compressed data could be staged in the AIO
buffers and then still streamed to the file serially using record__pushfn().
That would make sense for moderate profiling cases on systems without AIO
support and the trace streaming based on it.
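
To illustrate the "64K size COMPRESSED events" idea, here is a minimal
sketch of the framing step only. It assumes the compressor has already
produced a byte stream and shows how that stream could be cut into records
whose 16-bit size field limits each one to under 64 KiB. The header layout
matches struct perf_event_header; rec_header, REC_COMPRESSED (including its
value) and frame_compressed() are hypothetical names used for illustration,
not code from the patch set, and record alignment is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same field layout as the kernel's struct perf_event_header. */
struct rec_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;	/* header + payload, hence < 64 KiB per record */
};

/* Hypothetical record type id, chosen only for this sketch. */
#define REC_COMPRESSED	0x1000u

/* Largest payload that still fits the 16-bit size field. */
#define REC_MAX_PAYLOAD	(UINT16_MAX - sizeof(struct rec_header))

/*
 * Wrap already-compressed bytes into a sequence of COMPRESSED records.
 * Returns the number of bytes written to dst, or 0 if there is no input
 * or dst is too small to hold all records.
 */
static size_t frame_compressed(uint8_t *dst, size_t dst_size,
			       const uint8_t *src, size_t src_size)
{
	size_t out = 0;

	while (src_size > 0) {
		size_t chunk = src_size < REC_MAX_PAYLOAD ? src_size : REC_MAX_PAYLOAD;
		struct rec_header hdr = {
			.type = REC_COMPRESSED,
			.misc = 0,
			.size = (uint16_t)(sizeof(hdr) + chunk),
		};

		/* out never exceeds dst_size, so the subtraction cannot wrap. */
		if (dst_size - out < sizeof(hdr) + chunk)
			return 0;

		memcpy(dst + out, &hdr, sizeof(hdr));
		memcpy(dst + out + sizeof(hdr), src, chunk);
		out += sizeof(hdr) + chunk;
		src += chunk;
		src_size -= chunk;
	}
	return out;
}
```

The framed output is an ordinary byte range, so in this sketch it could be
handed to either the serial write path or an AIO buffer without the framing
code knowing which one is in use.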

> 
> the report side would process them (decompress) on the session layer
> before the tool callbacks are called

It is already pretty similar to that.

Thanks,
Alexey

> 
> jirka
> 


* Re: [PATCH v1 2/4] perf record: introduce z, mmap-flush options and PERF_RECORD_COMPRESSED record
  2019-01-09 16:58   ` Jiri Olsa
@ 2019-01-14  8:46     ` Alexey Budankov
  0 siblings, 0 replies; 11+ messages in thread
From: Alexey Budankov @ 2019-01-14  8:46 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel

Hi,
On 09.01.2019 19:58, Jiri Olsa wrote:
> On Mon, Dec 24, 2018 at 04:45:21PM +0300, Alexey Budankov wrote:
>>
>> Introduce --compression_level=n, --mmap-flush options and PERF_RECORD_COMPRESSED 
>> event record that contains compressed parts of mmap kernel buffer data.
>>
>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
>> ---
>>  tools/perf/Documentation/perf-record.txt | 11 +++
>>  tools/perf/builtin-record.c              | 97 ++++++++++++++++++++----
>>  tools/perf/perf.h                        |  2 +
>>  tools/perf/util/env.h                    | 10 +++
>>  tools/perf/util/event.c                  |  1 +
>>  tools/perf/util/event.h                  |  7 ++
>>  tools/perf/util/evlist.c                 |  6 +-
>>  tools/perf/util/evlist.h                 |  2 +-
>>  tools/perf/util/header.c                 | 47 +++++++++++-
>>  tools/perf/util/header.h                 |  1 +
>>  tools/perf/util/mmap.c                   |  4 +-
>>  tools/perf/util/mmap.h                   |  3 +-
>>  12 files changed, 169 insertions(+), 22 deletions(-)
> 
> also I'm getting here similar error (like for the affinity patchset)

Really weird. Hopefully it can be resolved in the next version of the patch set.

Thanks,
Alexey

> 
> [jolsa@krava perf]$ git am /tmp/comp
> Applying: feature: build libzstd feature check, LIBZSTD_DIR and NO_LIBZSTD defines
> Applying: perf record: introduce z, mmap-flush options and PERF_RECORD_COMPRESSED record
> error: corrupt patch at line 526
> Patch failed at 0002 perf record: introduce z, mmap-flush options and PERF_RECORD_COMPRESSED record
> 
> again the raw patch applies correctly:
> 
> [jolsa@krava perf]$ patch -p3 < /tmp/comp2
> patching file Documentation/perf-record.txt
> patching file builtin-record.c
> patching file perf.h
> patching file util/env.h
> patching file util/event.c
> patching file util/event.h
> patching file util/evlist.c
> patching file util/evlist.h
> patching file util/header.c
> patching file util/header.h
> patching file util/mmap.c
> patching file util/mmap.h
> 
> 
> jirka
> 


* Re: [PATCH v1 0/4] perf: enable compression of record mode trace to save storage space
  2019-01-14  8:43   ` Alexey Budankov
@ 2019-01-14 11:03     ` Jiri Olsa
  2019-01-14 11:26       ` Alexey Budankov
  0 siblings, 1 reply; 11+ messages in thread
From: Jiri Olsa @ 2019-01-14 11:03 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel

On Mon, Jan 14, 2019 at 11:43:31AM +0300, Alexey Budankov wrote:
> Hi,
> On 09.01.2019 20:28, Jiri Olsa wrote:
> > On Mon, Dec 24, 2018 at 04:21:33PM +0300, Alexey Budankov wrote:
> >>
> >> The patch set implements runtime record trace compression accompanied by 
> >> trace file decompression implemented in the tool report mode. Zstandard 
> >> library API [1] is used for compression/decompression of data that come 
> >> from perf_events kernel data buffers.
> >>
> >> Realized -z,--compression_level=n option provides ~3-5x avg. trace file 
> >> size reduction on the tested workloads what significantly saves user's 
> >> storage space on larger server systems where trace file size can easily 
> >> reach several tens or even hundreds of GiBs, especially when profiling 
> >> with stacks for later dwarf unwinding, context-switches tracing and etc.
> >>
> >> The option is effective jointly with asynchronous trace writing because 
> >> compression requires auxiliary memory buffers to operate on and memory 
> >> buffers for asynchronous trace writing serve that purpose.
> > 
> > I don't like that it's for aio only, I can't really see why it's
> 
> For serial streaming of CPU-bound workloads under full system utilization,
> compression can add runtime overhead and increase data loss, because the
> amount of code on the performance-critical path grows. The size of the
> written data shrinks, of course, but the cost remains. Instead of handing
> kernel buffer content straight to a write syscall, the tool first has to
> copy it into user space memory and compress it on the way.
> 
> > a problem for normal data.. can't we just have one layer before and
> > stream the data to the compress function instead of the file (or aio
> > buffers).. and that compress functions would spit out 64K size COMPRESSED
> > events, which would go to file (or aio buffers)
> 
> It is already almost like that. Compressed data could be staged in the AIO
> buffers and then still streamed to the file serially using record__pushfn().
> That would make sense for moderate profiling cases on systems without AIO
> support and the trace streaming based on it.
> 
> > 
> > the report side would process them (decompress) on the session layer
> > before the tool callbacks are called
> 
> It is already pretty similar to that.

hum, AFAICS you do that in report code, not on the session layer

jirka


* Re: [PATCH v1 0/4] perf: enable compression of record mode trace to save storage space
  2019-01-14 11:03     ` Jiri Olsa
@ 2019-01-14 11:26       ` Alexey Budankov
  0 siblings, 0 replies; 11+ messages in thread
From: Alexey Budankov @ 2019-01-14 11:26 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel

On 14.01.2019 14:03, Jiri Olsa wrote:
> On Mon, Jan 14, 2019 at 11:43:31AM +0300, Alexey Budankov wrote:
>> Hi,
>> On 09.01.2019 20:28, Jiri Olsa wrote:
>>> On Mon, Dec 24, 2018 at 04:21:33PM +0300, Alexey Budankov wrote:
>>>>
>>>> buffers for asynchronous trace writing serve that purpose.
<SNIP>
>>>
>>> I don't like that it's for aio only, I can't really see why it's
>>
>> For serial streaming of CPU-bound workloads under full system utilization,
>> compression can add runtime overhead and increase data loss, because the
>> amount of code on the performance-critical path grows. The size of the
>> written data shrinks, of course, but the cost remains. Instead of handing
>> kernel buffer content straight to a write syscall, the tool first has to
>> copy it into user space memory and compress it on the way.
>>
>>> a problem for normal data.. can't we just have one layer before and
>>> stream the data to the compress function instead of the file (or aio
>>> buffers).. and that compress functions would spit out 64K size COMPRESSED
>>> events, which would go to file (or aio buffers)
>>
>> It is already almost like that. Compressed data could be staged in the AIO
>> buffers and then still streamed to the file serially using record__pushfn().
>> That would make sense for moderate profiling cases on systems without AIO
>> support and the trace streaming based on it.
>>
>>>
>>> the report side would process them (decompress) on the session layer
>>> before the tool callbacks are called
>>
>> It is already pretty similar to that.
> 
> hum, AFAICS you do that in report code, not on the session layer

Correct. The decompressor and the handling of compressed data chunks could
be moved to the session-related code.
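
A rough sketch of what handling on the session layer could look like: the
record walker unpacks COMPRESSED records itself and feeds the inner events
to the tool callback, so the tool never sees a compressed record. All names
here (rec_header, REC_COMPRESSED and its value, decompress_stub(),
session_process(), count_cb()) are hypothetical, the "decompressor" is an
identity copy standing in for a real one such as Zstandard, and the sketch
assumes every COMPRESSED record holds whole events.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same field layout as the kernel's struct perf_event_header. */
struct rec_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;	/* header + payload */
};

#define REC_COMPRESSED	0x1000u	/* hypothetical type id for this sketch */

typedef int (*tool_cb_t)(const struct rec_header *hdr,
			 const uint8_t *payload, void *ctx);

/* Stand-in for a real decompressor: copies the input unchanged. */
static size_t decompress_stub(uint8_t *dst, size_t dst_size,
			      const uint8_t *src, size_t src_size)
{
	if (src_size > dst_size)
		return 0;
	memcpy(dst, src, src_size);
	return src_size;
}

/* Read and validate the record header at buf[off]; 0 on success. */
static int read_header(const uint8_t *buf, size_t len, size_t off,
		       struct rec_header *hdr)
{
	if (len - off < sizeof(*hdr))
		return -1;
	memcpy(hdr, buf + off, sizeof(*hdr));
	return (hdr->size < sizeof(*hdr) || hdr->size > len - off) ? -1 : 0;
}

/* Deliver a run of plain (uncompressed) events to the tool callback. */
static int deliver_events(const uint8_t *buf, size_t len,
			  tool_cb_t cb, void *ctx)
{
	struct rec_header hdr;
	size_t off;

	for (off = 0; off < len; off += hdr.size) {
		if (read_header(buf, len, off, &hdr) ||
		    cb(&hdr, buf + off + sizeof(hdr), ctx))
			return -1;
	}
	return 0;
}

/* Session-level walker: unpack COMPRESSED records before any callback. */
static int session_process(const uint8_t *buf, size_t len,
			   tool_cb_t cb, void *ctx)
{
	static uint8_t scratch[4 * 65536];	/* size is arbitrary here */
	struct rec_header hdr;
	size_t off;

	for (off = 0; off < len; off += hdr.size) {
		const uint8_t *payload = buf + off + sizeof(hdr);

		if (read_header(buf, len, off, &hdr))
			return -1;
		if (hdr.type == REC_COMPRESSED) {
			size_t n = decompress_stub(scratch, sizeof(scratch),
						   payload, hdr.size - sizeof(hdr));
			if (n == 0 || deliver_events(scratch, n, cb, ctx))
				return -1;
		} else if (cb(&hdr, payload, ctx)) {
			return -1;
		}
	}
	return 0;
}

/* Example tool callback: counts events and checks none is compressed. */
static int count_cb(const struct rec_header *hdr,
		    const uint8_t *payload, void *ctx)
{
	(void)payload;
	assert(hdr->type != REC_COMPRESSED);
	++*(int *)ctx;
	return 0;
}
```

With this split, a tool callback such as count_cb() stays unaware of
compression, which is the property asked for above.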

Thanks,
Alexey

> 
> jirka
> 


end of thread, other threads:[~2019-01-14 11:26 UTC | newest]

Thread overview: 11+ messages
2018-12-24 13:21 [PATCH v1 0/4] perf: enable compression of record mode trace to save storage space Alexey Budankov
2018-12-24 13:35 ` [PATCH v1 1/4] feature: build libzstd feature check, LIBZSTD_DIR and NO_LIBZSTD defines Alexey Budankov
2018-12-24 13:45 ` [PATCH v1 2/4] perf record: introduce z, mmap-flush options and PERF_RECORD_COMPRESSED record Alexey Budankov
2019-01-09 16:58   ` Jiri Olsa
2019-01-14  8:46     ` Alexey Budankov
2018-12-24 13:46 ` [PATCH v1 3/4] perf record: enable runtime trace compression Alexey Budankov
2018-12-24 14:00 ` [PATCH v1 4/4] perf report: support record trace file decompression Alexey Budankov
2019-01-09 17:28 ` [PATCH v1 0/4] perf: enable compression of record mode trace to save storage space Jiri Olsa
2019-01-14  8:43   ` Alexey Budankov
2019-01-14 11:03     ` Jiri Olsa
2019-01-14 11:26       ` Alexey Budankov
