linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/4] perf: enable compression of record mode trace to save storage space
@ 2019-02-11 20:17 Alexey Budankov
  2019-02-11 20:21 ` [PATCH v2 1/4] feature: realize libzstd check, LIBZSTD_DIR and NO_LIBZSTD defines Alexey Budankov
                   ` (4 more replies)
  0 siblings, 5 replies; 67+ messages in thread
From: Alexey Budankov @ 2019-02-11 20:17 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra
  Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


This is a rebase onto the tip of Arnaldo's perf/core repository.

The patch set implements runtime trace compression for record mode and 
trace file decompression for report mode. The Zstandard API [1] is used 
for compression/decompression of data coming from the perf_events 
kernel data buffers.

The implemented -z,--compression-level=n option provides ~3-5x average 
trace file size reduction on a variety of tested workloads, which saves 
user storage space on larger server systems where trace file size can 
easily reach several tens or even hundreds of GiBs, especially when 
profiling with DWARF-based stacks and tracing of context switches.

The --mmap-flush option can be used to avoid compressing every single 
byte of data and to increase the compression ratio, at the same time 
lowering tool runtime overhead.

  $ tools/perf/perf record -z 1 -e cycles -- matrix.gcc
  $ tools/perf/perf record -z 1 --mmap-flush 1024 -e cycles -- matrix.gcc
  $ tools/perf/perf record -z 1 --mmap-flush 1024 --aio -e cycles -- matrix.gcc

The compression functionality can be disabled at build time using the 
NO_LIBZSTD define, and the Zstandard sources location can be overridden 
using the LIBZSTD_DIR define:

  $ make -C tools/perf NO_LIBZSTD=1 clean all
  $ make -C tools/perf LIBZSTD_DIR=/path/to/zstd-1.3.7 clean all

---
Alexey Budankov (4):
  feature: realize libzstd check, LIBZSTD_DIR and NO_LIBZSTD defines
  perf record: implement -z=<level> and --mmap-flush=<thres> options
  perf record: enable runtime trace compression
  perf report: support record trace file decompression

 tools/build/Makefile.feature             |   6 +-
 tools/build/feature/Makefile             |   6 +-
 tools/build/feature/test-all.c           |   5 +
 tools/build/feature/test-libzstd.c       |  12 +
 tools/perf/Documentation/perf-record.txt |   9 +
 tools/perf/Makefile.config               |  20 ++
 tools/perf/Makefile.perf                 |   3 +
 tools/perf/builtin-record.c              | 167 +++++++++++---
 tools/perf/builtin-report.c              |   5 +-
 tools/perf/perf.h                        |   2 +
 tools/perf/util/env.h                    |  10 +
 tools/perf/util/event.c                  |   1 +
 tools/perf/util/event.h                  |   7 +
 tools/perf/util/evlist.c                 |   6 +-
 tools/perf/util/evlist.h                 |   3 +-
 tools/perf/util/header.c                 |  45 +++-
 tools/perf/util/header.h                 |   1 +
 tools/perf/util/mmap.c                   | 265 +++++++++++++---------
 tools/perf/util/mmap.h                   |  31 ++-
 tools/perf/util/session.c                | 271 ++++++++++++++++++++++-
 tools/perf/util/session.h                |  26 +++
 tools/perf/util/tool.h                   |   2 +
 22 files changed, 742 insertions(+), 161 deletions(-)
 create mode 100644 tools/build/feature/test-libzstd.c

---
Changes in v2:
- moved compression/decompression code to session layer
- enabled allocation aio data buffers for compression
- enabled trace compression for serial trace streaming

---
[1] https://github.com/facebook/zstd


^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH v2 1/4] feature: realize libzstd check, LIBZSTD_DIR and NO_LIBZSTD defines
  2019-02-11 20:17 [PATCH v2 0/4] perf: enable compression of record mode trace to save storage space Alexey Budankov
@ 2019-02-11 20:21 ` Alexey Budankov
  2019-02-11 20:22 ` [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options Alexey Budankov
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 67+ messages in thread
From: Alexey Budankov @ 2019-02-11 20:21 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra
  Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


Implement a libzstd feature check and the NO_LIBZSTD and LIBZSTD_DIR
defines to override the Zstd library sources location or disable the
feature from the make command line:

  $ make -C tools/perf LIBZSTD_DIR=/root/abudanko/zstd-1.3.7 clean all
  $ make -C tools/perf NO_LIBZSTD=1 clean all

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 tools/build/Makefile.feature       |  6 ++++--
 tools/build/feature/Makefile       |  6 +++++-
 tools/build/feature/test-all.c     |  5 +++++
 tools/build/feature/test-libzstd.c | 12 ++++++++++++
 tools/perf/Makefile.config         | 20 ++++++++++++++++++++
 tools/perf/Makefile.perf           |  3 +++
 6 files changed, 49 insertions(+), 3 deletions(-)
 create mode 100644 tools/build/feature/test-libzstd.c

diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature
index 5467c6bf9ceb..25088c8f05b2 100644
--- a/tools/build/Makefile.feature
+++ b/tools/build/Makefile.feature
@@ -71,7 +71,8 @@ FEATURE_TESTS_BASIC :=                  \
         sdt				\
         setns				\
         libopencsd			\
-        libaio
+        libaio				\
+        libzstd
 
 # FEATURE_TESTS_BASIC + FEATURE_TESTS_EXTRA is the complete list
 # of all feature tests
@@ -118,7 +119,8 @@ FEATURE_DISPLAY ?=              \
          lzma                   \
          get_cpuid              \
          bpf			\
-         libaio
+         libaio			\
+         libzstd
 
 # Set FEATURE_CHECK_(C|LD)FLAGS-all for all FEATURE_TESTS features.
 # If in the future we need per-feature checks/flags for features not
diff --git a/tools/build/feature/Makefile b/tools/build/feature/Makefile
index 7ceb4441b627..4b8244ee65ce 100644
--- a/tools/build/feature/Makefile
+++ b/tools/build/feature/Makefile
@@ -62,7 +62,8 @@ FILES=                                          \
          test-clang.bin				\
          test-llvm.bin				\
          test-llvm-version.bin			\
-         test-libaio.bin
+         test-libaio.bin			\
+         test-libzstd.bin
 
 FILES := $(addprefix $(OUTPUT),$(FILES))
 
@@ -301,6 +302,9 @@ $(OUTPUT)test-clang.bin:
 $(OUTPUT)test-libaio.bin:
 	$(BUILD) -lrt
 
+$(OUTPUT)test-libzstd.bin:
+	$(BUILD) -lzstd
+
 ###############################
 
 clean:
diff --git a/tools/build/feature/test-all.c b/tools/build/feature/test-all.c
index 20cdaa4fc112..5af329b6ffef 100644
--- a/tools/build/feature/test-all.c
+++ b/tools/build/feature/test-all.c
@@ -178,6 +178,10 @@
 # include "test-libaio.c"
 #undef main
 
+#define main main_test_libzstd
+# include "test-libzstd.c"
+#undef main
+
 int main(int argc, char *argv[])
 {
 	main_test_libpython();
@@ -219,6 +223,7 @@ int main(int argc, char *argv[])
 	main_test_setns();
 	main_test_libopencsd();
 	main_test_libaio();
+	main_test_libzstd();
 
 	return 0;
 }
diff --git a/tools/build/feature/test-libzstd.c b/tools/build/feature/test-libzstd.c
new file mode 100644
index 000000000000..55268c01b84d
--- /dev/null
+++ b/tools/build/feature/test-libzstd.c
@@ -0,0 +1,12 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <zstd.h>
+
+int main(void)
+{
+	ZSTD_CStream	*cstream;
+
+	cstream = ZSTD_createCStream();
+	ZSTD_freeCStream(cstream);
+
+	return 0;
+}
diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index b441c88cafa1..1dccd776a4aa 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -145,6 +145,13 @@ endif
 FEATURE_CHECK_CFLAGS-libbabeltrace := $(LIBBABELTRACE_CFLAGS)
 FEATURE_CHECK_LDFLAGS-libbabeltrace := $(LIBBABELTRACE_LDFLAGS) -lbabeltrace-ctf
 
+ifdef LIBZSTD_DIR
+  LIBZSTD_CFLAGS  := -I$(LIBZSTD_DIR)/lib
+  LIBZSTD_LDFLAGS := -L$(LIBZSTD_DIR)/lib
+endif
+FEATURE_CHECK_CFLAGS-libzstd := $(LIBZSTD_CFLAGS)
+FEATURE_CHECK_LDFLAGS-libzstd := $(LIBZSTD_LDFLAGS)
+
 FEATURE_CHECK_CFLAGS-bpf = -I. -I$(srctree)/tools/include -I$(srctree)/tools/arch/$(SRCARCH)/include/uapi -I$(srctree)/tools/include/uapi
 # include ARCH specific config
 -include $(src-perf)/arch/$(SRCARCH)/Makefile
@@ -770,6 +777,19 @@ ifndef NO_LZMA
   endif
 endif
 
+ifndef NO_LIBZSTD
+  ifeq ($(feature-libzstd), 1)
+    CFLAGS += -DHAVE_ZSTD_SUPPORT
+    CFLAGS += $(LIBZSTD_CFLAGS)
+    LDFLAGS += $(LIBZSTD_LDFLAGS)
+    EXTLIBS += -lzstd
+    $(call detected,CONFIG_ZSTD)
+  else
+    msg := $(warning No libzstd found, disables trace compression, please install libzstd-dev[el] and/or set LIBZSTD_DIR);
+    NO_LIBZSTD := 1
+  endif
+endif
+
 ifndef NO_BACKTRACE
   ifeq ($(feature-backtrace), 1)
     CFLAGS += -DHAVE_BACKTRACE_SUPPORT
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 09df1c8a4ec9..6e95e90a7b7b 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -108,6 +108,9 @@ include ../scripts/utilities.mak
 # streaming for record mode. Currently Posix AIO trace streaming is
 # supported only when linking with glibc.
 #
+# Define NO_LIBZSTD if you do not want support of Zstandard based runtime
+# trace compression in record mode.
+#
 
 # As per kernel Makefile, avoid funny character set dependencies
 unexport LC_ALL

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options
  2019-02-11 20:17 [PATCH v2 0/4] perf: enable compression of record mode trace to save storage space Alexey Budankov
  2019-02-11 20:21 ` [PATCH v2 1/4] feature: realize libzstd check, LIBZSTD_DIR and NO_LIBZSTD defines Alexey Budankov
@ 2019-02-11 20:22 ` Alexey Budankov
  2019-02-12 13:08   ` Jiri Olsa
                     ` (7 more replies)
  2019-02-11 20:23 ` [PATCH v2 3/4] perf record: enable runtime trace compression Alexey Budankov
                   ` (2 subsequent siblings)
  4 siblings, 8 replies; 67+ messages in thread
From: Alexey Budankov @ 2019-02-11 20:22 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra
  Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


Implement the -z,--compression-level=<n> and --mmap-flush=<dump_least_size>
options as well as a special PERF_RECORD_COMPRESSED record that contains
compressed parts of the kernel data buffer.

Because compression requires auxiliary memory for encoding the kernel
data, record->opts.nr_cblocks == -1 signifies that a single AIO data
buffer aio.data[0] is preallocated without the accompanying AIO control
blocks.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
Changes in v2:
- enabled allocation aio data buffers for compression

---
 tools/perf/Documentation/perf-record.txt |   9 ++
 tools/perf/builtin-record.c              | 110 +++++++++++++++++++----
 tools/perf/perf.h                        |   2 +
 tools/perf/util/env.h                    |  10 +++
 tools/perf/util/event.c                  |   1 +
 tools/perf/util/event.h                  |   7 ++
 tools/perf/util/evlist.c                 |   6 +-
 tools/perf/util/evlist.h                 |   3 +-
 tools/perf/util/header.c                 |  45 +++++++++-
 tools/perf/util/header.h                 |   1 +
 tools/perf/util/mmap.c                   |  98 ++++++++++++--------
 tools/perf/util/mmap.h                   |   7 +-
 tools/perf/util/session.h                |   2 +
 13 files changed, 240 insertions(+), 61 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 8f0c2be34848..3682efdf3edd 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -459,6 +459,15 @@ Set affinity mask of trace reading thread according to the policy defined by 'mo
   node - thread affinity mask is set to NUMA node cpu mask of the processed mmap buffer
   cpu  - thread affinity mask is set to cpu of the processed mmap buffer
 
+-z::
+--compression-level=n::
+Produce compressed trace file using specified level n to save storage space (no compression: 0 - default,
+fastest compression: 1, smallest trace file: 22)
+
+--mmap-flush=n::
+Minimal number of bytes accumulated in the kernel buffer that is flushed to the trace file (default: 1).
+The maximal allowed value is a quarter of the kernel buffer size.
+
 --all-kernel::
 Configure all used events to run in kernel space.
 
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 6c3719ac901d..227dbbd47d3f 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -292,18 +292,20 @@ static int record__aio_parse(const struct option *opt,
 
 	if (unset) {
 		opts->nr_cblocks = 0;
-	} else {
-		if (str)
-			opts->nr_cblocks = strtol(str, NULL, 0);
-		if (!opts->nr_cblocks)
-			opts->nr_cblocks = nr_cblocks_default;
+		return 0;
 	}
 
+	if (str)
+		opts->nr_cblocks = strtol(str, NULL, 0);
+	if (!opts->nr_cblocks)
+		opts->nr_cblocks = nr_cblocks_default;
+
+	if (opts->nr_cblocks > nr_cblocks_max)
+		opts->nr_cblocks = nr_cblocks_max;
+
 	return 0;
 }
 #else /* HAVE_AIO_SUPPORT */
-static int nr_cblocks_max = 0;
-
 static int record__aio_sync(struct perf_mmap *md __maybe_unused, bool sync_all __maybe_unused)
 {
 	return -1;
@@ -334,6 +336,35 @@ static int record__aio_enabled(struct record *rec)
 	return rec->opts.nr_cblocks > 0;
 }
 
+#define MMAP_FLUSH_DEFAULT 1
+
+static int record__comp_enabled(struct record *rec)
+{
+	return rec->opts.comp_level > 0;
+}
+
+static int record__mmap_flush_parse(const struct option *opt,
+				    const char *str,
+				    int unset)
+{
+	int mmap_len;
+	struct record_opts *opts = (struct record_opts *)opt->value;
+
+	if (unset)
+		return 0;
+
+	if (str)
+		opts->mmap_flush = strtol(str, NULL, 0);
+	if (!opts->mmap_flush)
+		opts->mmap_flush = MMAP_FLUSH_DEFAULT;
+
+	mmap_len = perf_evlist__mmap_size(opts->mmap_pages);
+	if (opts->mmap_flush > mmap_len / 4)
+		opts->mmap_flush = mmap_len / 4;
+
+	return 0;
+}
+
 static int process_synthesized_event(struct perf_tool *tool,
 				     union perf_event *event,
 				     struct perf_sample *sample __maybe_unused,
@@ -543,7 +574,8 @@ static int record__mmap_evlist(struct record *rec,
 	if (perf_evlist__mmap_ex(evlist, opts->mmap_pages,
 				 opts->auxtrace_mmap_pages,
 				 opts->auxtrace_snapshot_mode,
-				 opts->nr_cblocks, opts->affinity) < 0) {
+				 opts->nr_cblocks, opts->affinity,
+				 opts->mmap_flush) < 0) {
 		if (errno == EPERM) {
 			pr_err("Permission error mapping pages.\n"
 			       "Consider increasing "
@@ -734,7 +766,7 @@ static void record__adjust_affinity(struct record *rec, struct perf_mmap *map)
 }
 
 static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evlist,
-				    bool overwrite)
+				    bool overwrite, bool sync)
 {
 	u64 bytes_written = rec->bytes_written;
 	int i;
@@ -757,12 +789,19 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
 		off = record__aio_get_pos(trace_fd);
 
 	for (i = 0; i < evlist->nr_mmaps; i++) {
+		u64 flush = MMAP_FLUSH_DEFAULT;
 		struct perf_mmap *map = &maps[i];
 
 		if (map->base) {
 			record__adjust_affinity(rec, map);
+			if (sync) {
+				flush = map->flush;
+				map->flush = MMAP_FLUSH_DEFAULT;
+			}
 			if (!record__aio_enabled(rec)) {
 				if (perf_mmap__push(map, rec, record__pushfn) != 0) {
+					if (sync)
+						map->flush = flush;
 					rc = -1;
 					goto out;
 				}
@@ -775,10 +814,14 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
 				idx = record__aio_sync(map, false);
 				if (perf_mmap__aio_push(map, rec, idx, record__aio_pushfn, &off) != 0) {
 					record__aio_set_pos(trace_fd, off);
+					if (sync)
+						map->flush = flush;
 					rc = -1;
 					goto out;
 				}
 			}
+			if (sync)
+				map->flush = flush;
 		}
 
 		if (map->auxtrace_mmap.base && !rec->opts.auxtrace_snapshot_mode &&
@@ -804,15 +847,15 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
 	return rc;
 }
 
-static int record__mmap_read_all(struct record *rec)
+static int record__mmap_read_all(struct record *rec, bool sync)
 {
 	int err;
 
-	err = record__mmap_read_evlist(rec, rec->evlist, false);
+	err = record__mmap_read_evlist(rec, rec->evlist, false, sync);
 	if (err)
 		return err;
 
-	return record__mmap_read_evlist(rec, rec->evlist, true);
+	return record__mmap_read_evlist(rec, rec->evlist, true, sync);
 }
 
 static void record__init_features(struct record *rec)
@@ -838,6 +881,9 @@ static void record__init_features(struct record *rec)
 	if (!(rec->opts.use_clockid && rec->opts.clockid_res_ns))
 		perf_header__clear_feat(&session->header, HEADER_CLOCKID);
 
+	if (!record__comp_enabled(rec))
+		perf_header__clear_feat(&session->header, HEADER_COMPRESSED);
+
 	perf_header__clear_feat(&session->header, HEADER_STAT);
 }
 
@@ -1147,6 +1193,10 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	fd = perf_data__fd(data);
 	rec->session = session;
 
+	rec->opts.comp_level = 0;
+	session->header.env.comp_level = rec->opts.comp_level;
+	session->header.env.comp_type = PERF_COMP_NONE;
+
 	record__init_features(rec);
 
 	if (rec->opts.use_clockid && rec->opts.clockid_res_ns)
@@ -1176,6 +1226,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		err = -1;
 		goto out_child;
 	}
+	session->header.env.comp_mmap_len = session->evlist->mmap_len;
 
 	err = bpf__apply_obj_config();
 	if (err) {
@@ -1311,7 +1362,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		if (trigger_is_hit(&switch_output_trigger) || done || draining)
 			perf_evlist__toggle_bkw_mmap(rec->evlist, BKW_MMAP_DATA_PENDING);
 
-		if (record__mmap_read_all(rec) < 0) {
+		if (record__mmap_read_all(rec, false) < 0) {
 			trigger_error(&auxtrace_snapshot_trigger);
 			trigger_error(&switch_output_trigger);
 			err = -1;
@@ -1412,8 +1463,17 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		record__synthesize_workload(rec, true);
 
 out_child:
+	record__mmap_read_all(rec, true);
 	record__aio_mmap_read_sync(rec);
 
+	if (!quiet && rec->session->bytes_transferred && rec->session->bytes_compressed) {
+		float ratio = (float)rec->session->bytes_transferred/(float)rec->session->bytes_compressed;
+
+		session->header.env.comp_ratio = ratio + 0.5;
+		fprintf(stderr,	"[ perf record: Compressed %.3f MB to %.3f MB, ratio is %.3f ]\n",
+			rec->session->bytes_transferred / 1024.0 / 1024.0, rec->session->bytes_compressed / 1024.0 / 1024.0, ratio);
+	}
+
 	if (forks) {
 		int exit_status;
 
@@ -1814,6 +1874,7 @@ static struct record record = {
 			.uses_mmap   = true,
 			.default_per_cpu = true,
 		},
+		.mmap_flush          = MMAP_FLUSH_DEFAULT,
 	},
 	.tool = {
 		.sample		= process_sample_event,
@@ -1982,6 +2043,13 @@ static struct option __record_options[] = {
 	OPT_CALLBACK(0, "affinity", &record.opts, "node|cpu",
 		     "Set affinity mask of trace reading thread to NUMA node cpu mask or cpu of processed mmap buffer",
 		     record__parse_affinity),
+#ifdef HAVE_ZSTD_SUPPORT
+	OPT_UINTEGER('z', "compression-level", &record.opts.comp_level,
+		     "Produce compressed trace file (default: 0, fastest: 1, smallest: 22)"),
+#endif
+	OPT_CALLBACK(0, "mmap-flush", &record.opts, "num",
+		     "Minimal number of bytes in kernel buffer that is flushed to trace file (default: 1)",
+		     record__mmap_flush_parse),
 	OPT_END()
 };
 
@@ -2177,10 +2245,18 @@ int cmd_record(int argc, const char **argv)
 		goto out;
 	}
 
-	if (rec->opts.nr_cblocks > nr_cblocks_max)
-		rec->opts.nr_cblocks = nr_cblocks_max;
-	if (verbose > 0)
-		pr_info("nr_cblocks: %d\n", rec->opts.nr_cblocks);
+	if (rec->opts.comp_level > 22)
+		rec->opts.comp_level = 0;
+	if (record__comp_enabled(rec) && !rec->opts.nr_cblocks) {
+		 /*
+		  * Allocate aio.data[0] buffer for compression.
+		  */
+		rec->opts.nr_cblocks = -1;
+	}
+
+	pr_debug("nr_cblocks: %d\n", rec->opts.nr_cblocks);
+	pr_debug("comp level: %d\n", rec->opts.comp_level);
+	pr_debug("mmap flush: %d\n", rec->opts.mmap_flush);
 
 	pr_debug("affinity: %s\n", affinity_tags[rec->opts.affinity]);
 
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index b120e547ddc7..e5cf206ab9e0 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -85,6 +85,8 @@ struct record_opts {
 	u64          clockid_res_ns;
 	int	     nr_cblocks;
 	int	     affinity;
+	unsigned int comp_level;
+	int	     mmap_flush;
 };
 
 enum perf_affinity {
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index d01b8355f4ca..fa5dc9b87029 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -64,6 +64,16 @@ struct perf_env {
 	struct memory_node	*memory_nodes;
 	unsigned long long	 memory_bsize;
 	u64                     clockid_res_ns;
+	u32			comp_type;
+	u32			comp_level;
+	u32			comp_ratio;
+	u32			comp_mmap_len;
+};
+
+enum perf_compress_type {
+	PERF_COMP_NONE = 0,
+	PERF_COMP_ZSTD,
+	PERF_COMP_MAX
 };
 
 extern struct perf_env perf_env;
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index ba7be74fad6e..d1ad6c419724 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -68,6 +68,7 @@ static const char *perf_event__names[] = {
 	[PERF_RECORD_EVENT_UPDATE]		= "EVENT_UPDATE",
 	[PERF_RECORD_TIME_CONV]			= "TIME_CONV",
 	[PERF_RECORD_HEADER_FEATURE]		= "FEATURE",
+	[PERF_RECORD_COMPRESSED]		= "COMPRESSED",
 };
 
 static const char *perf_ns__names[] = {
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 36ae7e92dab1..8a13aefe734e 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -254,6 +254,7 @@ enum perf_user_event_type { /* above any possible kernel type */
 	PERF_RECORD_EVENT_UPDATE		= 78,
 	PERF_RECORD_TIME_CONV			= 79,
 	PERF_RECORD_HEADER_FEATURE		= 80,
+	PERF_RECORD_COMPRESSED			= 81,
 	PERF_RECORD_HEADER_MAX
 };
 
@@ -626,6 +627,11 @@ struct feature_event {
 	char				data[];
 };
 
+struct compressed_event {
+	struct perf_event_header	header;
+	char				data[];
+};
+
 union perf_event {
 	struct perf_event_header	header;
 	struct mmap_event		mmap;
@@ -659,6 +665,7 @@ union perf_event {
 	struct feature_event		feat;
 	struct ksymbol_event		ksymbol_event;
 	struct bpf_event		bpf_event;
+	struct compressed_event		pack;
 };
 
 void perf_event__print_totals(void);
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 08cedb643ea6..937039faac59 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1022,7 +1022,7 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
  */
 int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 			 unsigned int auxtrace_pages,
-			 bool auxtrace_overwrite, int nr_cblocks, int affinity)
+			 bool auxtrace_overwrite, int nr_cblocks, int affinity, int flush)
 {
 	struct perf_evsel *evsel;
 	const struct cpu_map *cpus = evlist->cpus;
@@ -1032,7 +1032,7 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 	 * Its value is decided by evsel's write_backward.
 	 * So &mp should not be passed through const pointer.
 	 */
-	struct mmap_params mp = { .nr_cblocks = nr_cblocks, .affinity = affinity };
+	struct mmap_params mp = { .nr_cblocks = nr_cblocks, .affinity = affinity, .flush = flush };
 
 	if (!evlist->mmap)
 		evlist->mmap = perf_evlist__alloc_mmap(evlist, false);
@@ -1064,7 +1064,7 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 
 int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages)
 {
-	return perf_evlist__mmap_ex(evlist, pages, 0, false, 0, PERF_AFFINITY_SYS);
+	return perf_evlist__mmap_ex(evlist, pages, 0, false, 0, PERF_AFFINITY_SYS, 1);
 }
 
 int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target)
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 744906dd4887..edf18811e39f 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -165,7 +165,8 @@ unsigned long perf_event_mlock_kb_in_pages(void);
 
 int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 			 unsigned int auxtrace_pages,
-			 bool auxtrace_overwrite, int nr_cblocks, int affinity);
+			 bool auxtrace_overwrite, int nr_cblocks,
+			 int affinity, int flush);
 int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages);
 void perf_evlist__munmap(struct perf_evlist *evlist);
 
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index dec6d218c31c..5ad3a27a042f 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -1463,6 +1463,21 @@ static int write_mem_topology(struct feat_fd *ff __maybe_unused,
 	return ret;
 }
 
+static int write_compressed(struct feat_fd *ff __maybe_unused,
+			    struct perf_evlist *evlist __maybe_unused)
+{
+	int ret;
+	u64 compression_info = ((u64)ff->ph->env.comp_type  << 32) | ff->ph->env.comp_level;
+
+	ret = do_write(ff, &compression_info, sizeof(compression_info));
+	if (ret)
+		return ret;
+
+	compression_info = ((u64)ff->ph->env.comp_ratio << 32) | ff->ph->env.comp_mmap_len;
+
+	return do_write(ff, &compression_info, sizeof(compression_info));
+}
+
 static void print_hostname(struct feat_fd *ff, FILE *fp)
 {
 	fprintf(fp, "# hostname : %s\n", ff->ph->env.hostname);
@@ -1750,6 +1765,13 @@ static void print_cache(struct feat_fd *ff, FILE *fp __maybe_unused)
 	}
 }
 
+static void print_compressed(struct feat_fd *ff, FILE *fp)
+{
+	fprintf(fp, "# compressed : %s, level = %d, ratio = %d\n",
+		ff->ph->env.comp_type == PERF_COMP_ZSTD ? "Zstd" : "Unknown",
+		ff->ph->env.comp_level, ff->ph->env.comp_ratio);
+}
+
 static void print_pmu_mappings(struct feat_fd *ff, FILE *fp)
 {
 	const char *delimiter = "# pmu mappings: ";
@@ -2592,6 +2614,26 @@ static int process_clockid(struct feat_fd *ff,
 	return 0;
 }
 
+static int process_compressed(struct feat_fd *ff,
+			      void *data __maybe_unused)
+{
+	u64 compression_info;
+
+	if (do_read_u64(ff, &compression_info))
+		return -1;
+
+	ff->ph->env.comp_type  = (compression_info >> 32) & 0xffffffffULL;
+	ff->ph->env.comp_level = compression_info & 0xffffffffULL;
+
+	if (do_read_u64(ff, &compression_info))
+		return -1;
+
+	ff->ph->env.comp_ratio = (compression_info >> 32) & 0xffffffffULL;
+	ff->ph->env.comp_mmap_len = compression_info & 0xffffffffULL;
+
+	return 0;
+}
+
 struct feature_ops {
 	int (*write)(struct feat_fd *ff, struct perf_evlist *evlist);
 	void (*print)(struct feat_fd *ff, FILE *fp);
@@ -2651,7 +2693,8 @@ static const struct feature_ops feat_ops[HEADER_LAST_FEATURE] = {
 	FEAT_OPN(CACHE,		cache,		true),
 	FEAT_OPR(SAMPLE_TIME,	sample_time,	false),
 	FEAT_OPR(MEM_TOPOLOGY,	mem_topology,	true),
-	FEAT_OPR(CLOCKID,       clockid,        false)
+	FEAT_OPR(CLOCKID,       clockid,        false),
+	FEAT_OPR(COMPRESSED,	compressed,	false)
 };
 
 struct header_print_data {
diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
index 0d553ddca0a3..ee867075dc64 100644
--- a/tools/perf/util/header.h
+++ b/tools/perf/util/header.h
@@ -39,6 +39,7 @@ enum {
 	HEADER_SAMPLE_TIME,
 	HEADER_MEM_TOPOLOGY,
 	HEADER_CLOCKID,
+	HEADER_COMPRESSED,
 	HEADER_LAST_FEATURE,
 	HEADER_FEAT_BITS	= 256,
 };
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index cdc7740fc181..239e9a13c2b7 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -156,8 +156,6 @@ void __weak auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp __mayb
 {
 }
 
-#ifdef HAVE_AIO_SUPPORT
-
 #ifdef HAVE_LIBNUMA_SUPPORT
 static int perf_mmap__aio_alloc(struct perf_mmap *map, int idx)
 {
@@ -220,28 +218,24 @@ static int perf_mmap__aio_bind(struct perf_mmap *map __maybe_unused, int idx __m
 }
 #endif
 
+static int perf_mmap__aio_mmap_blocks(struct perf_mmap *map);
+
 static int perf_mmap__aio_mmap(struct perf_mmap *map, struct mmap_params *mp)
 {
-	int delta_max, i, prio, ret;
+	int i, ret = 0, init_blocks = 1;
 
 	map->aio.nr_cblocks = mp->nr_cblocks;
+	if (map->aio.nr_cblocks == -1) {
+		map->aio.nr_cblocks = 1;
+		init_blocks = 0;
+	}
+
 	if (map->aio.nr_cblocks) {
-		map->aio.aiocb = calloc(map->aio.nr_cblocks, sizeof(struct aiocb *));
-		if (!map->aio.aiocb) {
-			pr_debug2("failed to allocate aiocb for data buffer, error %m\n");
-			return -1;
-		}
-		map->aio.cblocks = calloc(map->aio.nr_cblocks, sizeof(struct aiocb));
-		if (!map->aio.cblocks) {
-			pr_debug2("failed to allocate cblocks for data buffer, error %m\n");
-			return -1;
-		}
 		map->aio.data = calloc(map->aio.nr_cblocks, sizeof(void *));
 		if (!map->aio.data) {
 			pr_debug2("failed to allocate data buffer, error %m\n");
 			return -1;
 		}
-		delta_max = sysconf(_SC_AIO_PRIO_DELTA_MAX);
 		for (i = 0; i < map->aio.nr_cblocks; ++i) {
 			ret = perf_mmap__aio_alloc(map, i);
 			if (ret == -1) {
@@ -251,29 +245,16 @@ static int perf_mmap__aio_mmap(struct perf_mmap *map, struct mmap_params *mp)
 			ret = perf_mmap__aio_bind(map, i, map->cpu, mp->affinity);
 			if (ret == -1)
 				return -1;
-			/*
-			 * Use cblock.aio_fildes value different from -1
-			 * to denote started aio write operation on the
-			 * cblock so it requires explicit record__aio_sync()
-			 * call prior the cblock may be reused again.
-			 */
-			map->aio.cblocks[i].aio_fildes = -1;
-			/*
-			 * Allocate cblocks with priority delta to have
-			 * faster aio write system calls because queued requests
-			 * are kept in separate per-prio queues and adding
-			 * a new request will iterate thru shorter per-prio
-			 * list. Blocks with numbers higher than
-			 *  _SC_AIO_PRIO_DELTA_MAX go with priority 0.
-			 */
-			prio = delta_max - i;
-			map->aio.cblocks[i].aio_reqprio = prio >= 0 ? prio : 0;
 		}
+		if (init_blocks)
+			ret = perf_mmap__aio_mmap_blocks(map);
 	}
 
-	return 0;
+	return ret;
 }
 
+static void perf_mmap__aio_munmap_blocks(struct perf_mmap *map);
+
 static void perf_mmap__aio_munmap(struct perf_mmap *map)
 {
 	int i;
@@ -282,6 +263,50 @@ static void perf_mmap__aio_munmap(struct perf_mmap *map)
 		perf_mmap__aio_free(map, i);
 	if (map->aio.data)
 		zfree(&map->aio.data);
+	perf_mmap__aio_munmap_blocks(map);
+}
+
+#ifdef HAVE_AIO_SUPPORT
+static int perf_mmap__aio_mmap_blocks(struct perf_mmap *map)
+{
+	int delta_max, i, prio;
+
+	map->aio.aiocb = calloc(map->aio.nr_cblocks, sizeof(struct aiocb *));
+	if (!map->aio.aiocb) {
+		pr_debug2("failed to allocate aiocb for data buffer, error %m\n");
+		return -1;
+	}
+	map->aio.cblocks = calloc(map->aio.nr_cblocks, sizeof(struct aiocb));
+	if (!map->aio.cblocks) {
+		pr_debug2("failed to allocate cblocks for data buffer, error %m\n");
+		return -1;
+	}
+	delta_max = sysconf(_SC_AIO_PRIO_DELTA_MAX);
+	for (i = 0; i < map->aio.nr_cblocks; ++i) {
+		/*
+		 * Use cblock.aio_fildes value different from -1
+		 * to denote started aio write operation on the
+		 * cblock so it requires explicit record__aio_sync()
+		 * call prior the cblock may be reused again.
+		 */
+		map->aio.cblocks[i].aio_fildes = -1;
+		/*
+		 * Allocate cblocks with priority delta to have
+		 * faster aio write system calls because queued requests
+		 * are kept in separate per-prio queues and adding
+		 * a new request will iterate thru shorter per-prio
+		 * list. Blocks with numbers higher than
+		 *  _SC_AIO_PRIO_DELTA_MAX go with priority 0.
+		 */
+		prio = delta_max - i;
+		map->aio.cblocks[i].aio_reqprio = prio >= 0 ? prio : 0;
+	}
+
+	return 0;
+}
+
+static void perf_mmap__aio_munmap_blocks(struct perf_mmap *map)
+{
 	zfree(&map->aio.cblocks);
 	zfree(&map->aio.aiocb);
 }
@@ -360,13 +385,12 @@ int perf_mmap__aio_push(struct perf_mmap *md, void *to, int idx,
 	return rc;
 }
 #else
-static int perf_mmap__aio_mmap(struct perf_mmap *map __maybe_unused,
-			       struct mmap_params *mp __maybe_unused)
+static int perf_mmap__aio_mmap_blocks(struct perf_mmap *map __maybe_unused)
 {
 	return 0;
 }
 
-static void perf_mmap__aio_munmap(struct perf_mmap *map __maybe_unused)
+static void perf_mmap__aio_munmap_blocks(struct perf_mmap *map __maybe_unused)
 {
 }
 #endif
@@ -444,6 +468,8 @@ int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd, int c
 				&mp->auxtrace_mp, map->base, fd))
 		return -1;
 
+	map->flush = mp->flush;
+
 	return perf_mmap__aio_mmap(map, mp);
 }
 
@@ -492,7 +518,7 @@ static int __perf_mmap__read_init(struct perf_mmap *md)
 	md->start = md->overwrite ? head : old;
 	md->end = md->overwrite ? old : head;
 
-	if (md->start == md->end)
+	if ((md->end - md->start) < md->flush)
 		return -EAGAIN;
 
 	size = md->end - md->start;
diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index e566c19b242b..4fd7d82825b7 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -30,14 +30,15 @@ struct perf_mmap {
 	bool		 overwrite;
 	struct auxtrace_mmap auxtrace_mmap;
 	char		 event_copy[PERF_SAMPLE_MAX_SIZE] __aligned(8);
-#ifdef HAVE_AIO_SUPPORT
+	u64		 flush;
 	struct {
 		void		 **data;
+#ifdef HAVE_AIO_SUPPORT
 		struct aiocb	 *cblocks;
 		struct aiocb	 **aiocb;
+#endif
 		int		 nr_cblocks;
 	} aio;
-#endif
 	cpu_set_t	affinity_mask;
 };
 
@@ -70,7 +71,7 @@ enum bkw_mmap_state {
 };
 
 struct mmap_params {
-	int			    prot, mask, nr_cblocks, affinity;
+	int			    prot, mask, nr_cblocks, affinity, flush;
 	struct auxtrace_mmap_params auxtrace_mp;
 };
 
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index d96eccd7d27f..0e14884f28b2 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -35,6 +35,8 @@ struct perf_session {
 	struct ordered_events	ordered_events;
 	struct perf_data	*data;
 	struct perf_tool	*tool;
+	u64			bytes_transferred;
+	u64			bytes_compressed;
 };
 
 struct perf_tool;

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 3/4] perf record: enable runtime trace compression
  2019-02-11 20:17 [PATCH v2 0/4] perf: enable compression of record mode trace to save storage space Alexey Budankov
  2019-02-11 20:21 ` [PATCH v2 1/4] feature: realize libzstd check, LIBZSTD_DIR and NO_LIBZSTD defines Alexey Budankov
  2019-02-11 20:22 ` [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options Alexey Budankov
@ 2019-02-11 20:23 ` Alexey Budankov
  2019-02-12 13:08   ` Jiri Olsa
                     ` (5 more replies)
  2019-02-11 20:25 ` [PATCH v2 4/4] perf report: support record trace file decompression Alexey Budankov
  2019-02-12 12:27 ` [PATCH v2 0/4] perf: enable compression of record mode trace to save storage space Arnaldo Carvalho de Melo
  4 siblings, 6 replies; 67+ messages in thread
From: Alexey Budankov @ 2019-02-11 20:23 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra
  Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


Compression is implemented using the simple Zstd streaming API and employs
the AIO data buffers as the memory to operate on. If an API call fails for
some reason, compression falls back to memcpy().

Data chunks are split and packed into PERF_RECORD_COMPRESSED records of
64KB at most. The --mmap-flush option value can be used to avoid compressing
every single byte of data and to increase the compression ratio.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
Changes in v2:
- enabled trace compression for serial trace streaming
- moved compression/decompression code to session layer

---
 tools/perf/builtin-record.c |  67 +++++++++-----
 tools/perf/util/mmap.c      | 173 +++++++++++++++++++++---------------
 tools/perf/util/mmap.h      |  24 +++--
 tools/perf/util/session.c   | 106 ++++++++++++++++++++++
 tools/perf/util/session.h   |  13 +++
 5 files changed, 280 insertions(+), 103 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 227dbbd47d3f..435ff88dfc5e 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -112,8 +112,7 @@ static bool switch_output_time(struct record *rec)
 	       trigger_is_ready(&switch_output_trigger);
 }
 
-static int record__write(struct record *rec, struct perf_mmap *map __maybe_unused,
-			 void *bf, size_t size)
+static int record__write(struct record *rec, void *bf, size_t size)
 {
 	struct perf_data_file *file = &rec->session->data->file;
 
@@ -237,7 +236,7 @@ static int record__aio_sync(struct perf_mmap *md, bool sync_all)
 	} while (1);
 }
 
-static int record__aio_pushfn(void *to, struct aiocb *cblock, void *bf, size_t size, off_t off)
+static int record__aio_pushfn(void *to, void *bf, size_t size, off_t off, struct aiocb *cblock)
 {
 	struct record *rec = to;
 	int ret, trace_fd = rec->session->data->file.fd;
@@ -264,13 +263,15 @@ static void record__aio_set_pos(int trace_fd, off_t pos)
 	lseek(trace_fd, pos, SEEK_SET);
 }
 
+static int record__aio_enabled(struct record *rec);
+
 static void record__aio_mmap_read_sync(struct record *rec)
 {
 	int i;
 	struct perf_evlist *evlist = rec->evlist;
 	struct perf_mmap *maps = evlist->mmap;
 
-	if (!rec->opts.nr_cblocks)
+	if (!record__aio_enabled(rec))
 		return;
 
 	for (i = 0; i < evlist->nr_mmaps; i++) {
@@ -311,8 +312,8 @@ static int record__aio_sync(struct perf_mmap *md __maybe_unused, bool sync_all _
 	return -1;
 }
 
-static int record__aio_pushfn(void *to __maybe_unused, struct aiocb *cblock __maybe_unused,
-		void *bf __maybe_unused, size_t size __maybe_unused, off_t off __maybe_unused)
+static int record__aio_pushfn(void *to __maybe_unused, void *bf __maybe_unused,
+	size_t size __maybe_unused, off_t off __maybe_unused, struct aiocb *cblock __maybe_unused)
 {
 	return -1;
 }
@@ -371,15 +372,15 @@ static int process_synthesized_event(struct perf_tool *tool,
 				     struct machine *machine __maybe_unused)
 {
 	struct record *rec = container_of(tool, struct record, tool);
-	return record__write(rec, NULL, event, event->header.size);
+	return record__write(rec, event, event->header.size);
 }
 
-static int record__pushfn(struct perf_mmap *map, void *to, void *bf, size_t size)
+static int record__pushfn(void *to, void *bf, size_t size)
 {
 	struct record *rec = to;
 
 	rec->samples++;
-	return record__write(rec, map, bf, size);
+	return record__write(rec, bf, size);
 }
 
 static volatile int done;
@@ -414,7 +415,7 @@ static void record__sig_exit(void)
 #ifdef HAVE_AUXTRACE_SUPPORT
 
 static int record__process_auxtrace(struct perf_tool *tool,
-				    struct perf_mmap *map,
+				    struct perf_mmap *map __maybe_unused,
 				    union perf_event *event, void *data1,
 				    size_t len1, void *data2, size_t len2)
 {
@@ -442,11 +443,11 @@ static int record__process_auxtrace(struct perf_tool *tool,
 	if (padding)
 		padding = 8 - padding;
 
-	record__write(rec, map, event, event->header.size);
-	record__write(rec, map, data1, len1);
+	record__write(rec, event, event->header.size);
+	record__write(rec, data1, len1);
 	if (len2)
-		record__write(rec, map, data2, len2);
-	record__write(rec, map, &pad, padding);
+		record__write(rec, data2, len2);
+	record__write(rec, &pad, padding);
 
 	return 0;
 }
@@ -774,6 +775,8 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
 	struct perf_mmap *maps;
 	int trace_fd = rec->data.file.fd;
 	off_t off;
+	struct perf_session *session = rec->session;
+	perf_mmap__compress_fn_t compress_fn;
 
 	if (!evlist)
 		return 0;
@@ -785,6 +788,9 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
 	if (overwrite && evlist->bkw_mmap_state != BKW_MMAP_DATA_PENDING)
 		return 0;
 
+	compress_fn = (record__comp_enabled(rec) ?
+		perf_session__zstd_compress : perf_session__zstd_copy);
+
 	if (record__aio_enabled(rec))
 		off = record__aio_get_pos(trace_fd);
 
@@ -799,11 +805,21 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
 				map->flush = MMAP_FLUSH_DEFAULT;
 			}
 			if (!record__aio_enabled(rec)) {
-				if (perf_mmap__push(map, rec, record__pushfn) != 0) {
-					if (sync)
-						map->flush = flush;
-					rc = -1;
-					goto out;
+				if (!record__comp_enabled(rec)) {
+					if (perf_mmap__push(map, rec, record__pushfn) != 0) {
+						if (sync)
+							map->flush = flush;
+						rc = -1;
+						goto out;
+					}
+				} else {
+					if (perf_mmap__pack(map, rec, record__pushfn,
+							compress_fn, session) != 0) {
+						if (sync)
+							map->flush = flush;
+						rc = -1;
+						goto out;
+					}
 				}
 			} else {
 				int idx;
@@ -812,7 +828,8 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
 				 * becomes available after previous aio write request.
 				 */
 				idx = record__aio_sync(map, false);
-				if (perf_mmap__aio_push(map, rec, idx, record__aio_pushfn, &off) != 0) {
+				if (perf_mmap__aio_push(map, rec, record__aio_pushfn,
+					&off, idx, compress_fn, session) != 0) {
 					record__aio_set_pos(trace_fd, off);
 					if (sync)
 						map->flush = flush;
@@ -839,7 +856,7 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
 	 * at least one event.
 	 */
 	if (bytes_written != rec->bytes_written)
-		rc = record__write(rec, NULL, &finished_round_event, sizeof(finished_round_event));
+		rc = record__write(rec, &finished_round_event, sizeof(finished_round_event));
 
 	if (overwrite)
 		perf_evlist__toggle_bkw_mmap(evlist, BKW_MMAP_EMPTY);
@@ -1193,9 +1210,10 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	fd = perf_data__fd(data);
 	rec->session = session;
 
-	rec->opts.comp_level = 0;
-	session->header.env.comp_level = rec->opts.comp_level;
-	session->header.env.comp_type = PERF_COMP_NONE;
+	if (perf_session__zstd_init(session, rec->opts.comp_level) < 0) {
+		pr_err("Compression initialization failed.\n");
+		return -1;
+	}
 
 	record__init_features(rec);
 
@@ -1526,6 +1544,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	}
 
 out_delete_session:
+	perf_session__zstd_fini(session);
 	perf_session__delete(session);
 	return status;
 }
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index 239e9a13c2b7..980784b77fe2 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -156,6 +156,86 @@ void __weak auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp __mayb
 {
 }
 
+static ssize_t perf_mmap__capture(struct perf_mmap *md, int idx,
+			perf_mmap__compress_fn_t compress, void *where)
+{
+	u64 head = perf_mmap__read_head(md);
+	unsigned char *data = md->base + page_size;
+	unsigned long size, size0 = 0;
+	void *buf;
+	int rc = 0;
+	size_t mmap_len = perf_mmap__mmap_len(md);
+
+	rc = perf_mmap__read_init(md);
+	if (rc < 0)
+		return (rc == -EAGAIN) ? 0 : -1;
+
+	/*
+	 * md->base data is copied into md->data[idx] buffer to
+	 * release space in the kernel buffer as fast as possible,
+	 * thru perf_mmap__consume() below.
+	 *
+	 * That lets the kernel proceed with storing more
+	 * profiling data into the kernel buffer earlier than other
+	 * per-cpu kernel buffers are handled.
+	 *
+	 * Copying can be done in two steps in case the chunk of
+	 * profiling data crosses the upper bound of the kernel buffer.
+	 * In this case we first move part of data from md->start
+	 * till the upper bound and then the remainder from the
+	 * beginning of the kernel buffer till the end of
+	 * the data chunk.
+	 */
+
+	size = md->end - md->start;
+
+	if ((md->start & md->mask) + size != (md->end & md->mask)) {
+		buf = &data[md->start & md->mask];
+		size = md->mask + 1 - (md->start & md->mask);
+		md->start += size;
+		size0 = compress(where, md->aio.data[idx], mmap_len, buf, size);
+	}
+
+	buf = &data[md->start & md->mask];
+	size = md->end - md->start;
+	md->start += size;
+	size0 += compress(where, md->aio.data[idx] + size0, mmap_len - size0, buf, size);
+
+	/*
+	 * Increment md->refcount to guard md->data[idx] buffer
+	 * from premature deallocation because md object can be
+	 * released earlier than aio write request started
+	 * on mmap->data[idx] is complete.
+	 *
+	 * perf_mmap__put() is done at record__aio_complete()
+	 * after started request completion.
+	 */
+	perf_mmap__get(md);
+
+	md->prev = head;
+	perf_mmap__consume(md);
+
+	return size0;
+}
+
+int perf_mmap__pack(struct perf_mmap *md, void *to, perf_mmap__push_fn_t push,
+		perf_mmap__compress_fn_t compress, void *where)
+{
+	int rc = 0;
+	ssize_t size = 0;
+
+	size = perf_mmap__capture(md, /*idx*/ 0, compress, where);
+	if (size > 0) {
+		rc = push(to, md->aio.data[0], size);
+		perf_mmap__put(md);
+		rc = rc < 0 ? -1 : 0;
+	} else if (size < 0) {
+		rc = -1;
+	}
+
+	return rc;
+}
+
 #ifdef HAVE_LIBNUMA_SUPPORT
 static int perf_mmap__aio_alloc(struct perf_mmap *map, int idx)
 {
@@ -311,75 +391,27 @@ static void perf_mmap__aio_munmap_blocks(struct perf_mmap *map)
 	zfree(&map->aio.aiocb);
 }
 
-int perf_mmap__aio_push(struct perf_mmap *md, void *to, int idx,
-			int push(void *to, struct aiocb *cblock, void *buf, size_t size, off_t off),
-			off_t *off)
+int perf_mmap__aio_push(struct perf_mmap *md, void *to,
+		perf_mmap__aio_push_fn_t push, off_t *push_off,
+		int idx, perf_mmap__compress_fn_t compress, void *where)
 {
-	u64 head = perf_mmap__read_head(md);
-	unsigned char *data = md->base + page_size;
-	unsigned long size, size0 = 0;
-	void *buf;
 	int rc = 0;
-
-	rc = perf_mmap__read_init(md);
-	if (rc < 0)
-		return (rc == -EAGAIN) ? 0 : -1;
-
-	/*
-	 * md->base data is copied into md->data[idx] buffer to
-	 * release space in the kernel buffer as fast as possible,
-	 * thru perf_mmap__consume() below.
-	 *
-	 * That lets the kernel to proceed with storing more
-	 * profiling data into the kernel buffer earlier than other
-	 * per-cpu kernel buffers are handled.
-	 *
-	 * Coping can be done in two steps in case the chunk of
-	 * profiling data crosses the upper bound of the kernel buffer.
-	 * In this case we first move part of data from md->start
-	 * till the upper bound and then the reminder from the
-	 * beginning of the kernel buffer till the end of
-	 * the data chunk.
-	 */
-
-	size = md->end - md->start;
-
-	if ((md->start & md->mask) + size != (md->end & md->mask)) {
-		buf = &data[md->start & md->mask];
-		size = md->mask + 1 - (md->start & md->mask);
-		md->start += size;
-		memcpy(md->aio.data[idx], buf, size);
-		size0 = size;
-	}
-
-	buf = &data[md->start & md->mask];
-	size = md->end - md->start;
-	md->start += size;
-	memcpy(md->aio.data[idx] + size0, buf, size);
-
-	/*
-	 * Increment md->refcount to guard md->data[idx] buffer
-	 * from premature deallocation because md object can be
-	 * released earlier than aio write request started
-	 * on mmap->data[idx] is complete.
-	 *
-	 * perf_mmap__put() is done at record__aio_complete()
-	 * after started request completion.
-	 */
-	perf_mmap__get(md);
-
-	md->prev = head;
-	perf_mmap__consume(md);
-
-	rc = push(to, &md->aio.cblocks[idx], md->aio.data[idx], size0 + size, *off);
-	if (!rc) {
-		*off += size0 + size;
-	} else {
-		/*
-		 * Decrement md->refcount back if aio write
-		 * operation failed to start.
-		 */
-		perf_mmap__put(md);
+	ssize_t size = 0;
+
+	size = perf_mmap__capture(md, idx, compress, where);
+	if (size > 0) {
+		rc = push(to, md->aio.data[idx], size, *push_off, &md->aio.cblocks[idx]);
+		if (!rc) {
+			*push_off += size;
+		} else {
+			/*
+			 * Decrement md->refcount back if aio write
+			 * operation failed to start.
+			 */
+			perf_mmap__put(md);
+		}
+	} else if (size < 0) {
+		rc = -1;
 	}
 
 	return rc;
@@ -553,8 +585,7 @@ int perf_mmap__read_init(struct perf_mmap *map)
 	return __perf_mmap__read_init(map);
 }
 
-int perf_mmap__push(struct perf_mmap *md, void *to,
-		    int push(struct perf_mmap *map, void *to, void *buf, size_t size))
+int perf_mmap__push(struct perf_mmap *md, void *to, perf_mmap__push_fn_t push)
 {
 	u64 head = perf_mmap__read_head(md);
 	unsigned char *data = md->base + page_size;
@@ -573,7 +604,7 @@ int perf_mmap__push(struct perf_mmap *md, void *to,
 		size = md->mask + 1 - (md->start & md->mask);
 		md->start += size;
 
-		if (push(md, to, buf, size) < 0) {
+		if (push(to, buf, size) < 0) {
 			rc = -1;
 			goto out;
 		}
@@ -583,7 +614,7 @@ int perf_mmap__push(struct perf_mmap *md, void *to,
 	size = md->end - md->start;
 	md->start += size;
 
-	if (push(md, to, buf, size) < 0) {
+	if (push(to, buf, size) < 0) {
 		rc = -1;
 		goto out;
 	}
diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index 4fd7d82825b7..bf70972c7101 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -97,16 +97,24 @@ union perf_event *perf_mmap__read_forward(struct perf_mmap *map);
 
 union perf_event *perf_mmap__read_event(struct perf_mmap *map);
 
-int perf_mmap__push(struct perf_mmap *md, void *to,
-		    int push(struct perf_mmap *map, void *to, void *buf, size_t size));
+typedef int (*perf_mmap__push_fn_t)(void *to, void *buf, size_t size);
+int perf_mmap__push(struct perf_mmap *md, void *to, perf_mmap__push_fn_t push);
+
+typedef size_t (*perf_mmap__compress_fn_t)(void *where, void *dst, size_t dst_size,
+		void *src, size_t src_size);
+int perf_mmap__pack(struct perf_mmap *md, void *to, perf_mmap__push_fn_t push,
+		perf_mmap__compress_fn_t compress, void *where);
+
+typedef int (*perf_mmap__aio_push_fn_t)(void *to, void *buf, size_t size,
+		off_t push_off, struct aiocb *cblock);
+
 #ifdef HAVE_AIO_SUPPORT
-int perf_mmap__aio_push(struct perf_mmap *md, void *to, int idx,
-			int push(void *to, struct aiocb *cblock, void *buf, size_t size, off_t off),
-			off_t *off);
+int perf_mmap__aio_push(struct perf_mmap *md, void *to, perf_mmap__aio_push_fn_t push, off_t *push_off,
+		int idx, perf_mmap__compress_fn_t compress_fn, void *where);
 #else
-static inline int perf_mmap__aio_push(struct perf_mmap *md __maybe_unused, void *to __maybe_unused, int idx __maybe_unused,
-	int push(void *to, struct aiocb *cblock, void *buf, size_t size, off_t off) __maybe_unused,
-	off_t *off __maybe_unused)
+static inline int perf_mmap__aio_push(struct perf_mmap *md __maybe_unused, void *to __maybe_unused,
+	perf_mmap__aio_push_fn_t push __maybe_unused, off_t *push_off __maybe_unused,
+	int idx __maybe_unused, perf_mmap__compress_fn_t compress __maybe_unused, void *where __maybe_unused)
 {
 	return 0;
 }
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 18fb9c8cbf9c..5d406eebd058 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -29,6 +29,112 @@
 #include "stat.h"
 #include "arch/common.h"
 
+#ifdef HAVE_ZSTD_SUPPORT
+int perf_session__zstd_init(struct perf_session *session, int level)
+{
+	size_t ret;
+
+	session->header.env.comp_type  = PERF_COMP_NONE;
+	session->header.env.comp_level = 0;
+
+	session->zstd_cstream = ZSTD_createCStream();
+	if (session->zstd_cstream == NULL) {
+		pr_err("Couldn't create compression stream.\n");
+		return -1;
+	}
+
+	ret = ZSTD_initCStream(session->zstd_cstream, level);
+	if (ZSTD_isError(ret)) {
+		pr_err("Failed to initialize compression stream: %s\n", ZSTD_getErrorName(ret));
+		return -1;
+	}
+
+	session->header.env.comp_type  = PERF_COMP_ZSTD;
+	session->header.env.comp_level = level;
+
+	return 0;
+}
+
+int perf_session__zstd_fini(struct perf_session *session)
+{
+	if (session->zstd_cstream) {
+		ZSTD_freeCStream(session->zstd_cstream);
+		session->zstd_cstream = NULL;
+	}
+
+	return 0;
+}
+
+size_t perf_session__zstd_compress(void *to,  void *dst, size_t dst_size,
+				   void *src, size_t src_size)
+{
+	struct perf_session *session = to;
+	size_t ret, size, compressed = 0;
+	struct compressed_event *event = NULL;
+	/* maximum size of record data (2^16 - 1 - header size) */
+	const size_t max_data_size = (1 << 8 * sizeof(event->header.size)) -
+				      1 - sizeof(struct compressed_event);
+	ZSTD_inBuffer input = { src, src_size, 0 };
+	ZSTD_outBuffer output;
+
+	while (input.pos < input.size) {
+		event = dst;
+
+		event->header.type = PERF_RECORD_COMPRESSED;
+		event->header.size = size = sizeof(struct compressed_event);
+		compressed += size;
+		dst += size;
+		dst_size -= size;
+
+		output = (ZSTD_outBuffer){ dst, (dst_size > max_data_size) ?
+						max_data_size : dst_size, 0 };
+		ret = ZSTD_compressStream(session->zstd_cstream, &output, &input);
+		ZSTD_flushStream(session->zstd_cstream, &output);
+		if (ZSTD_isError(ret)) {
+			pr_err("failed to compress %ld bytes: %s\n",
+				(long)src_size, ZSTD_getErrorName(ret));
+			return perf_session__zstd_copy(session, dst, dst_size, src, src_size);
+		}
+		size = output.pos;
+
+		event->header.size += size;
+		compressed += size;
+		dst += size;
+		dst_size -= size;
+	}
+
+	session->bytes_transferred += src_size;
+	session->bytes_compressed  += compressed;
+
+	return compressed;
+}
+#else /* !HAVE_ZSTD_SUPPORT */
+int perf_session__zstd_init(struct perf_session *session __maybe_unused, int level __maybe_unused)
+{
+	return 0;
+}
+
+int perf_session__zstd_fini(struct perf_session *session __maybe_unused)
+{
+	return 0;
+}
+
+size_t perf_session__zstd_compress(void *to __maybe_unused,
+				void *dst __maybe_unused, size_t dst_size __maybe_unused,
+				void *src __maybe_unused, size_t src_size __maybe_unused)
+{
+	return 0;
+}
+#endif
+
+size_t perf_session__zstd_copy(void *to __maybe_unused,
+			void *dst, size_t dst_size __maybe_unused,
+			void *src, size_t src_size)
+{
+	memcpy(dst, src, src_size);
+	return src_size;
+}
+
 static int perf_session__deliver_event(struct perf_session *session,
 				       union perf_event *event,
 				       struct perf_tool *tool,
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index 0e14884f28b2..d8f3284cd838 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -11,6 +11,9 @@
 #include <linux/kernel.h>
 #include <linux/rbtree.h>
 #include <linux/perf_event.h>
+#ifdef HAVE_ZSTD_SUPPORT
+#include <zstd.h>
+#endif
 
 struct ip_callchain;
 struct symbol;
@@ -37,6 +40,9 @@ struct perf_session {
 	struct perf_tool	*tool;
 	u64			bytes_transferred;
 	u64			bytes_compressed;
+#ifdef HAVE_ZSTD_SUPPORT
+	ZSTD_CStream		*zstd_cstream;
+#endif
 };
 
 struct perf_tool;
@@ -122,6 +128,13 @@ int perf_session__deliver_synth_event(struct perf_session *session,
 				      union perf_event *event,
 				      struct perf_sample *sample);
 
+int perf_session__zstd_init(struct perf_session *session, int level);
+int perf_session__zstd_fini(struct perf_session *session);
+size_t perf_session__zstd_copy(void *to, void *dst, size_t dst_size,
+			void *src, size_t src_size);
+size_t perf_session__zstd_compress(void *to, void *dst, size_t dst_size,
+			void *src, size_t src_size);
+
 int perf_event__process_id_index(struct perf_session *session,
 				 union perf_event *event);
 

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 4/4] perf report: support record trace file decompression
  2019-02-11 20:17 [PATCH v2 0/4] perf: enable compression of record mode trace to save storage space Alexey Budankov
                   ` (2 preceding siblings ...)
  2019-02-11 20:23 ` [PATCH v2 3/4] perf record: enable runtime trace compression Alexey Budankov
@ 2019-02-11 20:25 ` Alexey Budankov
  2019-02-12 13:08   ` Jiri Olsa
                     ` (3 more replies)
  2019-02-12 12:27 ` [PATCH v2 0/4] perf: enable compression of record mode trace to save storage space Arnaldo Carvalho de Melo
  4 siblings, 4 replies; 67+ messages in thread
From: Alexey Budankov @ 2019-02-11 20:25 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra
  Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


PERF_RECORD_COMPRESSED records are decompressed from the trace file into
a linked list of mmapped memory regions using the streaming Zstandard API.
The regions are then scanned to fetch the uncompressed events. When
dumping a raw trace (e.g., perf report -D --header), file offsets of
events from compressed records are set to zero.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
Changes in v2:
- moved compression/decompression code to session layer

---
 tools/perf/builtin-report.c |   5 +-
 tools/perf/util/session.c   | 165 +++++++++++++++++++++++++++++++++++-
 tools/perf/util/session.h   |  11 +++
 tools/perf/util/tool.h      |   2 +
 4 files changed, 181 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 2e8c74d6430c..3bfb74c11875 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1215,6 +1215,9 @@ int cmd_report(int argc, const char **argv)
 	if (session == NULL)
 		return -1;
 
+	if (perf_session__zstd_init(session, 0) < 0)
+		pr_warning("Decompression initialization failed. Reported data may be incomplete.\n");
+
 	if (report.queue_size) {
 		ordered_events__set_alloc_size(&session->ordered_events,
 					       report.queue_size);
@@ -1427,7 +1430,7 @@ int cmd_report(int argc, const char **argv)
 
 error:
 	zfree(&report.ptime_range);
-
+	perf_session__zstd_fini(session);
 	perf_session__delete(session);
 	return ret;
 }
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 5d406eebd058..c501cb904d95 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -34,6 +34,21 @@ int perf_session__zstd_init(struct perf_session *session, int level)
 {
 	size_t ret;
 
+	session->zstd_dstream = ZSTD_createDStream();
+	if (session->zstd_dstream == NULL) {
+		pr_err("Couldn't create decompression stream.\n");
+		return -1;
+	}
+
+	ret = ZSTD_initDStream(session->zstd_dstream);
+	if (ZSTD_isError(ret)) {
+		pr_err("Failed to initialize decompression stream: %s\n", ZSTD_getErrorName(ret));
+		return -1;
+	}
+
+	if (level == 0)
+		return 0;
+
 	session->header.env.comp_type  = PERF_COMP_NONE;
 	session->header.env.comp_level = 0;
 
@@ -57,6 +72,22 @@ int perf_session__zstd_init(struct perf_session *session, int level)
 
 int perf_session__zstd_fini(struct perf_session *session)
 {
+	struct decomp *next = session->decomp, *decomp;
+	size_t decomp_len = session->header.env.comp_mmap_len;
+
+	if (session->zstd_dstream) {
+		ZSTD_freeDStream(session->zstd_dstream);
+		session->zstd_dstream = NULL;
+	}
+
+	do {
+		decomp = next;
+		if (decomp == NULL)
+			break;
+		next = decomp->next;
+		munmap(decomp, decomp_len + sizeof(struct decomp));
+	} while (1);
+
 	if (session->zstd_cstream) {
 		ZSTD_freeCStream(session->zstd_cstream);
 		session->zstd_cstream = NULL;
@@ -108,6 +139,80 @@ size_t perf_session__zstd_compress(void *to,  void *dst, size_t dst_size,
 
 	return compressed;
 }
+
+static size_t perf_session__zstd_decompress(struct perf_session *session,
+					void *src, size_t src_size,
+					void *dst, size_t dst_size)
+{
+	size_t ret;
+	ZSTD_inBuffer input = { src, src_size, 0 };
+	ZSTD_outBuffer output = { dst, dst_size, 0 };
+
+	while (input.pos < input.size) {
+		ret = ZSTD_decompressStream(session->zstd_dstream, &output, &input);
+		if (ZSTD_isError(ret)) {
+			pr_err("failed to decompress (B): %ld -> %ld : %s\n",
+				src_size, output.size, ZSTD_getErrorName(ret));
+			break;
+		}
+		output.dst  = dst + output.pos;
+		output.size = dst_size - output.pos;
+	}
+
+	return output.pos;
+}
+
+static int perf_session__process_compressed_event(struct perf_session *session,
+					union perf_event *event, u64 file_offset)
+{
+	void *src;
+	size_t decomp_size, src_size;
+	u64 decomp_last_rem = 0;
+	size_t decomp_len = session->header.env.comp_mmap_len;
+	struct decomp *decomp, *decomp_last = session->decomp_last;
+
+	decomp = mmap(NULL, sizeof(struct decomp) + decomp_len, PROT_READ|PROT_WRITE,
+		      MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
+	if (decomp == MAP_FAILED) {
+		pr_err("Couldn't allocate memory for decompression\n");
+		return -1;
+	}
+
+	decomp->file_pos = file_offset;
+	decomp->head = 0;
+
+	if (decomp_last) {
+		decomp_last_rem = decomp_last->size - decomp_last->head;
+		memcpy(decomp->data, &(decomp_last->data[decomp_last->head]), decomp_last_rem);
+		decomp->size = decomp_last_rem;
+	}
+
+	src = (void *)event + sizeof(struct compressed_event);
+	src_size = event->pack.header.size - sizeof(struct compressed_event);
+
+	decomp_size = perf_session__zstd_decompress(session, src, src_size,
+				&(decomp->data[decomp_last_rem]), decomp_len - decomp_last_rem);
+	if (!decomp_size) {
+		munmap(decomp, sizeof(struct decomp) + decomp_len);
+		pr_err("Couldn't decompress data\n");
+		return -1;
+	}
+
+	decomp->size += decomp_size;
+
+	if (session->decomp == NULL) {
+		session->decomp = decomp;
+		session->decomp_last = decomp;
+	} else {
+		session->decomp_last->next = decomp;
+		session->decomp_last = decomp;
+	}
+
+	pr_debug("decomp (B): %ld to %ld\n", src_size, decomp_size);
+
+	return 0;
+}
+
 #else /* !HAVE_ZSTD_SUPPORT */
 int perf_session__zstd_init(struct perf_session *session __maybe_unused, int level __maybe_unused)
 {
@@ -125,6 +230,14 @@ size_t perf_session__zstd_compress(void *to __maybe_unused,
 {
 	return 0;
 }
+
+static int perf_session__process_compressed_event(struct perf_session *session __maybe_unused,
+				union perf_event *event __maybe_unused,
+				u64 file_offset __maybe_unused)
+{
+	dump_printf(": unhandled!\n");
+	return 0;
+}
 #endif
 
 size_t perf_session__zstd_copy(void *to __maybe_unused,
@@ -533,6 +646,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
 		tool->time_conv = process_event_op2_stub;
 	if (tool->feature == NULL)
 		tool->feature = process_event_op2_stub;
+	if (tool->compressed == NULL)
+		tool->compressed = perf_session__process_compressed_event;
 }
 
 static void swap_sample_id_all(union perf_event *event, void *data)
@@ -1469,7 +1584,8 @@ static s64 perf_session__process_user_event(struct perf_session *session,
 	int fd = perf_data__fd(session->data);
 	int err;
 
-	dump_event(session->evlist, event, file_offset, &sample);
+	if (event->header.type != PERF_RECORD_COMPRESSED)
+		dump_event(session->evlist, event, file_offset, &sample);
 
 	/* These events are processed right away */
 	switch (event->header.type) {
@@ -1522,6 +1638,11 @@ static s64 perf_session__process_user_event(struct perf_session *session,
 		return tool->time_conv(session, event);
 	case PERF_RECORD_HEADER_FEATURE:
 		return tool->feature(session, event);
+	case PERF_RECORD_COMPRESSED:
+		err = tool->compressed(session, event, file_offset);
+		if (err)
+			dump_event(session->evlist, event, file_offset, &sample);
+		return 0;
 	default:
 		return -EINVAL;
 	}
@@ -1804,6 +1925,8 @@ static int perf_session__flush_thread_stacks(struct perf_session *session)
 
 volatile int session_done;
 
+static int __perf_session__process_decomp_events(struct perf_session *session);
+
 static int __perf_session__process_pipe_events(struct perf_session *session)
 {
 	struct ordered_events *oe = &session->ordered_events;
@@ -1884,6 +2007,10 @@ static int __perf_session__process_pipe_events(struct perf_session *session)
 	if (skip > 0)
 		head += skip;
 
+	err = __perf_session__process_decomp_events(session);
+	if (err)
+		goto out_err;
+
 	if (!session_done())
 		goto more;
 done:
@@ -1932,6 +2059,38 @@ fetch_mmaped_event(struct perf_session *session,
 	return event;
 }
 
+static int __perf_session__process_decomp_events(struct perf_session *session)
+{
+	s64 skip;
+	u64 size, file_pos = 0;
+	union perf_event *event;
+	struct decomp *decomp = session->decomp_last;
+
+	if (!decomp)
+		return 0;
+
+	while (decomp->head < decomp->size && !session_done()) {
+		event = fetch_mmaped_event(session, decomp->head, decomp->size, decomp->data);
+		if (!event)
+			break;
+
+		size = event->header.size;
+		if (size < sizeof(struct perf_event_header) ||
+		    (skip = perf_session__process_event(session, event, file_pos)) < 0) {
+			pr_err("%#" PRIx64 " [%#x]: failed to process type: %d\n",
+				decomp->file_pos + decomp->head, event->header.size, event->header.type);
+			return -EINVAL;
+		}
+
+		if (skip)
+			size += skip;
+
+		decomp->head += size;
+	}
+
+	return 0;
+}
+
 /*
  * On 64bit we can mmap the data file in one go. No need for tiny mmap
  * slices. On 32bit we use 32MB.
@@ -2032,6 +2191,10 @@ reader__process_events(struct reader *rd, struct perf_session *session,
 	head += size;
 	file_pos += size;
 
+	err = __perf_session__process_decomp_events(session);
+	if (err)
+		goto out;
+
 	ui_progress__update(prog, size);
 
 	if (session_done())
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index d8f3284cd838..06a0536adbe0 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -42,7 +42,18 @@ struct perf_session {
 	u64			bytes_compressed;
 #ifdef HAVE_ZSTD_SUPPORT
 	ZSTD_CStream		*zstd_cstream;
+	ZSTD_DStream		*zstd_dstream;
 #endif
+	struct decomp		*decomp;
+	struct decomp		*decomp_last;
+};
+
+struct decomp {
+	struct decomp *next;
+	u64 file_pos;
+	u64 head;
+	size_t size;
+	char data[];
 };
 
 struct perf_tool;
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 250391672f9f..9096a6e3de59 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -28,6 +28,7 @@ typedef int (*event_attr_op)(struct perf_tool *tool,
 
 typedef int (*event_op2)(struct perf_session *session, union perf_event *event);
 typedef s64 (*event_op3)(struct perf_session *session, union perf_event *event);
+typedef int (*event_op4)(struct perf_session *session, union perf_event *event, u64 data);
 
 typedef int (*event_oe)(struct perf_tool *tool, union perf_event *event,
 			struct ordered_events *oe);
@@ -72,6 +73,7 @@ struct perf_tool {
 			stat,
 			stat_round,
 			feature;
+	event_op4	compressed;
 	event_op3	auxtrace;
 	bool		ordered_events;
 	bool		ordering_requires_timestamps;

^ permalink raw reply related	[flat|nested] 67+ messages in thread
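For readers following the decompression loop in `__perf_session__process_decomp_events()` above: the traversal amounts to walking a buffer of size-prefixed records until a record is truncated or undersized. This is a hedged, self-contained sketch — `struct hdr` is a hypothetical stand-in for `struct perf_event_header` (field widths assumed), not the actual perf definition:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct perf_event_header: a u16 size field
 * that covers the header itself plus the record payload. */
struct hdr { uint32_t type; uint16_t misc; uint16_t size; };

/* Walk a buffer of size-prefixed records the way the decomp loop above
 * does; returns the record count, or -1 on a malformed size field
 * (the pr_err/-EINVAL path in the quoted code). */
static int count_records(const unsigned char *buf, size_t len)
{
	size_t head = 0;
	int n = 0;

	while (head + sizeof(struct hdr) <= len) {
		struct hdr h;

		memcpy(&h, buf + head, sizeof(h));
		if (h.size < sizeof(struct hdr) || head + h.size > len)
			return -1;	/* corrupt record: stop processing */
		head += h.size;
		n++;
	}
	return n;
}
```

The real loop additionally re-enters `perf_session__process_event()` per record and honors `session_done()`, which the sketch omits.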

* Re: [PATCH v2 0/4] perf: enable compression of record mode trace to save storage space
  2019-02-11 20:17 [PATCH v2 0/4] perf: enable compression of record mode trace to save storage space Alexey Budankov
                   ` (3 preceding siblings ...)
  2019-02-11 20:25 ` [PATCH v2 4/4] perf report: support record trace file decompression Alexey Budankov
@ 2019-02-12 12:27 ` Arnaldo Carvalho de Melo
  2019-02-12 14:06   ` Alexey Budankov
  4 siblings, 1 reply; 67+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-02-12 12:27 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, Namhyung Kim,
	Alexander Shishkin, Andi Kleen, linux-kernel

Em Mon, Feb 11, 2019 at 11:17:14PM +0300, Alexey Budankov escreveu:
> This is the rebase to the tip of Arnaldo's perf/core repository.
 
> The patch set implements runtime trace compression for record mode and 
> trace file decompression for report mode. Zstandard API [1] is used for 
> compression/decompression of data that come from perf_events kernel 
> data buffers.
 
> Realized -z,--compression_level=n option provides ~3-5x avg. trace file 

Realized? Perhaps "Using the -z/--compression_level=n option provides a
~3-5x average size reduction on..."

> size reduction on variety of tested workloads what saves user storage 
> space on larger server systems where trace file size can easily reach 
> several tens or even hundreds of GiBs, especially when profiling with 
> dwarf-based stacks and tracing of  context-switches.
 
> --mmap-flush option can be used to avoid compressing every single byte 

What is the default for --mmap-flush when -z is used? Does it vary with
the compression level? Can we expect something that is reasonable or is
using --mmap-flush always needed? If so can we do away with that by
using some sensible default according to the -z level used?

> of data and increase compression ratio at the same time lowering tool 
> runtime overhead.
> 
>   $ tools/perf/perf record -z 1 -e cycles -- matrix.gcc
>   $ tools/perf/perf record -z 1 --mmap-flush 1024 -e cycles -- matrix.gcc
>   $ tools/perf/perf record -z 1 --mmap-flush 1024 --aio -e cycles -- matrix.gcc

Please avoid having these explanations just in the cover letter, since it
will not get recorded in the git history for the project; please add
examples with output from real workloads, preferably providing real
cases so that other people can readily replicate your results when
they have access to similar large machines.
 
> The compression functionality can be disabled from the command line 
> using NO_LIBZSTD define and Zstandard sources can be overridden using 
> value of LIBZSTD_DIR define:
> 
>   $ make -C tools/perf NO_LIBZSTD=1 clean all
>   $ make -C tools/perf LIBZSTD_DIR=/path/to/zstd-1.3.7 clean all

Showing the relevant output of these commands before/after
installing the libzstd library is useful as well. From what you use
above, is it fair to conclude that there are no ready-made packages for
this library and we need to get it from github and build it ourselves?

On my notebook, a Fedora 29 system:

[root@quaco ~]# dnf search zstd
========================================================== Name & Summary Matched: zstd ==========================================================
zstd.x86_64 : Zstd compression library
zstd.x86_64 : Zstd compression library
libzstd.x86_64 : Zstd shared library
libzstd.i686 : Zstd shared library
libzstd.x86_64 : Zstd shared library
libzstd-devel.i686 : Header files for Zstd library
libzstd-devel.x86_64 : Header files for Zstd library
============================================================= Summary Matched: zstd ==============================================================
fedora-repo-zdicts.noarch : Zstd dictionaries for Fedora repository metadata
[root@quaco ~]#
[root@quaco ~]# dnf install libzstd-devel
<SNIP>
 libzstd-devel                         x86_64                         1.3.8-1.fc29                          updates                          39 k
<SNIP>

So, do I need some specific 1.3.7 version? Is that why you used that
LIBZSTD_DIR in your example?

I'm trying these steps, but please consider following these suggestions
in future posts,

And thanks for working on this!

- Arnaldo
 
> ---
> Alexey Budankov (4):
>   feature: realize libzstd check, LIBZSTD_DIR and NO_LIBZSTD defines
>   perf record: implement -z=<level> and --mmap-flush=<thres> options
>   perf record: enable runtime trace compression
>   perf report: support record trace file decompression
> 
>  tools/build/Makefile.feature             |   6 +-
>  tools/build/feature/Makefile             |   6 +-
>  tools/build/feature/test-all.c           |   5 +
>  tools/build/feature/test-libzstd.c       |  12 +
>  tools/perf/Documentation/perf-record.txt |   9 +
>  tools/perf/Makefile.config               |  20 ++
>  tools/perf/Makefile.perf                 |   3 +
>  tools/perf/builtin-record.c              | 167 +++++++++++---
>  tools/perf/builtin-report.c              |   5 +-
>  tools/perf/perf.h                        |   2 +
>  tools/perf/util/env.h                    |  10 +
>  tools/perf/util/event.c                  |   1 +
>  tools/perf/util/event.h                  |   7 +
>  tools/perf/util/evlist.c                 |   6 +-
>  tools/perf/util/evlist.h                 |   3 +-
>  tools/perf/util/header.c                 |  45 +++-
>  tools/perf/util/header.h                 |   1 +
>  tools/perf/util/mmap.c                   | 265 +++++++++++++---------
>  tools/perf/util/mmap.h                   |  31 ++-
>  tools/perf/util/session.c                | 271 ++++++++++++++++++++++-
>  tools/perf/util/session.h                |  26 +++
>  tools/perf/util/tool.h                   |   2 +
>  22 files changed, 742 insertions(+), 161 deletions(-)
>  create mode 100644 tools/build/feature/test-libzstd.c
> 
> ---
> Changes in v2:
> - moved compression/decompression code to session layer
> - enabled allocation aio data buffers for compression
> - enabled trace compression for serial trace streaming
> 
> ---
> [1] https://github.com/facebook/zstd

-- 

- Arnaldo


* Re: [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options
  2019-02-11 20:22 ` [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options Alexey Budankov
@ 2019-02-12 13:08   ` Jiri Olsa
  2019-02-20 14:13     ` Alexey Budankov
  2019-02-12 13:08   ` Jiri Olsa
                     ` (6 subsequent siblings)
  7 siblings, 1 reply; 67+ messages in thread
From: Jiri Olsa @ 2019-02-12 13:08 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel

On Mon, Feb 11, 2019 at 11:22:38PM +0300, Alexey Budankov wrote:

SNIP

>  static int process_synthesized_event(struct perf_tool *tool,
>  				     union perf_event *event,
>  				     struct perf_sample *sample __maybe_unused,
> @@ -543,7 +574,8 @@ static int record__mmap_evlist(struct record *rec,
>  	if (perf_evlist__mmap_ex(evlist, opts->mmap_pages,
>  				 opts->auxtrace_mmap_pages,
>  				 opts->auxtrace_snapshot_mode,
> -				 opts->nr_cblocks, opts->affinity) < 0) {
> +				 opts->nr_cblocks, opts->affinity,
> +				 opts->mmap_flush) < 0) {
>  		if (errno == EPERM) {
>  			pr_err("Permission error mapping pages.\n"
>  			       "Consider increasing "
> @@ -734,7 +766,7 @@ static void record__adjust_affinity(struct record *rec, struct perf_mmap *map)
>  }
>  
>  static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evlist,
> -				    bool overwrite)
> +				    bool overwrite, bool sync)
>  {
>  	u64 bytes_written = rec->bytes_written;
>  	int i;
> @@ -757,12 +789,19 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
>  		off = record__aio_get_pos(trace_fd);
>  
>  	for (i = 0; i < evlist->nr_mmaps; i++) {
> +		u64 flush = MMAP_FLUSH_DEFAULT;
>  		struct perf_mmap *map = &maps[i];
>  
>  		if (map->base) {
>  			record__adjust_affinity(rec, map);
> +			if (sync) {
> +				flush = map->flush;
> +				map->flush = MMAP_FLUSH_DEFAULT;
> +			}
>  			if (!record__aio_enabled(rec)) {
>  				if (perf_mmap__push(map, rec, record__pushfn) != 0) {
> +					if (sync)
> +						map->flush = flush;
>  					rc = -1;
>  					goto out;
>  				}
> @@ -775,10 +814,14 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
>  				idx = record__aio_sync(map, false);
>  				if (perf_mmap__aio_push(map, rec, idx, record__aio_pushfn, &off) != 0) {
>  					record__aio_set_pos(trace_fd, off);
> +					if (sync)
> +						map->flush = flush;
>  					rc = -1;
>  					goto out;
>  				}
>  			}
> +			if (sync)
> +				map->flush = flush;
>  		}

what are 'flush' and 'sync' for? also the above handling seems confusing:
why would you temporarily set it to the default value when you let the
user set it via a command line option?
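As a sketch of the save/restore pattern the question refers to (assumed semantics, not the actual perf code): on the final pass `sync` is set, the per-map flush threshold is dropped to the 1-byte default so everything left is drained, and the user's value is restored afterwards:

```c
#include <assert.h>
#include <stdint.h>

#define MMAP_FLUSH_DEFAULT 1	/* matches the default used above */

/* Hypothetical model: returns the number of bytes pushed. Data is only
 * pushed once 'available' reaches the map's flush threshold; on the
 * final (sync) pass the threshold is temporarily dropped so remaining
 * data is not left behind, then restored. */
static uint64_t drain(uint64_t *map_flush, int sync, uint64_t available)
{
	uint64_t saved = *map_flush;
	uint64_t pushed;

	if (sync)
		*map_flush = MMAP_FLUSH_DEFAULT;
	pushed = (available >= *map_flush) ? available : 0;
	if (sync)
		*map_flush = saved;	/* restore the user's --mmap-flush value */
	return pushed;
}
```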

SNIP

> -static void perf_mmap__aio_munmap(struct perf_mmap *map __maybe_unused)
> +static void perf_mmap__aio_munmap_blocks(struct perf_mmap *map __maybe_unused)
>  {
>  }
>  #endif
> @@ -444,6 +468,8 @@ int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd, int c
>  				&mp->auxtrace_mp, map->base, fd))
>  		return -1;
>  
> +	map->flush = mp->flush;
> +
>  	return perf_mmap__aio_mmap(map, mp);
>  }
>  
> @@ -492,7 +518,7 @@ static int __perf_mmap__read_init(struct perf_mmap *md)
>  	md->start = md->overwrite ? head : old;
>  	md->end = md->overwrite ? old : head;
>  
> -	if (md->start == md->end)
> +	if ((md->end - md->start) < md->flush)
>  		return -EAGAIN;

we need to document and explain this change in the changelog, in a separate patch

thanks,
jirka
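The changed check can be read as a consumption threshold: instead of bailing out only when the ring buffer is empty (`start == end`), reading now also bails while fewer than `flush` bytes are pending, which batches writes and improves the compression ratio. A minimal model of the assumed semantics:

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the quoted check: (md->end - md->start) < md->flush => -EAGAIN.
 * With flush == 1 this degenerates to the old start == end test. */
static int should_read(uint64_t start, uint64_t end, uint64_t flush)
{
	return (end - start) >= flush;
}
```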


* Re: [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options
  2019-02-11 20:22 ` [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options Alexey Budankov
  2019-02-12 13:08   ` Jiri Olsa
@ 2019-02-12 13:08   ` Jiri Olsa
  2019-02-20 14:13     ` Alexey Budankov
  2019-02-12 13:08   ` Jiri Olsa
                     ` (5 subsequent siblings)
  7 siblings, 1 reply; 67+ messages in thread
From: Jiri Olsa @ 2019-02-12 13:08 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel

On Mon, Feb 11, 2019 at 11:22:38PM +0300, Alexey Budankov wrote:

SNIP

> diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
> index d96eccd7d27f..0e14884f28b2 100644
> --- a/tools/perf/util/session.h
> +++ b/tools/perf/util/session.h
> @@ -35,6 +35,8 @@ struct perf_session {
>  	struct ordered_events	ordered_events;
>  	struct perf_data	*data;
>  	struct perf_tool	*tool;
> +	u64			bytes_transferred;
> +	u64			bytes_compressed;

please add those in separate patch, so we can see how they are
incremented and displayed easily

thanks,
jirka


* Re: [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options
  2019-02-11 20:22 ` [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options Alexey Budankov
  2019-02-12 13:08   ` Jiri Olsa
  2019-02-12 13:08   ` Jiri Olsa
@ 2019-02-12 13:08   ` Jiri Olsa
  2019-02-20 15:24     ` Alexey Budankov
  2019-02-12 13:08   ` Jiri Olsa
                     ` (4 subsequent siblings)
  7 siblings, 1 reply; 67+ messages in thread
From: Jiri Olsa @ 2019-02-12 13:08 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel

On Mon, Feb 11, 2019 at 11:22:38PM +0300, Alexey Budankov wrote:

SNIP

> +static int perf_mmap__aio_mmap_blocks(struct perf_mmap *map);
> +
>  static int perf_mmap__aio_mmap(struct perf_mmap *map, struct mmap_params *mp)
>  {
> -	int delta_max, i, prio, ret;
> +	int i, ret = 0, init_blocks = 1;
>  
>  	map->aio.nr_cblocks = mp->nr_cblocks;
> +	if (map->aio.nr_cblocks == -1) {
> +		map->aio.nr_cblocks = 1;
> +		init_blocks = 0;
> +	}
> +
>  	if (map->aio.nr_cblocks) {
> -		map->aio.aiocb = calloc(map->aio.nr_cblocks, sizeof(struct aiocb *));
> -		if (!map->aio.aiocb) {
> -			pr_debug2("failed to allocate aiocb for data buffer, error %m\n");
> -			return -1;
> -		}
> -		map->aio.cblocks = calloc(map->aio.nr_cblocks, sizeof(struct aiocb));
> -		if (!map->aio.cblocks) {
> -			pr_debug2("failed to allocate cblocks for data buffer, error %m\n");
> -			return -1;
> -		}
>  		map->aio.data = calloc(map->aio.nr_cblocks, sizeof(void *));
>  		if (!map->aio.data) {
>  			pr_debug2("failed to allocate data buffer, error %m\n");
>  			return -1;
>  		}
> -		delta_max = sysconf(_SC_AIO_PRIO_DELTA_MAX);
>  		for (i = 0; i < map->aio.nr_cblocks; ++i) {
>  			ret = perf_mmap__aio_alloc(map, i);
>  			if (ret == -1) {
> @@ -251,29 +245,16 @@ static int perf_mmap__aio_mmap(struct perf_mmap *map, struct mmap_params *mp)
>  			ret = perf_mmap__aio_bind(map, i, map->cpu, mp->affinity);
>  			if (ret == -1)
>  				return -1;
> -			/*
> -			 * Use cblock.aio_fildes value different from -1
> -			 * to denote started aio write operation on the
> -			 * cblock so it requires explicit record__aio_sync()
> -			 * call prior the cblock may be reused again.
> -			 */
> -			map->aio.cblocks[i].aio_fildes = -1;
> -			/*
> -			 * Allocate cblocks with priority delta to have
> -			 * faster aio write system calls because queued requests
> -			 * are kept in separate per-prio queues and adding
> -			 * a new request will iterate thru shorter per-prio
> -			 * list. Blocks with numbers higher than
> -			 *  _SC_AIO_PRIO_DELTA_MAX go with priority 0.
> -			 */
> -			prio = delta_max - i;
> -			map->aio.cblocks[i].aio_reqprio = prio >= 0 ? prio : 0;
>  		}
> +		if (init_blocks)
> +			ret = perf_mmap__aio_mmap_blocks(map);
>  	}
>  
> -	return 0;
> +	return ret;
>  }

SNIP

it seems like a little refactoring happened in here (up and down) for
the aio code, which is not explained and I'm unable to follow it.. please
separate this into a simple change

SNIP

> +#ifdef HAVE_AIO_SUPPORT
> +static int perf_mmap__aio_mmap_blocks(struct perf_mmap *map)
> +{
> +	int delta_max, i, prio;
> +
> +	map->aio.aiocb = calloc(map->aio.nr_cblocks, sizeof(struct aiocb *));
> +	if (!map->aio.aiocb) {
> +		pr_debug2("failed to allocate aiocb for data buffer, error %m\n");
> +		return -1;
> +	}
> +	map->aio.cblocks = calloc(map->aio.nr_cblocks, sizeof(struct aiocb));
> +	if (!map->aio.cblocks) {
> +		pr_debug2("failed to allocate cblocks for data buffer, error %m\n");
> +		return -1;
> +	}
> +	delta_max = sysconf(_SC_AIO_PRIO_DELTA_MAX);
> +	for (i = 0; i < map->aio.nr_cblocks; ++i) {
> +		/*
> +		 * Use cblock.aio_fildes value different from -1
> +		 * to denote started aio write operation on the
> +		 * cblock so it requires explicit record__aio_sync()
> +		 * call prior the cblock may be reused again.
> +		 */
> +		map->aio.cblocks[i].aio_fildes = -1;
> +		/*
> +		 * Allocate cblocks with priority delta to have
> +		 * faster aio write system calls because queued requests
> +		 * are kept in separate per-prio queues and adding
> +		 * a new request will iterate thru shorter per-prio
> +		 * list. Blocks with numbers higher than
> +		 *  _SC_AIO_PRIO_DELTA_MAX go with priority 0.
> +		 */
> +		prio = delta_max - i;
> +		map->aio.cblocks[i].aio_reqprio = prio >= 0 ? prio : 0;
> +	}
> +
> +	return 0;
> +}
> +

SNIP
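The priority assignment in the moved hunk follows a simple clamped ramp; a hedged sketch (at runtime `delta_max` would come from `sysconf(_SC_AIO_PRIO_DELTA_MAX)`):

```c
#include <assert.h>

/* Block i gets priority delta delta_max - i, so earlier blocks land in
 * shorter per-priority queues; indices past _SC_AIO_PRIO_DELTA_MAX
 * clamp to priority 0, as in the quoted loop. */
static int block_prio(int delta_max, int i)
{
	int prio = delta_max - i;

	return prio >= 0 ? prio : 0;
}
```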


* Re: [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options
  2019-02-11 20:22 ` [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options Alexey Budankov
                     ` (2 preceding siblings ...)
  2019-02-12 13:08   ` Jiri Olsa
@ 2019-02-12 13:08   ` Jiri Olsa
  2019-02-20 14:15     ` Alexey Budankov
  2019-02-12 13:08   ` Jiri Olsa
                     ` (3 subsequent siblings)
  7 siblings, 1 reply; 67+ messages in thread
From: Jiri Olsa @ 2019-02-12 13:08 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel

On Mon, Feb 11, 2019 at 11:22:38PM +0300, Alexey Budankov wrote:

SNIP

> -	if (rec->opts.nr_cblocks > nr_cblocks_max)
> -		rec->opts.nr_cblocks = nr_cblocks_max;
> -	if (verbose > 0)
> -		pr_info("nr_cblocks: %d\n", rec->opts.nr_cblocks);
> +	if (rec->opts.comp_level > 22)
> +		rec->opts.comp_level = 0;
> +	if (record__comp_enabled(rec) && !rec->opts.nr_cblocks) {
> +		 /*
> +		  * Allocate aio.data[0] buffer for compression.
> +		  */
> +		rec->opts.nr_cblocks = -1;
> +	}

I assume you're using aio.data[0] as the compression buffer for the
non-aio case.. if that's the case, please don't do that; use a separate
buffer instead

jirka


* Re: [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options
  2019-02-11 20:22 ` [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options Alexey Budankov
                     ` (3 preceding siblings ...)
  2019-02-12 13:08   ` Jiri Olsa
@ 2019-02-12 13:08   ` Jiri Olsa
  2019-02-20 14:13     ` Alexey Budankov
  2019-02-12 13:09   ` Jiri Olsa
                     ` (2 subsequent siblings)
  7 siblings, 1 reply; 67+ messages in thread
From: Jiri Olsa @ 2019-02-12 13:08 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel

On Mon, Feb 11, 2019 at 11:22:38PM +0300, Alexey Budankov wrote:

SNIP

>  static void record__init_features(struct record *rec)
> @@ -838,6 +881,9 @@ static void record__init_features(struct record *rec)
>  	if (!(rec->opts.use_clockid && rec->opts.clockid_res_ns))
>  		perf_header__clear_feat(&session->header, HEADER_CLOCKID);
>  
> +	if (!record__comp_enabled(rec))
> +		perf_header__clear_feat(&session->header, HEADER_COMPRESSED);
> +
>  	perf_header__clear_feat(&session->header, HEADER_STAT);
>  }
>  
> @@ -1147,6 +1193,10 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>  	fd = perf_data__fd(data);
>  	rec->session = session;
>  
> +	rec->opts.comp_level = 0;
> +	session->header.env.comp_level = rec->opts.comp_level;
> +	session->header.env.comp_type = PERF_COMP_NONE;

hum, I think both record and session are zeroed from the start,
so this seems superfluous

SNIP

> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
> index dec6d218c31c..5ad3a27a042f 100644
> --- a/tools/perf/util/header.c
> +++ b/tools/perf/util/header.c
> @@ -1463,6 +1463,21 @@ static int write_mem_topology(struct feat_fd *ff __maybe_unused,
>  	return ret;
>  }
>  
> +static int write_compressed(struct feat_fd *ff __maybe_unused,
> +			    struct perf_evlist *evlist __maybe_unused)
> +{
> +	int ret;
> +	u64 compression_info = ((u64)ff->ph->env.comp_type  << 32) | ff->ph->env.comp_level;
> +
> +	ret = do_write(ff, &compression_info, sizeof(compression_info));
> +	if (ret)
> +		return ret;
> +
> +	compression_info = ((u64)ff->ph->env.comp_ratio << 32) | ff->ph->env.comp_mmap_len;

let's not complicate this, if you have two u32 values
to store, store two u32 values ;-)

also please add a version field in case we add new fields in the future,
so we can stay backward compatible

jirka
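For reference, the packing scheme being questioned stores two u32 fields in one u64 (high word, then low word); a sketch of the round trip:

```c
#include <assert.h>
#include <stdint.h>

/* The write_compressed() layout quoted above: ((u64)hi << 32) | lo. */
static uint64_t pack_u32_pair(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

static uint32_t unpack_hi(uint64_t v) { return (uint32_t)(v >> 32); }
static uint32_t unpack_lo(uint64_t v) { return (uint32_t)v; }
```

The review point stands: writing the two u32 values directly is simpler and avoids the pack/unpack step entirely.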


* Re: [PATCH v2 3/4] perf record: enable runtime trace compression
  2019-02-11 20:23 ` [PATCH v2 3/4] perf record: enable runtime trace compression Alexey Budankov
@ 2019-02-12 13:08   ` Jiri Olsa
  2019-02-20 15:13     ` Alexey Budankov
  2019-02-12 13:08   ` Jiri Olsa
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 67+ messages in thread
From: Jiri Olsa @ 2019-02-12 13:08 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel

On Mon, Feb 11, 2019 at 11:23:40PM +0300, Alexey Budankov wrote:

SNIP

> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 18fb9c8cbf9c..5d406eebd058 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -29,6 +29,112 @@
>  #include "stat.h"
>  #include "arch/common.h"
>  

I think this should not be in session.c but separated in new object

> +#ifdef HAVE_ZSTD_SUPPORT
> +int perf_session__zstd_init(struct perf_session *session, int level)
> +{
> +	size_t ret;
> +
> +	session->header.env.comp_type  = PERF_COMP_NONE;
> +	session->header.env.comp_level = 0;
> +
> +	session->zstd_cstream = ZSTD_createCStream();
> +	if (session->zstd_cstream == NULL) {
> +		pr_err("Couldn't create compression stream.\n");
> +		return -1;
> +	}
> +
> +	ret = ZSTD_initCStream(session->zstd_cstream, level);
> +	if (ZSTD_isError(ret)) {
> +		pr_err("Failed to initialize compression stream: %s\n", ZSTD_getErrorName(ret));
> +		return -1;
> +	}
> +
> +	session->header.env.comp_type  = PERF_COMP_ZSTD;
> +	session->header.env.comp_level = level;
> +
> +	return 0;
> +}
> +
> +int perf_session__zstd_fini(struct perf_session *session)
> +{
> +	if (session->zstd_cstream) {
> +		ZSTD_freeCStream(session->zstd_cstream);
> +		session->zstd_cstream = NULL;
> +	}
> +
> +	return 0;
> +}
> +
> +size_t perf_session__zstd_compress(void *to,  void *dst, size_t dst_size,
> +				   void *src, size_t src_size)
> +{
> +	struct perf_session *session = to;
> +	size_t ret, size, compressed = 0;
> +	struct compressed_event *event = NULL;
> +	/* maximum size of record data size (2^16 - 1 - header) */
> +	const size_t max_data_size = (1 << 8 * sizeof(event->header.size)) -
> +				      1 - sizeof(struct compressed_event);
> +	ZSTD_inBuffer input = { src, src_size, 0 };
> +	ZSTD_outBuffer output;
> +
> +	while (input.pos < input.size) {
> +		event = dst;
> +
> +		event->header.type = PERF_RECORD_COMPRESSED;
> +		event->header.size = size = sizeof(struct compressed_event);
> +		compressed += size;
> +		dst += size;
> +		dst_size -= size;
> +
> +		output = (ZSTD_outBuffer){ dst, (dst_size > max_data_size) ?
> +						max_data_size : dst_size, 0 };
> +		ret = ZSTD_compressStream(session->zstd_cstream, &output, &input);
> +		ZSTD_flushStream(session->zstd_cstream, &output);
> +		if (ZSTD_isError(ret)) {
> +			pr_err("failed to compress %ld bytes: %s\n",
> +				(long)src_size, ZSTD_getErrorName(ret));
> +			return perf_session__zstd_copy(session, dst, dst_size, src, src_size);
> +		}
> +		size = output.pos;
> +
> +		event->header.size += size;
> +		compressed += size;
> +		dst += size;
> +		dst_size -= size;
> +	}
> +
> +	session->bytes_transferred += src_size;
> +	session->bytes_compressed  += compressed;
> +
> +	return compressed;
> +}
> +#else /* !HAVE_ZSTD_SUPPORT */
> +int perf_session__zstd_init(struct perf_session *session __maybe_unused, int level __maybe_unused)
> +{
> +	return 0;
> +}
> +
> +int perf_session__zstd_fini(struct perf_session *session __maybe_unused)
> +{
> +	return 0;
> +}
> +
> +size_t perf_session__zstd_compress(void *to __maybe_unused,
> +				void *dst __maybe_unused, size_t dst_size __maybe_unused,
> +				void *src __maybe_unused, size_t src_size __maybe_unused)
> +{
> +	return 0;
> +}
> +#endif
> +
> +size_t perf_session__zstd_copy(void *to __maybe_unused,
> +			void *dst, size_t dst_size __maybe_unused,
> +			void *src, size_t src_size)
> +{
> +	memcpy(dst, src, src_size);
> +	return src_size;
> +}
> +

SNIP
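The `max_data_size` computation in the quoted `perf_session__zstd_compress()` caps each `PERF_RECORD_COMPRESSED` payload so the record's u16 `header.size` field can still describe header plus payload. A sketch with a hypothetical 8-byte header size (the real value is `sizeof(struct compressed_event)`):

```c
#include <assert.h>
#include <stdint.h>

#define COMP_HDR_SIZE 8u	/* hypothetical sizeof(struct compressed_event) */

/* Largest payload that fits in one record: 2^16 - 1 minus the header. */
static uint32_t max_payload(void)
{
	return ((1u << 16) - 1) - COMP_HDR_SIZE;
}

/* Number of PERF_RECORD_COMPRESSED records needed for 'payload' bytes,
 * i.e. how many iterations the while loop above would frame. */
static uint32_t records_needed(uint32_t payload)
{
	uint32_t m = max_payload();

	return (payload + m - 1) / m;	/* ceiling division */
}
```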


* Re: [PATCH v2 4/4] perf report: support record trace file decompression
  2019-02-11 20:25 ` [PATCH v2 4/4] perf report: support record trace file decompression Alexey Budankov
@ 2019-02-12 13:08   ` Jiri Olsa
  2019-02-20 15:19     ` Alexey Budankov
  2019-02-12 13:09   ` Jiri Olsa
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 67+ messages in thread
From: Jiri Olsa @ 2019-02-12 13:08 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel

On Mon, Feb 11, 2019 at 11:25:00PM +0300, Alexey Budankov wrote:

SNIP

>  size_t perf_session__zstd_copy(void *to __maybe_unused,
> @@ -533,6 +646,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
>  		tool->time_conv = process_event_op2_stub;
>  	if (tool->feature == NULL)
>  		tool->feature = process_event_op2_stub;
> +	if (tool->compressed == NULL)
> +		tool->compressed = perf_session__process_compressed_event;
>  }
>  
>  static void swap_sample_id_all(union perf_event *event, void *data)
> @@ -1469,7 +1584,8 @@ static s64 perf_session__process_user_event(struct perf_session *session,
>  	int fd = perf_data__fd(session->data);
>  	int err;
>  
> -	dump_event(session->evlist, event, file_offset, &sample);
> +	if (event->header.type != PERF_RECORD_COMPRESSED)
> +		dump_event(session->evlist, event, file_offset, &sample);
>  
>  	/* These events are processed right away */
>  	switch (event->header.type) {
> @@ -1522,6 +1638,11 @@ static s64 perf_session__process_user_event(struct perf_session *session,
>  		return tool->time_conv(session, event);
>  	case PERF_RECORD_HEADER_FEATURE:
>  		return tool->feature(session, event);
> +	case PERF_RECORD_COMPRESSED:
> +		err = tool->compressed(session, event, file_offset);
> +		if (err)
> +			dump_event(session->evlist, event, file_offset, &sample);

I'm not sure about having a compressed callback at all, but maybe
it could be useful for inject, to get the compressed data..?

I assume inject is working now and will get you the uncompressed
perf.data?

jirka


* Re: [PATCH v2 3/4] perf record: enable runtime trace compression
  2019-02-11 20:23 ` [PATCH v2 3/4] perf record: enable runtime trace compression Alexey Budankov
  2019-02-12 13:08   ` Jiri Olsa
@ 2019-02-12 13:08   ` Jiri Olsa
  2019-02-20 14:53     ` Alexey Budankov
  2019-02-12 13:09   ` Jiri Olsa
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 67+ messages in thread
From: Jiri Olsa @ 2019-02-12 13:08 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel

On Mon, Feb 11, 2019 at 11:23:40PM +0300, Alexey Budankov wrote:

SNIP

> @@ -774,6 +775,8 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
>  	struct perf_mmap *maps;
>  	int trace_fd = rec->data.file.fd;
>  	off_t off;
> +	struct perf_session *session = rec->session;
> +	perf_mmap__compress_fn_t compress_fn;
>  
>  	if (!evlist)
>  		return 0;
> @@ -785,6 +788,9 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
>  	if (overwrite && evlist->bkw_mmap_state != BKW_MMAP_DATA_PENDING)
>  		return 0;
>  
> +	compress_fn = (record__comp_enabled(rec) ?
> +		perf_session__zstd_compress : perf_session__zstd_copy);
> +

I don't follow what the perf_session__zstd_copy function is for..?

for the !record__comp_enabled case we seem not to use it, calling
the current perf_mmap__push interface instead

however I don't see the point of having this function at all

jirka


>  	if (record__aio_enabled(rec))
>  		off = record__aio_get_pos(trace_fd);
>  
> @@ -799,11 +805,21 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
>  				map->flush = MMAP_FLUSH_DEFAULT;
>  			}
>  			if (!record__aio_enabled(rec)) {
> -				if (perf_mmap__push(map, rec, record__pushfn) != 0) {
> -					if (sync)
> -						map->flush = flush;
> -					rc = -1;
> -					goto out;
> +				if (!record__comp_enabled(rec)) {
> +					if (perf_mmap__push(map, rec, record__pushfn) != 0) {
> +						if (sync)
> +							map->flush = flush;
> +						rc = -1;
> +						goto out;
> +					}
> +				} else {
> +					if (perf_mmap__pack(map, rec, record__pushfn,
> +							compress_fn, session) != 0) {
> +						if (sync)
> +							map->flush = flush;
> +						rc = -1;
> +						goto out;
> +					}

SNIP


* Re: [PATCH v2 3/4] perf record: enable runtime trace compression
  2019-02-11 20:23 ` [PATCH v2 3/4] perf record: enable runtime trace compression Alexey Budankov
  2019-02-12 13:08   ` Jiri Olsa
  2019-02-12 13:08   ` Jiri Olsa
@ 2019-02-12 13:09   ` Jiri Olsa
  2019-02-20 15:09     ` Alexey Budankov
  2019-02-12 13:09   ` Jiri Olsa
                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 67+ messages in thread
From: Jiri Olsa @ 2019-02-12 13:09 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel

On Mon, Feb 11, 2019 at 11:23:40PM +0300, Alexey Budankov wrote:

SNIP

> +	compress_fn = (record__comp_enabled(rec) ?
> +		perf_session__zstd_compress : perf_session__zstd_copy);
> +
>  	if (record__aio_enabled(rec))
>  		off = record__aio_get_pos(trace_fd);
>  
> @@ -799,11 +805,21 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
>  				map->flush = MMAP_FLUSH_DEFAULT;
>  			}
>  			if (!record__aio_enabled(rec)) {
> -				if (perf_mmap__push(map, rec, record__pushfn) != 0) {
> -					if (sync)
> -						map->flush = flush;
> -					rc = -1;
> -					goto out;
> +				if (!record__comp_enabled(rec)) {
> +					if (perf_mmap__push(map, rec, record__pushfn) != 0) {
> +						if (sync)
> +							map->flush = flush;
> +						rc = -1;
> +						goto out;
> +					}
> +				} else {
> +					if (perf_mmap__pack(map, rec, record__pushfn,
> +							compress_fn, session) != 0) {
> +						if (sync)
> +							map->flush = flush;
> +						rc = -1;
> +						goto out;
> +					}

I really don't like this.. the non-compression flow needs to stay untouched

why don't you overload the record__pushfn function for the compression
processing and stash the data in there?

thanks,
jirka


* Re: [PATCH v2 3/4] perf record: enable runtime trace compression
  2019-02-11 20:23 ` [PATCH v2 3/4] perf record: enable runtime trace compression Alexey Budankov
                     ` (2 preceding siblings ...)
  2019-02-12 13:09   ` Jiri Olsa
@ 2019-02-12 13:09   ` Jiri Olsa
  2019-02-20 15:11     ` Alexey Budankov
  2019-02-12 13:09   ` Jiri Olsa
  2019-02-12 13:09   ` Jiri Olsa
  5 siblings, 1 reply; 67+ messages in thread
From: Jiri Olsa @ 2019-02-12 13:09 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel

On Mon, Feb 11, 2019 at 11:23:40PM +0300, Alexey Budankov wrote:

SNIP

> diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
> index 239e9a13c2b7..980784b77fe2 100644
> --- a/tools/perf/util/mmap.c
> +++ b/tools/perf/util/mmap.c
> @@ -156,6 +156,86 @@ void __weak auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp __mayb
>  {
>  }
>  
> +static ssize_t perf_mmap__capture(struct perf_mmap *md, int idx,
> +			perf_mmap__compress_fn_t compress, void *where)
> +{
> +	u64 head = perf_mmap__read_head(md);
> +	unsigned char *data = md->base + page_size;
> +	unsigned long size, size0 = 0;
> +	void *buf;
> +	int rc = 0;
> +	size_t mmap_len = perf_mmap__mmap_len(md);

again, this function seems like a duplicate of perf_mmap__push,
can't you just overload the push callback?

thanks,
jirka

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 4/4] perf report: support record trace file decompression
  2019-02-11 20:25 ` [PATCH v2 4/4] perf report: support record trace file decompression Alexey Budankov
  2019-02-12 13:08   ` Jiri Olsa
@ 2019-02-12 13:09   ` Jiri Olsa
  2019-02-20 14:48     ` Alexey Budankov
  2019-02-12 13:09   ` Jiri Olsa
  2019-02-12 13:09   ` Jiri Olsa
  3 siblings, 1 reply; 67+ messages in thread
From: Jiri Olsa @ 2019-02-12 13:09 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel

On Mon, Feb 11, 2019 at 11:25:00PM +0300, Alexey Budankov wrote:
> 
> PERF_RECORD_COMPRESSED records are decompressed from trace file into
> a linked list of mmaped memory regions using streaming Zstandard API.
> After that the regions are loaded fetching uncompressed events. When
> dumping raw trace (e.g., perf report -D --header) file offsets of
> events from compressed records are set to zero.
> 
> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
> ---
> Changes in v2:
> - moved compression/decompression code to session layer

could you please add some automated test for this?

I think it could be part of the sample synthesize test
or even a new script test under tests/shell.. or anything
else really.. just to see that we have some clean
interface for this that we can test

thanks,
jirka

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 4/4] perf report: support record trace file decompression
  2019-02-11 20:25 ` [PATCH v2 4/4] perf report: support record trace file decompression Alexey Budankov
  2019-02-12 13:08   ` Jiri Olsa
  2019-02-12 13:09   ` Jiri Olsa
@ 2019-02-12 13:09   ` Jiri Olsa
  2019-02-20 14:46     ` Alexey Budankov
  2019-02-12 13:09   ` Jiri Olsa
  3 siblings, 1 reply; 67+ messages in thread
From: Jiri Olsa @ 2019-02-12 13:09 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel

On Mon, Feb 11, 2019 at 11:25:00PM +0300, Alexey Budankov wrote:

SNIP

> +static int perf_session__process_compressed_event(struct perf_session *session,
> +					union perf_event *event, u64 file_offset)
> +{
> +	void *src;
> +	size_t decomp_size, src_size;
> +	u64 decomp_last_rem = 0;
> +	size_t decomp_len = session->header.env.comp_mmap_len;
> +	struct decomp *decomp, *decomp_last = session->decomp_last;
> +
> +	decomp = mmap(NULL, sizeof(struct decomp) + decomp_len, PROT_READ|PROT_WRITE,
> +		      MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
> +	if (decomp == MAP_FAILED) {
> +		pr_err("Couldn't allocate memory for decompression\n");
> +		return -1;
> +	}
> +
> +	decomp->file_pos = file_offset;
> +	decomp->head = 0;
> +
> +	if (decomp_last) {
> +		decomp_last_rem = decomp_last->size - decomp_last->head;
> +		memcpy(decomp->data, &(decomp_last->data[decomp_last->head]), decomp_last_rem);
> +		decomp->size = decomp_last_rem;
> +	}
> +
> +	src = (void *)event + sizeof(struct compressed_event);
> +	src_size = event->pack.header.size - sizeof(struct compressed_event);
> +
> +	decomp_size = perf_session__zstd_decompress(session, src, src_size,
> +				&(decomp->data[decomp_last_rem]), decomp_len - decomp_last_rem);
> +	if (!decomp_size) {
> +		munmap(decomp, sizeof(struct decomp) + decomp_len);
> +		pr_err("Couldn't decompress data\n");
> +		return -1;
> +	}
> +
> +	decomp->size += decomp_size;
> +
> +	if (session->decomp == NULL) {
> +		session->decomp = decomp;
> +		session->decomp_last = decomp;
> +	} else {
> +		session->decomp_last->next = decomp;
> +		session->decomp_last = decomp;
> +	}

can this happen? can you process more compressed events
before you deliver them in __perf_session__process_decomp_events?

I think I asked similar question in my other email..

thanks,
jirka

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 3/4] perf record: enable runtime trace compression
  2019-02-11 20:23 ` [PATCH v2 3/4] perf record: enable runtime trace compression Alexey Budankov
                     ` (3 preceding siblings ...)
  2019-02-12 13:09   ` Jiri Olsa
@ 2019-02-12 13:09   ` Jiri Olsa
  2019-02-20 15:03     ` Alexey Budankov
  2019-02-12 13:09   ` Jiri Olsa
  5 siblings, 1 reply; 67+ messages in thread
From: Jiri Olsa @ 2019-02-12 13:09 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel

On Mon, Feb 11, 2019 at 11:23:40PM +0300, Alexey Budankov wrote:

SNIP

> diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
> index 239e9a13c2b7..980784b77fe2 100644
> --- a/tools/perf/util/mmap.c
> +++ b/tools/perf/util/mmap.c
> @@ -156,6 +156,86 @@ void __weak auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp __mayb
>  {
>  }
>  
> +static ssize_t perf_mmap__capture(struct perf_mmap *md, int idx,
> +			perf_mmap__compress_fn_t compress, void *where)

what's the point of casting a perf_session* to void* all the way down
to the compression code, where you cast it back?

SNIP

> +int perf_session__zstd_fini(struct perf_session *session)
> +{
> +	if (session->zstd_cstream) {
> +		ZSTD_freeCStream(session->zstd_cstream);
> +		session->zstd_cstream = NULL;
> +	}
> +
> +	return 0;
> +}
> +
> +size_t perf_session__zstd_compress(void *to,  void *dst, size_t dst_size,
> +				   void *src, size_t src_size)
> +{
> +	struct perf_session *session = to;

in here... it could just be a struct perf_session*, no?

jirka

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 4/4] perf report: support record trace file decompression
  2019-02-11 20:25 ` [PATCH v2 4/4] perf report: support record trace file decompression Alexey Budankov
                     ` (2 preceding siblings ...)
  2019-02-12 13:09   ` Jiri Olsa
@ 2019-02-12 13:09   ` Jiri Olsa
  2019-02-20 14:44     ` Alexey Budankov
  3 siblings, 1 reply; 67+ messages in thread
From: Jiri Olsa @ 2019-02-12 13:09 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel

On Mon, Feb 11, 2019 at 11:25:00PM +0300, Alexey Budankov wrote:

SNIP

> @@ -1932,6 +2059,38 @@ fetch_mmaped_event(struct perf_session *session,
>  	return event;
>  }
>  
> +static int __perf_session__process_decomp_events(struct perf_session *session)
> +{
> +	s64 skip;
> +	u64 size, file_pos = 0;
> +	union perf_event *event;
> +	struct decomp *decomp = session->decomp_last;
> +
> +	if (!decomp)
> +		return 0;
> +
> +	while (decomp->head < decomp->size && !session_done()) {

so how does this actually work? does one PERF_RECORD_COMPRESSED event carry
complete data to unpack? or do you wait to receive more PERF_RECORD_COMPRESSED
records to give you data you can unpack?

jirka

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options
  2019-02-11 20:22 ` [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options Alexey Budankov
                     ` (4 preceding siblings ...)
  2019-02-12 13:08   ` Jiri Olsa
@ 2019-02-12 13:09   ` Jiri Olsa
  2019-02-20 15:19     ` Alexey Budankov
  2019-02-12 13:09   ` Jiri Olsa
  2019-02-12 13:09   ` Jiri Olsa
  7 siblings, 1 reply; 67+ messages in thread
From: Jiri Olsa @ 2019-02-12 13:09 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel

On Mon, Feb 11, 2019 at 11:22:38PM +0300, Alexey Budankov wrote:
> 
> Implement -z,--compression_level=<n> and --mmap-flush=<dump_least_size>
> options as well as a special PERF_RECORD_COMPRESSED record that contains
> compressed parts of kernel data buffer.
> 
> Because compression requires auxiliary memory to implement encoding of
> kernel data, record->opts.nr_cblocks == -1 signifies to preallocate a single
> AIO data buffer aio.data[0] without accompanying AIO control blocks.
> 
> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
> ---
> Changes in v2:
> - enabled allocation aio data buffers for compression
> 
> ---
>  tools/perf/Documentation/perf-record.txt |   9 ++
>  tools/perf/builtin-record.c              | 110 +++++++++++++++++++----
>  tools/perf/perf.h                        |   2 +
>  tools/perf/util/env.h                    |  10 +++
>  tools/perf/util/event.c                  |   1 +
>  tools/perf/util/event.h                  |   7 ++
>  tools/perf/util/evlist.c                 |   6 +-
>  tools/perf/util/evlist.h                 |   3 +-
>  tools/perf/util/header.c                 |  45 +++++++++-
>  tools/perf/util/header.h                 |   1 +
>  tools/perf/util/mmap.c                   |  98 ++++++++++++--------
>  tools/perf/util/mmap.h                   |   7 +-
>  tools/perf/util/session.h                |   2 +
>  13 files changed, 240 insertions(+), 61 deletions(-)

this patch is super complicated.. please think of reviewers
when writing code ;-)

it needs to be split into small logical patches,
where each change is explained in the changelog

jirka

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options
  2019-02-11 20:22 ` [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options Alexey Budankov
                     ` (5 preceding siblings ...)
  2019-02-12 13:09   ` Jiri Olsa
@ 2019-02-12 13:09   ` Jiri Olsa
  2019-02-20 14:25     ` Alexey Budankov
  2019-02-12 13:09   ` Jiri Olsa
  7 siblings, 1 reply; 67+ messages in thread
From: Jiri Olsa @ 2019-02-12 13:09 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel

On Mon, Feb 11, 2019 at 11:22:38PM +0300, Alexey Budankov wrote:

SNIP

> +static int process_compressed(struct feat_fd *ff,
> +			      void *data __maybe_unused)
> +{
> +	u64 compression_info;
> +
> +	if (do_read_u64(ff, &compression_info))
> +		return -1;
> +
> +	ff->ph->env.comp_type  = (compression_info >> 32) & 0xffffffffULL;
> +	ff->ph->env.comp_level = compression_info & 0xffffffffULL;
> +
> +	if (do_read_u64(ff, &compression_info))
> +		return -1;
> +
> +	ff->ph->env.comp_ratio = (compression_info >> 32) & 0xffffffffULL;
> +	ff->ph->env.comp_mmap_len = compression_info & 0xffffffffULL;
> +
> +	return 0;
> +}
> +
>  struct feature_ops {
>  	int (*write)(struct feat_fd *ff, struct perf_evlist *evlist);
>  	void (*print)(struct feat_fd *ff, FILE *fp);
> @@ -2651,7 +2693,8 @@ static const struct feature_ops feat_ops[HEADER_LAST_FEATURE] = {
>  	FEAT_OPN(CACHE,		cache,		true),
>  	FEAT_OPR(SAMPLE_TIME,	sample_time,	false),
>  	FEAT_OPR(MEM_TOPOLOGY,	mem_topology,	true),
> -	FEAT_OPR(CLOCKID,       clockid,        false)
> +	FEAT_OPR(CLOCKID,       clockid,        false),
> +	FEAT_OPR(COMPRESSED,	compressed,	false)

please separate the COMPRESSED feature addition into a single patch

thanks,
jirka

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options
  2019-02-11 20:22 ` [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options Alexey Budankov
                     ` (6 preceding siblings ...)
  2019-02-12 13:09   ` Jiri Olsa
@ 2019-02-12 13:09   ` Jiri Olsa
  2019-02-20 14:14     ` Alexey Budankov
  7 siblings, 1 reply; 67+ messages in thread
From: Jiri Olsa @ 2019-02-12 13:09 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel

On Mon, Feb 11, 2019 at 11:22:38PM +0300, Alexey Budankov wrote:

SNIP

> @@ -1147,6 +1193,10 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>  	fd = perf_data__fd(data);
>  	rec->session = session;
>  
> +	rec->opts.comp_level = 0;
> +	session->header.env.comp_level = rec->opts.comp_level;
> +	session->header.env.comp_type = PERF_COMP_NONE;
> +
>  	record__init_features(rec);
>  
>  	if (rec->opts.use_clockid && rec->opts.clockid_res_ns)
> @@ -1176,6 +1226,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>  		err = -1;
>  		goto out_child;
>  	}
> +	session->header.env.comp_mmap_len = session->evlist->mmap_len;

so the comp_mmap_len is the max length of the compressed packet?

any idea if this value might have some impact on the processing speed?

I see you mentioned the size reduction, could you also measure
the record overhead?

thanks,
jirka

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 3/4] perf record: enable runtime trace compression
  2019-02-11 20:23 ` [PATCH v2 3/4] perf record: enable runtime trace compression Alexey Budankov
                     ` (4 preceding siblings ...)
  2019-02-12 13:09   ` Jiri Olsa
@ 2019-02-12 13:09   ` Jiri Olsa
  2019-02-20 15:06     ` Alexey Budankov
  5 siblings, 1 reply; 67+ messages in thread
From: Jiri Olsa @ 2019-02-12 13:09 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel

On Mon, Feb 11, 2019 at 11:23:40PM +0300, Alexey Budankov wrote:
> 
> Compression is implemented using the simple Zstd API and employs the AIO
> data buffer as the memory to operate on. If the API call fails for some
> reason, compression falls back to memcpy().
> 
> Data chunks are split and packed into PERF_RECORD_COMPRESSED records
> by 64KB at max. mmap-flush option value can be used to avoid compression
> of every single byte of data and increase compression ratio.
> 
> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
> ---
> Changes in v2:
> - enabled trace compression for serial trace streaming
> - moved compression/decompression code to session layer
> 
> ---
>  tools/perf/builtin-record.c |  67 +++++++++-----
>  tools/perf/util/mmap.c      | 173 +++++++++++++++++++++---------------
>  tools/perf/util/mmap.h      |  24 +++--
>  tools/perf/util/session.c   | 106 ++++++++++++++++++++++
>  tools/perf/util/session.h   |  13 +++
>  5 files changed, 280 insertions(+), 103 deletions(-)
> 
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 227dbbd47d3f..435ff88dfc5e 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -112,8 +112,7 @@ static bool switch_output_time(struct record *rec)
>  	       trigger_is_ready(&switch_output_trigger);
>  }
>  
> -static int record__write(struct record *rec, struct perf_mmap *map __maybe_unused,
> -			 void *bf, size_t size)
> +static int record__write(struct record *rec, void *bf, size_t size)

please keep the 'struct perf_mmap *map' in, I'm using it in
the multiple file storage, where each map holds a pointer to
its data file

thanks,
jirka

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 0/4] perf: enable compression of record mode trace to save storage space
  2019-02-12 12:27 ` [PATCH v2 0/4] perf: enable compression of record mode trace to save storage space Arnaldo Carvalho de Melo
@ 2019-02-12 14:06   ` Alexey Budankov
  0 siblings, 0 replies; 67+ messages in thread
From: Alexey Budankov @ 2019-02-12 14:06 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, Namhyung Kim,
	Alexander Shishkin, Andi Kleen, linux-kernel


On 12.02.2019 15:27, Arnaldo Carvalho de Melo wrote:
> Em Mon, Feb 11, 2019 at 11:17:14PM +0300, Alexey Budankov escreveu:
>> This is the rebase to the tip of Arnaldo's perf/core repository.
>  
>> The patch set implements runtime trace compression for record mode and 
>> trace file decompression for report mode. Zstandard API [1] is used for 
>> compression/decompression of data that come from perf_events kernel 
>> data buffers.
>  
>> Realized -z,--compression_level=n option provides ~3-5x avg. trace file 
> 
> Realized? Perhaps "Using the -z/--compression_level=n options provides
> 3.5x average size reduction on..."

Yep, wording could be adjusted. Thanks!

> 
>> size reduction on variety of tested workloads what saves user storage 
>> space on larger server systems where trace file size can easily reach 
>> several tens or even hundreds of GiBs, especially when profiling with 
>> dwarf-based stacks and tracing of  context-switches.
>  
>> --mmap-flush option can be used to avoid compressing every single byte 
> 
> What is the default for --mmap-flush when -z is used? Does it vary with
> the compression level? Can we expect something that is reasonable or is
> using --mmap-flush always needed? If so can we do away with that by
> using some sensible default according to the -z level used?

The default value is 1, which matches the current perf record implementation.
The option is independent of the -z setting and doesn't vary with the
compression level. The default could be different when -z is specified.

> 
>> of data and increase compression ratio at the same time lowering tool 
>> runtime overhead.
>>
>>   $ tools/perf/perf record -z 1 -e cycles -- matrix.gcc
>>   $ tools/perf/perf record -z 1 --mmap-flush 1024 -e cycles -- matrix.gcc
>>   $ tools/perf/perf record -z 1 --mmap-flush 1024 --aio -e cycles -- matrix.gcc
> 
> Please avoid having these explanations just on the cover letter since it
> will not get recorded in the git history for the project, please add
> examples with output from real workloads, preferably providing real
> cases so that other people can readily replicate your results when
> having access to similar large machines.

Will do. Thanks!

>  
>> The compression functionality can be disabled from the command line 
>> using NO_LIBZSTD define and Zstandard sources can be overridden using 
>> value of LIBZSTD_DIR define:
>>
>>   $ make -C tools/perf NO_LIBZSTD=1 clean all
>>   $ make -C tools/perf LIBZSTD_DIR=/path/to/zstd-1.3.7 clean all
> 
> Showing the relevant output of these commands before/after
> installing the libzstd library is useful as well. From what you use
> above, is it fair to conclude that there are no readily made packages for
> this library and we need to get it from github and build it ourselves?

If your system has some version of the package preinstalled, the build
system finds and uses it during the build. The status of the auto-detection
feature is reported just before compilation starts.

If you still prefer to compile against some newer version that is not
preinstalled, and you don't want to upgrade your system to it, you can
download and build the newer library yourself and then point the build
at it using LIBZSTD_DIR.

> 
> On my notebook, a Fedora 29 system:
> 
> [root@quaco ~]# dnf search zstd
> ========================================================== Name & Summary Matched: zstd ==========================================================
> zstd.x86_64 : Zstd compression library
> zstd.x86_64 : Zstd compression library
> libzstd.x86_64 : Zstd shared library
> libzstd.i686 : Zstd shared library
> libzstd.x86_64 : Zstd shared library
> libzstd-devel.i686 : Header files for Zstd library
> libzstd-devel.x86_64 : Header files for Zstd library
> ============================================================= Summary Matched: zstd ==============================================================
> fedora-repo-zdicts.noarch : Zstd dictionaries for Fedora repository metadata
> [root@quaco ~]#
> [root@quaco ~]# dnf install libzstd-devel
> <SNIP>
>  libzstd-devel                         x86_64                         1.3.8-1.fc29                          updates                          39 k
> <SNIP>
> 
> So, do I need some specific 1.3.7 version? Is that why you used that
> LIBZSTD_DIR in your example?

v1.3.7 is just an example. v1.3.8 can be also provided.

> 
> I'm trying these steps, but please consider following these suggestions
> in future posts,
> 
> And thanks for working on this!

Thanks for the meaningful comments. Will do my best to follow them.

Thanks,
Alexey

> 
> - Arnaldo
>  

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options
  2019-02-12 13:08   ` Jiri Olsa
@ 2019-02-20 14:13     ` Alexey Budankov
  0 siblings, 0 replies; 67+ messages in thread
From: Alexey Budankov @ 2019-02-20 14:13 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


On 12.02.2019 16:08, Jiri Olsa wrote:
> On Mon, Feb 11, 2019 at 11:22:38PM +0300, Alexey Budankov wrote:
> 
> SNIP
> 
>>  static int process_synthesized_event(struct perf_tool *tool,
>>  				     union perf_event *event,
>>  				     struct perf_sample *sample __maybe_unused,

<SNIP>

>>  	for (i = 0; i < evlist->nr_mmaps; i++) {
>> +		u64 flush = MMAP_FLUSH_DEFAULT;
>>  		struct perf_mmap *map = &maps[i];
>>  
>>  		if (map->base) {
>>  			record__adjust_affinity(rec, map);
>> +			if (sync) {
>> +				flush = map->flush;
>> +				map->flush = MMAP_FLUSH_DEFAULT;
>> +			}
>>  			if (!record__aio_enabled(rec)) {
>>  				if (perf_mmap__push(map, rec, record__pushfn) != 0) {
>> +					if (sync)
>> +						map->flush = flush;
>>  					rc = -1;
>>  					goto out;
>>  				}
>> @@ -775,10 +814,14 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
>>  				idx = record__aio_sync(map, false);
>>  				if (perf_mmap__aio_push(map, rec, idx, record__aio_pushfn, &off) != 0) {
>>  					record__aio_set_pos(trace_fd, off);
>> +					if (sync)
>> +						map->flush = flush;
>>  					rc = -1;
>>  					goto out;
>>  				}
>>  			}
>> +			if (sync)
>> +				map->flush = flush;
>>  		}
> 
what's 'flush' and 'sync' for? also the above handling seems confusing,
why would you set it temporarily to the default value if you let it be
set by the user's command line option?

flush is the threshold that postpones or triggers the move of data from
the memory buffers to the trace file. sync is the means to force that
move independently of the threshold value.

Even though a user provides the flush value on the command line, the
tool still has to drain the memory buffers, at least at the end of the
collection, which is why it is implemented this way.

> 
> SNIP
> 
>> -static void perf_mmap__aio_munmap(struct perf_mmap *map __maybe_unused)
>>  

<SNIP>

>> @@ -492,7 +518,7 @@ static int __perf_mmap__read_init(struct perf_mmap *md)
>>  	md->start = md->overwrite ? head : old;
>>  	md->end = md->overwrite ? old : head;
>>  
>> -	if (md->start == md->end)
>> +	if ((md->end - md->start) < md->flush)
>>  		return -EAGAIN;
> 
> we need document and explain this change in changelog in separate patch

Moved --mmap-flush option implementation into a separate patch.

Thanks,
Alexey

> 
> thanks,
> jirka
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options
  2019-02-12 13:08   ` Jiri Olsa
@ 2019-02-20 14:13     ` Alexey Budankov
  0 siblings, 0 replies; 67+ messages in thread
From: Alexey Budankov @ 2019-02-20 14:13 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


On 12.02.2019 16:08, Jiri Olsa wrote:
> On Mon, Feb 11, 2019 at 11:22:38PM +0300, Alexey Budankov wrote:
> 
> SNIP
> 
>> diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
>> index d96eccd7d27f..0e14884f28b2 100644
>> --- a/tools/perf/util/session.h
>> +++ b/tools/perf/util/session.h
>> @@ -35,6 +35,8 @@ struct perf_session {
>>  	struct ordered_events	ordered_events;
>>  	struct perf_data	*data;
>>  	struct perf_tool	*tool;
>> +	u64			bytes_transferred;
>> +	u64			bytes_compressed;
> 
please add those in a separate patch, so we can see how they are
incremented and displayed easily

Moved declaration and printing of the values into a separate patch.

Thanks,
Alexey

> 
> thanks,
> jirka
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options
  2019-02-12 13:08   ` Jiri Olsa
@ 2019-02-20 14:13     ` Alexey Budankov
  0 siblings, 0 replies; 67+ messages in thread
From: Alexey Budankov @ 2019-02-20 14:13 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


On 12.02.2019 16:08, Jiri Olsa wrote:
> On Mon, Feb 11, 2019 at 11:22:38PM +0300, Alexey Budankov wrote:
> 
> SNIP
> 
>>  static void record__init_features(struct record *rec)
>> @@ -838,6 +881,9 @@ static void record__init_features(struct record *rec)
>>  	if (!(rec->opts.use_clockid && rec->opts.clockid_res_ns))
>>  		perf_header__clear_feat(&session->header, HEADER_CLOCKID);
>>  
>> +	if (!record__comp_enabled(rec))
>> +		perf_header__clear_feat(&session->header, HEADER_COMPRESSED);
>> +
>>  	perf_header__clear_feat(&session->header, HEADER_STAT);
>>  }
>>  
>> @@ -1147,6 +1193,10 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>>  	fd = perf_data__fd(data);
>>  	rec->session = session;
>>  
>> +	rec->opts.comp_level = 0;
>> +	session->header.env.comp_level = rec->opts.comp_level;
>> +	session->header.env.comp_type = PERF_COMP_NONE;
> 
hum I think both record and session are zeroed from the start,
so this seems superfluous

Removed.

> 
> SNIP
> 
>> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
>> index dec6d218c31c..5ad3a27a042f 100644
>> --- a/tools/perf/util/header.c
>> +++ b/tools/perf/util/header.c
>> @@ -1463,6 +1463,21 @@ static int write_mem_topology(struct feat_fd *ff __maybe_unused,
>>  	return ret;
>>  }
>>  
>> +static int write_compressed(struct feat_fd *ff __maybe_unused,
>> +			    struct perf_evlist *evlist __maybe_unused)
>> +{
>> +	int ret;
>> +	u64 compression_info = ((u64)ff->ph->env.comp_type  << 32) | ff->ph->env.comp_level;
>> +
>> +	ret = do_write(ff, &compression_info, sizeof(compression_info));
>> +	if (ret)
>> +		return ret;
>> +
>> +	compression_info = ((u64)ff->ph->env.comp_ratio << 32) | ff->ph->env.comp_mmap_len;
> 
> let's not complicate this, if you have two u32 values
> to store, store two u32 values ;-)

Corrected.

> 
also please add a version field in case we add new fields in the future,
so we can stay backward compatible

Implemented.

Thanks,
Alexey

> 
> jirka
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options
  2019-02-12 13:09   ` Jiri Olsa
@ 2019-02-20 14:14     ` Alexey Budankov
  2019-02-25 15:30       ` Alexey Budankov
  0 siblings, 1 reply; 67+ messages in thread
From: Alexey Budankov @ 2019-02-20 14:14 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


On 12.02.2019 16:09, Jiri Olsa wrote:
> On Mon, Feb 11, 2019 at 11:22:38PM +0300, Alexey Budankov wrote:
> 
> SNIP
> 
>> @@ -1147,6 +1193,10 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>>  	fd = perf_data__fd(data);
>>  	rec->session = session;
>>  
>> +	rec->opts.comp_level = 0;
>> +	session->header.env.comp_level = rec->opts.comp_level;
>> +	session->header.env.comp_type = PERF_COMP_NONE;
>> +
>>  	record__init_features(rec);
>>  
>>  	if (rec->opts.use_clockid && rec->opts.clockid_res_ns)
>> @@ -1176,6 +1226,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>>  		err = -1;
>>  		goto out_child;
>>  	}
>> +	session->header.env.comp_mmap_len = session->evlist->mmap_len;
> 
> so the comp_mmap_len is the max length of the compressed packet?

comp_mmap_len is the size of the buffer needed to hold one compressed
chunk of data after its decompression.

> 
> any idea if this value might have some impact on the processing speed?

It increases memory consumption at the loading and processing stages.

> 
> I see you mentioned the size reduction, could you also measure
> the record overhead?

Let's get back to this after the code review.

Thanks,
Alexey

> 
> thanks,
> jirka
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options
  2019-02-12 13:08   ` Jiri Olsa
@ 2019-02-20 14:15     ` Alexey Budankov
  2019-02-21  9:47       ` Jiri Olsa
  0 siblings, 1 reply; 67+ messages in thread
From: Alexey Budankov @ 2019-02-20 14:15 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


On 12.02.2019 16:08, Jiri Olsa wrote:
> On Mon, Feb 11, 2019 at 11:22:38PM +0300, Alexey Budankov wrote:
> 
> SNIP
> 
>> -	if (rec->opts.nr_cblocks > nr_cblocks_max)
>> -		rec->opts.nr_cblocks = nr_cblocks_max;
>> -	if (verbose > 0)
>> -		pr_info("nr_cblocks: %d\n", rec->opts.nr_cblocks);
>> +	if (rec->opts.comp_level > 22)
>> +		rec->opts.comp_level = 0;
>> +	if (record__comp_enabled(rec) && !rec->opts.nr_cblocks) {
>> +		 /*
>> +		  * Allocate aio.data[0] buffer for compression.
>> +		  */
>> +		rec->opts.nr_cblocks = -1;
>> +	}
> 
I assume you're using aio.data[0] as the compression buffer in the
non-aio case.. if that's the case, please don't do that and use a
separate buffer instead

Do you want it to be one more field in the mmap struct, or could
the aio-related fields still be reworked?

Thanks,
Alexey

> 
> jirka
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options
  2019-02-12 13:09   ` Jiri Olsa
@ 2019-02-20 14:25     ` Alexey Budankov
  0 siblings, 0 replies; 67+ messages in thread
From: Alexey Budankov @ 2019-02-20 14:25 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


On 12.02.2019 16:09, Jiri Olsa wrote:
> On Mon, Feb 11, 2019 at 11:22:38PM +0300, Alexey Budankov wrote:
> 
> SNIP
> 
>> +static int process_compressed(struct feat_fd *ff,
>> +			      void *data __maybe_unused)
>> +{
>> +	u64 compression_info;
>> +
>> +	if (do_read_u64(ff, &compression_info))
>> +		return -1;
>> +
>> +	ff->ph->env.comp_type  = (compression_info >> 32) & 0xffffffffULL;
>> +	ff->ph->env.comp_level = compression_info & 0xffffffffULL;
>> +
>> +	if (do_read_u64(ff, &compression_info))
>> +		return -1;
>> +
>> +	ff->ph->env.comp_ratio = (compression_info >> 32) & 0xffffffffULL;
>> +	ff->ph->env.comp_mmap_len = compression_info & 0xffffffffULL;
>> +
>> +	return 0;
>> +}
>> +
>>  struct feature_ops {
>>  	int (*write)(struct feat_fd *ff, struct perf_evlist *evlist);
>>  	void (*print)(struct feat_fd *ff, FILE *fp);
>> @@ -2651,7 +2693,8 @@ static const struct feature_ops feat_ops[HEADER_LAST_FEATURE] = {
>>  	FEAT_OPN(CACHE,		cache,		true),
>>  	FEAT_OPR(SAMPLE_TIME,	sample_time,	false),
>>  	FEAT_OPR(MEM_TOPOLOGY,	mem_topology,	true),
>> -	FEAT_OPR(CLOCKID,       clockid,        false)
>> +	FEAT_OPR(CLOCKID,       clockid,        false),
>> +	FEAT_OPR(COMPRESSED,	compressed,	false)
> 
> please separate the COMPRESSED feature addition into a single patch

Moved to a separate patch.

Thanks,
Alexey

> 
> thanks,
> jirka
> 

* Re: [PATCH v2 4/4] perf report: support record trace file decompression
  2019-02-12 13:09   ` Jiri Olsa
@ 2019-02-20 14:44     ` Alexey Budankov
  0 siblings, 0 replies; 67+ messages in thread
From: Alexey Budankov @ 2019-02-20 14:44 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


On 12.02.2019 16:09, Jiri Olsa wrote:
> On Mon, Feb 11, 2019 at 11:25:00PM +0300, Alexey Budankov wrote:
> 
> SNIP
> 
>> @@ -1932,6 +2059,38 @@ fetch_mmaped_event(struct perf_session *session,
>>  	return event;
>>  }
>>  
>> +static int __perf_session__process_decomp_events(struct perf_session *session)
>> +{
>> +	s64 skip;
>> +	u64 size, file_pos = 0;
>> +	union perf_event *event;
>> +	struct decomp *decomp = session->decomp_last;
>> +
>> +	if (!decomp)
>> +		return 0;
>> +
>> +	while (decomp->head < decomp->size && !session_done()) {
> 
> so how does this actually work? does one PERF_RECORD_COMPRESSED event carry
> complete data to unpack? or do you wait to receive more PERF_RECORD_COMPRESSED
> records to give you data you can unpack?

One record contains a single compressed data chunk that can be decompressed
on its own. The record header is skipped and the payload is passed to
perf_session__zstd_decompress() together with the buffer to decompress into,
whose size is encoded in the file header (comp_mmap_len). After decompression
the buffer is iterated the same way as mmapped event records, which also
handles event records split across chunk boundaries. Decompressed buffers are
chained into a linked list so the records' memory remains available for later
references.

Thanks,
Alexey

> 
> jirka
> 

* Re: [PATCH v2 4/4] perf report: support record trace file decompression
  2019-02-12 13:09   ` Jiri Olsa
@ 2019-02-20 14:46     ` Alexey Budankov
  0 siblings, 0 replies; 67+ messages in thread
From: Alexey Budankov @ 2019-02-20 14:46 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


On 12.02.2019 16:09, Jiri Olsa wrote:
> On Mon, Feb 11, 2019 at 11:25:00PM +0300, Alexey Budankov wrote:
> 
> SNIP
> 
>> +static int perf_session__process_compressed_event(struct perf_session *session,
>> +					union perf_event *event, u64 file_offset)
>> +{
>> +	void *src;
>> +	size_t decomp_size, src_size;
>> +	u64 decomp_last_rem = 0;
>> +	size_t decomp_len = session->header.env.comp_mmap_len;
>> +	struct decomp *decomp, *decomp_last = session->decomp_last;
>> +
>> +	decomp = mmap(NULL, sizeof(struct decomp) + decomp_len, PROT_READ|PROT_WRITE,
>> +		      MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
>> +	if (decomp == MAP_FAILED) {
>> +		pr_err("Couldn't allocate memory for decompression\n");
>> +		return -1;
>> +	}
>> +
>> +	decomp->file_pos = file_offset;
>> +	decomp->head = 0;
>> +
>> +	if (decomp_last) {
>> +		decomp_last_rem = decomp_last->size - decomp_last->head;
>> +		memcpy(decomp->data, &(decomp_last->data[decomp_last->head]), decomp_last_rem);
>> +		decomp->size = decomp_last_rem;
>> +	}
>> +
>> +	src = (void *)event + sizeof(struct compressed_event);
>> +	src_size = event->pack.header.size - sizeof(struct compressed_event);
>> +
>> +	decomp_size = perf_session__zstd_decompress(session, src, src_size,
>> +				&(decomp->data[decomp_last_rem]), decomp_len - decomp_last_rem);
>> +	if (!decomp_size) {
>> +		munmap(decomp, sizeof(struct decomp) + decomp_len);
>> +		pr_err("Couldn't decompress data\n");
>> +		return -1;
>> +	}
>> +
>> +	decomp->size += decomp_size;
>> +
>> +	if (session->decomp == NULL) {
>> +		session->decomp = decomp;
>> +		session->decomp_last = decomp;
>> +	} else {
>> +		session->decomp_last->next = decomp;
>> +		session->decomp_last = decomp;
>> +	}
> 
> can this happen? can you process more compressed events
> before you deliver them in __perf_session__process_decomp_events?
> 
> I think I asked similar question in my other email..

Answered in that mail.

Thanks,
Alexey

> 
> thanks,
> jirka
> 

* Re: [PATCH v2 4/4] perf report: support record trace file decompression
  2019-02-12 13:09   ` Jiri Olsa
@ 2019-02-20 14:48     ` Alexey Budankov
       [not found]       ` <0132ec08-e28b-4102-5053-8f8e21e7fd44@linux.intel.com>
  0 siblings, 1 reply; 67+ messages in thread
From: Alexey Budankov @ 2019-02-20 14:48 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


On 12.02.2019 16:09, Jiri Olsa wrote:
> On Mon, Feb 11, 2019 at 11:25:00PM +0300, Alexey Budankov wrote:
>>
>> PERF_RECORD_COMPRESSED records are decompressed from the trace file into
>> a linked list of mmaped memory regions using the streaming Zstandard API.
>> After that the regions are iterated to fetch uncompressed events. When
>> dumping the raw trace (e.g., perf report -D --header), file offsets of
>> events from compressed records are set to zero.
>>
>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
>> ---
>> Changes in v2:
>> - moved compression/decompression code to session layer
> 
> could you please add some automated test for this?
> 
> I think it could be part of the sample synthesize test 
> or even new script test under tests/shell.. or anything
> else really.. just to see that we have some clean
> interface for this that we can test

This is possible. Let's complete the code review first.

Thanks,
Alexey

> 
> thanks,
> jirka
> 

* Re: [PATCH v2 3/4] perf record: enable runtime trace compression
  2019-02-12 13:08   ` Jiri Olsa
@ 2019-02-20 14:53     ` Alexey Budankov
  2019-02-21  9:43       ` Jiri Olsa
  0 siblings, 1 reply; 67+ messages in thread
From: Alexey Budankov @ 2019-02-20 14:53 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


On 12.02.2019 16:08, Jiri Olsa wrote:
> On Mon, Feb 11, 2019 at 11:23:40PM +0300, Alexey Budankov wrote:
> 
> SNIP
> 
>> @@ -774,6 +775,8 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
>>  	struct perf_mmap *maps;
>>  	int trace_fd = rec->data.file.fd;
>>  	off_t off;
>> +	struct perf_session *session = rec->session;
>> +	perf_mmap__compress_fn_t compress_fn;
>>  
>>  	if (!evlist)
>>  		return 0;
>> @@ -785,6 +788,9 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
>>  	if (overwrite && evlist->bkw_mmap_state != BKW_MMAP_DATA_PENDING)
>>  		return 0;
>>  
>> +	compress_fn = (record__comp_enabled(rec) ?
>> +		perf_session__zstd_compress : perf_session__zstd_copy);
>> +
> 
> I don't follow what's the perf_session__zstd_copy function for..?

It bridges the AIO case when compression is disabled.

Thanks,
Alexey

> 
> for !record__comp_enabled case we seem not to use it
> and calling the current perf_mmap__push interface
> 
> however I dont see point to have this function at all
> 
> jirka
> 
> 
>>  	if (record__aio_enabled(rec))
>>  		off = record__aio_get_pos(trace_fd);
>>  
>> @@ -799,11 +805,21 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
>>  				map->flush = MMAP_FLUSH_DEFAULT;
>>  			}
>>  			if (!record__aio_enabled(rec)) {
>> -				if (perf_mmap__push(map, rec, record__pushfn) != 0) {
>> -					if (sync)
>> -						map->flush = flush;
>> -					rc = -1;
>> -					goto out;
>> +				if (!record__comp_enabled(rec)) {
>> +					if (perf_mmap__push(map, rec, record__pushfn) != 0) {
>> +						if (sync)
>> +							map->flush = flush;
>> +						rc = -1;
>> +						goto out;
>> +					}
>> +				} else {
>> +					if (perf_mmap__pack(map, rec, record__pushfn,
>> +							compress_fn, session) != 0) {
>> +						if (sync)
>> +							map->flush = flush;
>> +						rc = -1;
>> +						goto out;
>> +					}
> 
> SNIP
> 

* Re: [PATCH v2 3/4] perf record: enable runtime trace compression
  2019-02-12 13:09   ` Jiri Olsa
@ 2019-02-20 15:03     ` Alexey Budankov
  0 siblings, 0 replies; 67+ messages in thread
From: Alexey Budankov @ 2019-02-20 15:03 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


On 12.02.2019 16:09, Jiri Olsa wrote:
> On Mon, Feb 11, 2019 at 11:23:40PM +0300, Alexey Budankov wrote:
> 
> SNIP
> 
>> diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
>> index 239e9a13c2b7..980784b77fe2 100644
>> --- a/tools/perf/util/mmap.c
>> +++ b/tools/perf/util/mmap.c
>> @@ -156,6 +156,86 @@ void __weak auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp __mayb
>>  {
>>  }
>>  
>> +static ssize_t perf_mmap__capture(struct perf_mmap *md, int idx,
>> +			perf_mmap__compress_fn_t compress, void *where)
> 
> what's the point of casting perf_session* to void* ? all the way down to
> the compression, where you cast it back?
> 
> SNIP
> 
>> +int perf_session__zstd_fini(struct perf_session *session)
>> +{
>> +	if (session->zstd_cstream) {
>> +		ZSTD_freeCStream(session->zstd_cstream);
>> +		session->zstd_cstream = NULL;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +size_t perf_session__zstd_compress(void *to,  void *dst, size_t dst_size,
>> +				   void *src, size_t src_size)
>> +{
>> +	struct perf_session *session = to;
> 
> in here... it could be just struct perf_session* no?

It is just a pointer, and casting a pointer to void* and back
is harmless. Still, let's clean that up.

Thanks,
Alexey

> 
> jirka
> 

* Re: [PATCH v2 3/4] perf record: enable runtime trace compression
  2019-02-12 13:09   ` Jiri Olsa
@ 2019-02-20 15:06     ` Alexey Budankov
  0 siblings, 0 replies; 67+ messages in thread
From: Alexey Budankov @ 2019-02-20 15:06 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


On 12.02.2019 16:09, Jiri Olsa wrote:
> On Mon, Feb 11, 2019 at 11:23:40PM +0300, Alexey Budankov wrote:
>>
>> Compression is implemented using the simple Zstd API and employs an AIO
>> data buffer as the memory to operate on. If the API call fails for some
>> reason, compression falls back to memcpy().
>>
>> Data chunks are split and packed into PERF_RECORD_COMPRESSED records of
>> 64KB at most. The mmap-flush option value can be used to avoid compressing
>> every single byte of data and to increase the compression ratio.
>>
>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
>> ---
>> Changes in v2:
>> - enabled trace compression for serial trace streaming
>> - moved compression/decompression code to session layer
>>
>> ---
>>  tools/perf/builtin-record.c |  67 +++++++++-----
>>  tools/perf/util/mmap.c      | 173 +++++++++++++++++++++---------------
>>  tools/perf/util/mmap.h      |  24 +++--
>>  tools/perf/util/session.c   | 106 ++++++++++++++++++++++
>>  tools/perf/util/session.h   |  13 +++
>>  5 files changed, 280 insertions(+), 103 deletions(-)
>>
>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>> index 227dbbd47d3f..435ff88dfc5e 100644
>> --- a/tools/perf/builtin-record.c
>> +++ b/tools/perf/builtin-record.c
>> @@ -112,8 +112,7 @@ static bool switch_output_time(struct record *rec)
>>  	       trigger_is_ready(&switch_output_trigger);
>>  }
>>  
>> -static int record__write(struct record *rec, struct perf_mmap *map __maybe_unused,
>> -			 void *bf, size_t size)
>> +static int record__write(struct record *rec, void *bf, size_t size)
> 
> please keep the 'struct perf_mmap *map' in, I'm using it in
> the multiple file storage, where each map holds pointer to
> its data file

Yes, it will be reverted.

Thanks,
Alexey

> 
> thanks,
> jirka
> 

* Re: [PATCH v2 3/4] perf record: enable runtime trace compression
  2019-02-12 13:09   ` Jiri Olsa
@ 2019-02-20 15:09     ` Alexey Budankov
  2019-02-25 15:27       ` Alexey Budankov
  0 siblings, 1 reply; 67+ messages in thread
From: Alexey Budankov @ 2019-02-20 15:09 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


On 12.02.2019 16:09, Jiri Olsa wrote:
> On Mon, Feb 11, 2019 at 11:23:40PM +0300, Alexey Budankov wrote:
> 
> SNIP
> 
>> +	compress_fn = (record__comp_enabled(rec) ?
>> +		perf_session__zstd_compress : perf_session__zstd_copy);
>> +
>>  	if (record__aio_enabled(rec))
>>  		off = record__aio_get_pos(trace_fd);
>>  
>> @@ -799,11 +805,21 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
>>  				map->flush = MMAP_FLUSH_DEFAULT;
>>  			}
>>  			if (!record__aio_enabled(rec)) {
>> -				if (perf_mmap__push(map, rec, record__pushfn) != 0) {
>> -					if (sync)
>> -						map->flush = flush;
>> -					rc = -1;
>> -					goto out;
>> +				if (!record__comp_enabled(rec)) {
>> +					if (perf_mmap__push(map, rec, record__pushfn) != 0) {
>> +						if (sync)
>> +							map->flush = flush;
>> +						rc = -1;
>> +						goto out;
>> +					}
>> +				} else {
>> +					if (perf_mmap__pack(map, rec, record__pushfn,
>> +							compress_fn, session) != 0) {
>> +						if (sync)
>> +							map->flush = flush;
>> +						rc = -1;
>> +						goto out;
>> +					}
> 
> I really dont like this.. the non-compression flow needs to stay untouched
> 
> why don't you overload the record__pushfn function for compression
> processing and stash the data in there?

Yes, this has become complicated, and overloading could be an option to simplify it.

Thanks,
Alexey

> 
> thanks,
> jirka
> 

* Re: [PATCH v2 3/4] perf record: enable runtime trace compression
  2019-02-12 13:09   ` Jiri Olsa
@ 2019-02-20 15:11     ` Alexey Budankov
  0 siblings, 0 replies; 67+ messages in thread
From: Alexey Budankov @ 2019-02-20 15:11 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


On 12.02.2019 16:09, Jiri Olsa wrote:
> On Mon, Feb 11, 2019 at 11:23:40PM +0300, Alexey Budankov wrote:
> 
> SNIP
> 
>> diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
>> index 239e9a13c2b7..980784b77fe2 100644
>> --- a/tools/perf/util/mmap.c
>> +++ b/tools/perf/util/mmap.c
>> @@ -156,6 +156,86 @@ void __weak auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp __mayb
>>  {
>>  }
>>  
>> +static ssize_t perf_mmap__capture(struct perf_mmap *md, int idx,
>> +			perf_mmap__compress_fn_t compress, void *where)
>> +{
>> +	u64 head = perf_mmap__read_head(md);
>> +	unsigned char *data = md->base + page_size;
>> +	unsigned long size, size0 = 0;
>> +	void *buf;
>> +	int rc = 0;
>> +	size_t mmap_len = perf_mmap__mmap_len(md);
> 
> again, this function seems like duplicate of perf_mmap__push,
> can't you just overload the push callback?

Answered the related question in the other mail.
Overloading could be an option.

Thanks,
Alexey

> 
> thanks,
> jirka
> 

* Re: [PATCH v2 3/4] perf record: enable runtime trace compression
  2019-02-12 13:08   ` Jiri Olsa
@ 2019-02-20 15:13     ` Alexey Budankov
  2019-02-21  9:43       ` Jiri Olsa
  0 siblings, 1 reply; 67+ messages in thread
From: Alexey Budankov @ 2019-02-20 15:13 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


On 12.02.2019 16:08, Jiri Olsa wrote:
> On Mon, Feb 11, 2019 at 11:23:40PM +0300, Alexey Budankov wrote:
> 
> SNIP
> 
>> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
>> index 18fb9c8cbf9c..5d406eebd058 100644
>> --- a/tools/perf/util/session.c
>> +++ b/tools/perf/util/session.c
>> @@ -29,6 +29,112 @@
>>  #include "stat.h"
>>  #include "arch/common.h"
>>  
> 
> I think this should not be in session.c but separated in new object

Do you mean the new file?

Thanks,
Alexey


* Re: [PATCH v2 4/4] perf report: support record trace file decompression
  2019-02-12 13:08   ` Jiri Olsa
@ 2019-02-20 15:19     ` Alexey Budankov
  2019-02-25 15:28       ` Alexey Budankov
  0 siblings, 1 reply; 67+ messages in thread
From: Alexey Budankov @ 2019-02-20 15:19 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


On 12.02.2019 16:08, Jiri Olsa wrote:
> On Mon, Feb 11, 2019 at 11:25:00PM +0300, Alexey Budankov wrote:
> 
> SNIP
> 
>>  size_t perf_session__zstd_copy(void *to __maybe_unused,
>> @@ -533,6 +646,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
>>  		tool->time_conv = process_event_op2_stub;
>>  	if (tool->feature == NULL)
>>  		tool->feature = process_event_op2_stub;
>> +	if (tool->compressed == NULL)
>> +		tool->compressed = perf_session__process_compressed_event;
>>  }
>>  
>>  static void swap_sample_id_all(union perf_event *event, void *data)
>> @@ -1469,7 +1584,8 @@ static s64 perf_session__process_user_event(struct perf_session *session,
>>  	int fd = perf_data__fd(session->data);
>>  	int err;
>>  
>> -	dump_event(session->evlist, event, file_offset, &sample);
>> +	if (event->header.type != PERF_RECORD_COMPRESSED)
>> +		dump_event(session->evlist, event, file_offset, &sample);
>>  
>>  	/* These events are processed right away */
>>  	switch (event->header.type) {
>> @@ -1522,6 +1638,11 @@ static s64 perf_session__process_user_event(struct perf_session *session,
>>  		return tool->time_conv(session, event);
>>  	case PERF_RECORD_HEADER_FEATURE:
>>  		return tool->feature(session, event);
>> +	case PERF_RECORD_COMPRESSED:
>> +		err = tool->compressed(session, event, file_offset);
>> +		if (err)
>> +			dump_event(session->evlist, event, file_offset, &sample);
> 
> I'm not sure about having compressed callback at all, but maybe
> it could be usefull for inject, to get the compressed data..?

The existing code implements event handling through callbacks, so the
new part follows the same design. The inject flow will probably benefit
from this approach, but it needs to be tested additionally.

Thanks,
Alexey


> 
> I assume inject is working now and it will get you the uncompressed
> perf.data?
> 
> jirka
> 

* Re: [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options
  2019-02-12 13:09   ` Jiri Olsa
@ 2019-02-20 15:19     ` Alexey Budankov
  0 siblings, 0 replies; 67+ messages in thread
From: Alexey Budankov @ 2019-02-20 15:19 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


On 12.02.2019 16:09, Jiri Olsa wrote:
> On Mon, Feb 11, 2019 at 11:22:38PM +0300, Alexey Budankov wrote:
>>
>> Implement the -z,--compression_level=<n> and --mmap-flush=<dump_least_size>
>> options as well as a special PERF_RECORD_COMPRESSED record that contains
>> compressed parts of the kernel data buffer.
>>
>> Because compression requires auxiliary memory to implement encoding of
>> kernel data, record->opts.nr_cblocks == -1 signifies preallocating a single
>> AIO data buffer aio.data[0] without accompanying AIO control blocks.
>>
>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
>> ---
>> Changes in v2:
>> - enabled allocation aio data buffers for compression
>>
>> ---
>>  tools/perf/Documentation/perf-record.txt |   9 ++
>>  tools/perf/builtin-record.c              | 110 +++++++++++++++++++----
>>  tools/perf/perf.h                        |   2 +
>>  tools/perf/util/env.h                    |  10 +++
>>  tools/perf/util/event.c                  |   1 +
>>  tools/perf/util/event.h                  |   7 ++
>>  tools/perf/util/evlist.c                 |   6 +-
>>  tools/perf/util/evlist.h                 |   3 +-
>>  tools/perf/util/header.c                 |  45 +++++++++-
>>  tools/perf/util/header.h                 |   1 +
>>  tools/perf/util/mmap.c                   |  98 ++++++++++++--------
>>  tools/perf/util/mmap.h                   |   7 +-
>>  tools/perf/util/session.h                |   2 +
>>  13 files changed, 240 insertions(+), 61 deletions(-)
> 
> this patch is super complicated.. please think of reviewers
> when writing code ;-)
> 
> it needs to be split to small logical patches,
> where each change is explained in changelog

Yes, will do.

Thanks,
Alexey

> 
> jirka
> 

* Re: [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options
  2019-02-12 13:08   ` Jiri Olsa
@ 2019-02-20 15:24     ` Alexey Budankov
  2019-02-21  9:49       ` Jiri Olsa
  0 siblings, 1 reply; 67+ messages in thread
From: Alexey Budankov @ 2019-02-20 15:24 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


On 12.02.2019 16:08, Jiri Olsa wrote:
> On Mon, Feb 11, 2019 at 11:22:38PM +0300, Alexey Budankov wrote:
> 
> SNIP
> 
>> +static int perf_mmap__aio_mmap_blocks(struct perf_mmap *map);
>> +
>>  static int perf_mmap__aio_mmap(struct perf_mmap *map, struct mmap_params *mp)
>>  {
>> -	int delta_max, i, prio, ret;
>> +	int i, ret = 0, init_blocks = 1;
>>  
>>  	map->aio.nr_cblocks = mp->nr_cblocks;
>> +	if (map->aio.nr_cblocks == -1) {
>> +		map->aio.nr_cblocks = 1;
>> +		init_blocks = 0;
>> +	}
>> +
>>  	if (map->aio.nr_cblocks) {
>> -		map->aio.aiocb = calloc(map->aio.nr_cblocks, sizeof(struct aiocb *));
>> -		if (!map->aio.aiocb) {
>> -			pr_debug2("failed to allocate aiocb for data buffer, error %m\n");
>> -			return -1;
>> -		}
>> -		map->aio.cblocks = calloc(map->aio.nr_cblocks, sizeof(struct aiocb));
>> -		if (!map->aio.cblocks) {
>> -			pr_debug2("failed to allocate cblocks for data buffer, error %m\n");
>> -			return -1;
>> -		}
>>  		map->aio.data = calloc(map->aio.nr_cblocks, sizeof(void *));
>>  		if (!map->aio.data) {
>>  			pr_debug2("failed to allocate data buffer, error %m\n");
>>  			return -1;
>>  		}
>> -		delta_max = sysconf(_SC_AIO_PRIO_DELTA_MAX);
>>  		for (i = 0; i < map->aio.nr_cblocks; ++i) {
>>  			ret = perf_mmap__aio_alloc(map, i);
>>  			if (ret == -1) {
>> @@ -251,29 +245,16 @@ static int perf_mmap__aio_mmap(struct perf_mmap *map, struct mmap_params *mp)
>>  			ret = perf_mmap__aio_bind(map, i, map->cpu, mp->affinity);
>>  			if (ret == -1)
>>  				return -1;
>> -			/*
>> -			 * Use cblock.aio_fildes value different from -1
>> -			 * to denote started aio write operation on the
>> -			 * cblock so it requires explicit record__aio_sync()
>> -			 * call prior the cblock may be reused again.
>> -			 */
>> -			map->aio.cblocks[i].aio_fildes = -1;
>> -			/*
>> -			 * Allocate cblocks with priority delta to have
>> -			 * faster aio write system calls because queued requests
>> -			 * are kept in separate per-prio queues and adding
>> -			 * a new request will iterate thru shorter per-prio
>> -			 * list. Blocks with numbers higher than
>> -			 *  _SC_AIO_PRIO_DELTA_MAX go with priority 0.
>> -			 */
>> -			prio = delta_max - i;
>> -			map->aio.cblocks[i].aio_reqprio = prio >= 0 ? prio : 0;
>>  		}
>> +		if (init_blocks)
>> +			ret = perf_mmap__aio_mmap_blocks(map);
>>  	}
>>  
>> -	return 0;
>> +	return ret;
>>  }
> 
> SNIP
> 
> it seems like little refactoring happened in here (up and down) for
> aio code, which is not explained and I'm unable to follow it.. please
> separate this in simple change

AIO buffer management has been moved out of the HAVE_AIO_SUPPORT define
so that it can be reused for compression in the serial streaming case.
It will be revisited after the other issues are addressed.

Thanks,
Alexey

> 
> SNIP
> 
>> +#ifdef HAVE_AIO_SUPPORT
>> +static int perf_mmap__aio_mmap_blocks(struct perf_mmap *map)
>> +{
>> +	int delta_max, i, prio;
>> +
>> +	map->aio.aiocb = calloc(map->aio.nr_cblocks, sizeof(struct aiocb *));
>> +	if (!map->aio.aiocb) {
>> +		pr_debug2("failed to allocate aiocb for data buffer, error %m\n");
>> +		return -1;
>> +	}
>> +	map->aio.cblocks = calloc(map->aio.nr_cblocks, sizeof(struct aiocb));
>> +	if (!map->aio.cblocks) {
>> +		pr_debug2("failed to allocate cblocks for data buffer, error %m\n");
>> +		return -1;
>> +	}
>> +	delta_max = sysconf(_SC_AIO_PRIO_DELTA_MAX);
>> +	for (i = 0; i < map->aio.nr_cblocks; ++i) {
>> +		/*
>> +		 * Use cblock.aio_fildes value different from -1
>> +		 * to denote started aio write operation on the
>> +		 * cblock so it requires explicit record__aio_sync()
>> +		 * call prior the cblock may be reused again.
>> +		 */
>> +		map->aio.cblocks[i].aio_fildes = -1;
>> +		/*
>> +		 * Allocate cblocks with priority delta to have
>> +		 * faster aio write system calls because queued requests
>> +		 * are kept in separate per-prio queues and adding
>> +		 * a new request will iterate thru shorter per-prio
>> +		 * list. Blocks with numbers higher than
>> +		 *  _SC_AIO_PRIO_DELTA_MAX go with priority 0.
>> +		 */
>> +		prio = delta_max - i;
>> +		map->aio.cblocks[i].aio_reqprio = prio >= 0 ? prio : 0;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
> 
> SNIP
> 

* Re: [PATCH v2 3/4] perf record: enable runtime trace compression
  2019-02-20 15:13     ` Alexey Budankov
@ 2019-02-21  9:43       ` Jiri Olsa
  2019-02-21 11:30         ` Alexey Budankov
  0 siblings, 1 reply; 67+ messages in thread
From: Jiri Olsa @ 2019-02-21  9:43 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel

On Wed, Feb 20, 2019 at 06:13:02PM +0300, Alexey Budankov wrote:
> 
> On 12.02.2019 16:08, Jiri Olsa wrote:
> > On Mon, Feb 11, 2019 at 11:23:40PM +0300, Alexey Budankov wrote:
> > 
> > SNIP
> > 
> >> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> >> index 18fb9c8cbf9c..5d406eebd058 100644
> >> --- a/tools/perf/util/session.c
> >> +++ b/tools/perf/util/session.c
> >> @@ -29,6 +29,112 @@
> >>  #include "stat.h"
> >>  #include "arch/common.h"
> >>  
> > 
> > I think this should not be in session.c but separated in new object
> 
> Do you mean the new file?

yes

jirka

* Re: [PATCH v2 3/4] perf record: enable runtime trace compression
  2019-02-20 14:53     ` Alexey Budankov
@ 2019-02-21  9:43       ` Jiri Olsa
  2019-02-21 11:18         ` Alexey Budankov
  0 siblings, 1 reply; 67+ messages in thread
From: Jiri Olsa @ 2019-02-21  9:43 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel

On Wed, Feb 20, 2019 at 05:53:17PM +0300, Alexey Budankov wrote:
> 
> On 12.02.2019 16:08, Jiri Olsa wrote:
> > On Mon, Feb 11, 2019 at 11:23:40PM +0300, Alexey Budankov wrote:
> > 
> > SNIP
> > 
> >> @@ -774,6 +775,8 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
> >>  	struct perf_mmap *maps;
> >>  	int trace_fd = rec->data.file.fd;
> >>  	off_t off;
> >> +	struct perf_session *session = rec->session;
> >> +	perf_mmap__compress_fn_t compress_fn;
> >>  
> >>  	if (!evlist)
> >>  		return 0;
> >> @@ -785,6 +788,9 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
> >>  	if (overwrite && evlist->bkw_mmap_state != BKW_MMAP_DATA_PENDING)
> >>  		return 0;
> >>  
> >> +	compress_fn = (record__comp_enabled(rec) ?
> >> +		perf_session__zstd_compress : perf_session__zstd_copy);
> >> +
> > 
> > I don't follow what's the perf_session__zstd_copy function for..?
> 
> It bridges the AIO case when compression is disabled.

so current state then.. why do we need new function for that?

jirka

> 
> Thanks,
> Alexey
> 
> > 
> > for !record__comp_enabled case we seem not to use it
> > and calling the current perf_mmap__push interface
> > 
> > however I dont see point to have this function at all
> > 
> > jirka
> > 
> > 
> >>  	if (record__aio_enabled(rec))
> >>  		off = record__aio_get_pos(trace_fd);
> >>  
> >> @@ -799,11 +805,21 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
> >>  				map->flush = MMAP_FLUSH_DEFAULT;
> >>  			}
> >>  			if (!record__aio_enabled(rec)) {
> >> -				if (perf_mmap__push(map, rec, record__pushfn) != 0) {
> >> -					if (sync)
> >> -						map->flush = flush;
> >> -					rc = -1;
> >> -					goto out;
> >> +				if (!record__comp_enabled(rec)) {
> >> +					if (perf_mmap__push(map, rec, record__pushfn) != 0) {
> >> +						if (sync)
> >> +							map->flush = flush;
> >> +						rc = -1;
> >> +						goto out;
> >> +					}
> >> +				} else {
> >> +					if (perf_mmap__pack(map, rec, record__pushfn,
> >> +							compress_fn, session) != 0) {
> >> +						if (sync)
> >> +							map->flush = flush;
> >> +						rc = -1;
> >> +						goto out;
> >> +					}
> > 
> > SNIP
> > 

* Re: [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options
  2019-02-20 14:15     ` Alexey Budankov
@ 2019-02-21  9:47       ` Jiri Olsa
  2019-02-21 11:23         ` Alexey Budankov
  0 siblings, 1 reply; 67+ messages in thread
From: Jiri Olsa @ 2019-02-21  9:47 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel

On Wed, Feb 20, 2019 at 05:15:13PM +0300, Alexey Budankov wrote:
> 
> On 12.02.2019 16:08, Jiri Olsa wrote:
> > On Mon, Feb 11, 2019 at 11:22:38PM +0300, Alexey Budankov wrote:
> > 
> > SNIP
> > 
> >> -	if (rec->opts.nr_cblocks > nr_cblocks_max)
> >> -		rec->opts.nr_cblocks = nr_cblocks_max;
> >> -	if (verbose > 0)
> >> -		pr_info("nr_cblocks: %d\n", rec->opts.nr_cblocks);
> >> +	if (rec->opts.comp_level > 22)
> >> +		rec->opts.comp_level = 0;
> >> +	if (record__comp_enabled(rec) && !rec->opts.nr_cblocks) {
> >> +		 /*
> >> +		  * Allocate aio.data[0] buffer for compression.
> >> +		  */
> >> +		rec->opts.nr_cblocks = -1;
> >> +	}
> > 
> > I assume you're using aio.data[0] as a buffer for non-aio case
> > as compression buffer.. if that's the case, please dont do that
> > and use separate buffer for that
> 
> Do you want it to be one more field in the mmap struct, or could
> the aio-related fields still be reworked?

> I don't like it using the aio buffers if it's a generic feature,
> so please add a new pointer for the needed buffer

jirka

* Re: [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options
  2019-02-20 15:24     ` Alexey Budankov
@ 2019-02-21  9:49       ` Jiri Olsa
  2019-02-21 11:24         ` Alexey Budankov
  0 siblings, 1 reply; 67+ messages in thread
From: Jiri Olsa @ 2019-02-21  9:49 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel

On Wed, Feb 20, 2019 at 06:24:30PM +0300, Alexey Budankov wrote:
> 
> On 12.02.2019 16:08, Jiri Olsa wrote:
> > On Mon, Feb 11, 2019 at 11:22:38PM +0300, Alexey Budankov wrote:
> > 
> > SNIP
> > 
> >> +static int perf_mmap__aio_mmap_blocks(struct perf_mmap *map);
> >> +
> >>  static int perf_mmap__aio_mmap(struct perf_mmap *map, struct mmap_params *mp)
> >>  {
> >> -	int delta_max, i, prio, ret;
> >> +	int i, ret = 0, init_blocks = 1;
> >>  
> >>  	map->aio.nr_cblocks = mp->nr_cblocks;
> >> +	if (map->aio.nr_cblocks == -1) {
> >> +		map->aio.nr_cblocks = 1;
> >> +		init_blocks = 0;
> >> +	}
> >> +
> >>  	if (map->aio.nr_cblocks) {
> >> -		map->aio.aiocb = calloc(map->aio.nr_cblocks, sizeof(struct aiocb *));
> >> -		if (!map->aio.aiocb) {
> >> -			pr_debug2("failed to allocate aiocb for data buffer, error %m\n");
> >> -			return -1;
> >> -		}
> >> -		map->aio.cblocks = calloc(map->aio.nr_cblocks, sizeof(struct aiocb));
> >> -		if (!map->aio.cblocks) {
> >> -			pr_debug2("failed to allocate cblocks for data buffer, error %m\n");
> >> -			return -1;
> >> -		}
> >>  		map->aio.data = calloc(map->aio.nr_cblocks, sizeof(void *));
> >>  		if (!map->aio.data) {
> >>  			pr_debug2("failed to allocate data buffer, error %m\n");
> >>  			return -1;
> >>  		}
> >> -		delta_max = sysconf(_SC_AIO_PRIO_DELTA_MAX);
> >>  		for (i = 0; i < map->aio.nr_cblocks; ++i) {
> >>  			ret = perf_mmap__aio_alloc(map, i);
> >>  			if (ret == -1) {
> >> @@ -251,29 +245,16 @@ static int perf_mmap__aio_mmap(struct perf_mmap *map, struct mmap_params *mp)
> >>  			ret = perf_mmap__aio_bind(map, i, map->cpu, mp->affinity);
> >>  			if (ret == -1)
> >>  				return -1;
> >> -			/*
> >> -			 * Use cblock.aio_fildes value different from -1
> >> -			 * to denote started aio write operation on the
> >> -			 * cblock so it requires explicit record__aio_sync()
> >> -			 * call prior the cblock may be reused again.
> >> -			 */
> >> -			map->aio.cblocks[i].aio_fildes = -1;
> >> -			/*
> >> -			 * Allocate cblocks with priority delta to have
> >> -			 * faster aio write system calls because queued requests
> >> -			 * are kept in separate per-prio queues and adding
> >> -			 * a new request will iterate thru shorter per-prio
> >> -			 * list. Blocks with numbers higher than
> >> -			 *  _SC_AIO_PRIO_DELTA_MAX go with priority 0.
> >> -			 */
> >> -			prio = delta_max - i;
> >> -			map->aio.cblocks[i].aio_reqprio = prio >= 0 ? prio : 0;
> >>  		}
> >> +		if (init_blocks)
> >> +			ret = perf_mmap__aio_mmap_blocks(map);
> >>  	}
> >>  
> >> -	return 0;
> >> +	return ret;
> >>  }
> > 
> > SNIP
> > 
> > it seems like little refactoring happened in here (up and down) for
> > aio code, which is not explained and I'm unable to follow it.. please
> > separate this in simple change
> 
> AIO buffer management has been moved out of the HAVE_AIO_SUPPORT define 
> so it can be used for compression in the serial streaming case. It will 
> be revisited after the other issues are addressed.

as I said earlier, please separate this from aio

jirka


* Re: [PATCH v2 3/4] perf record: enable runtime trace compression
  2019-02-21  9:43       ` Jiri Olsa
@ 2019-02-21 11:18         ` Alexey Budankov
  0 siblings, 0 replies; 67+ messages in thread
From: Alexey Budankov @ 2019-02-21 11:18 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


On 21.02.2019 12:43, Jiri Olsa wrote:
> On Wed, Feb 20, 2019 at 05:53:17PM +0300, Alexey Budankov wrote:
>>
>> On 12.02.2019 16:08, Jiri Olsa wrote:
>>> On Mon, Feb 11, 2019 at 11:23:40PM +0300, Alexey Budankov wrote:
>>>
>>> SNIP
>>>
>>>> @@ -774,6 +775,8 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
>>>>  	struct perf_mmap *maps;
>>>>  	int trace_fd = rec->data.file.fd;
>>>>  	off_t off;
>>>> +	struct perf_session *session = rec->session;
>>>> +	perf_mmap__compress_fn_t compress_fn;
>>>>  
>>>>  	if (!evlist)
>>>>  		return 0;
>>>> @@ -785,6 +788,9 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
>>>>  	if (overwrite && evlist->bkw_mmap_state != BKW_MMAP_DATA_PENDING)
>>>>  		return 0;
>>>>  
>>>> +	compress_fn = (record__comp_enabled(rec) ?
>>>> +		perf_session__zstd_compress : perf_session__zstd_copy);
>>>> +
>>>
>>> I don't follow what's the perf_session__zstd_copy function for..?
>>
>> It bridges the AIO-without-compression case.
> 
> so current state then.. why do we need new function for that?

This copy() function has the same signature as compress(), 
so it can be used as a drop-in replacement in the _pack() call; 
memcpy(), with its different signature, would not fit there.

Thanks,
Alexey

> 
> jirka


* Re: [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options
  2019-02-21  9:47       ` Jiri Olsa
@ 2019-02-21 11:23         ` Alexey Budankov
  2019-02-25 15:26           ` Alexey Budankov
  0 siblings, 1 reply; 67+ messages in thread
From: Alexey Budankov @ 2019-02-21 11:23 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


On 21.02.2019 12:47, Jiri Olsa wrote:
> On Wed, Feb 20, 2019 at 05:15:13PM +0300, Alexey Budankov wrote:
>>
>> On 12.02.2019 16:08, Jiri Olsa wrote:
>>> On Mon, Feb 11, 2019 at 11:22:38PM +0300, Alexey Budankov wrote:
>>>
>>> SNIP
>>>
>>>> -	if (rec->opts.nr_cblocks > nr_cblocks_max)
>>>> -		rec->opts.nr_cblocks = nr_cblocks_max;
>>>> -	if (verbose > 0)
>>>> -		pr_info("nr_cblocks: %d\n", rec->opts.nr_cblocks);
>>>> +	if (rec->opts.comp_level > 22)
>>>> +		rec->opts.comp_level = 0;
>>>> +	if (record__comp_enabled(rec) && !rec->opts.nr_cblocks) {
>>>> +		 /*
>>>> +		  * Allocate aio.data[0] buffer for compression.
>>>> +		  */
>>>> +		rec->opts.nr_cblocks = -1;
>>>> +	}
>>>
>>> I assume you're using aio.data[0] as a buffer for non-aio case
>>> as compression buffer.. if that's the case, please dont do that
>>> and use separate buffer for that
>>
>> Do you want it to be one more field in the mmap struct, or 
>> could the aio-related fields still be reworked?
> 
> I don't like using the aio buffers for a generic feature,
> so please add a new pointer for the needed buffer

Ok.

Thanks,
Alexey

> 
> jirka
> 


* Re: [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options
  2019-02-21  9:49       ` Jiri Olsa
@ 2019-02-21 11:24         ` Alexey Budankov
  2019-02-25 15:27           ` Alexey Budankov
  0 siblings, 1 reply; 67+ messages in thread
From: Alexey Budankov @ 2019-02-21 11:24 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


On 21.02.2019 12:49, Jiri Olsa wrote:
> On Wed, Feb 20, 2019 at 06:24:30PM +0300, Alexey Budankov wrote:
>>
>> On 12.02.2019 16:08, Jiri Olsa wrote:
>>> On Mon, Feb 11, 2019 at 11:22:38PM +0300, Alexey Budankov wrote:
>>>
>>> SNIP
>>>
>>>> +static int perf_mmap__aio_mmap_blocks(struct perf_mmap *map);
>>>> +
>>>>  static int perf_mmap__aio_mmap(struct perf_mmap *map, struct mmap_params *mp)
>>>>  {
>>>> -	int delta_max, i, prio, ret;
>>>> +	int i, ret = 0, init_blocks = 1;
>>>>  
>>>>  	map->aio.nr_cblocks = mp->nr_cblocks;
>>>> +	if (map->aio.nr_cblocks == -1) {
>>>> +		map->aio.nr_cblocks = 1;
>>>> +		init_blocks = 0;
>>>> +	}
>>>> +
>>>>  	if (map->aio.nr_cblocks) {
>>>> -		map->aio.aiocb = calloc(map->aio.nr_cblocks, sizeof(struct aiocb *));
>>>> -		if (!map->aio.aiocb) {
>>>> -			pr_debug2("failed to allocate aiocb for data buffer, error %m\n");
>>>> -			return -1;
>>>> -		}
>>>> -		map->aio.cblocks = calloc(map->aio.nr_cblocks, sizeof(struct aiocb));
>>>> -		if (!map->aio.cblocks) {
>>>> -			pr_debug2("failed to allocate cblocks for data buffer, error %m\n");
>>>> -			return -1;
>>>> -		}
>>>>  		map->aio.data = calloc(map->aio.nr_cblocks, sizeof(void *));
>>>>  		if (!map->aio.data) {
>>>>  			pr_debug2("failed to allocate data buffer, error %m\n");
>>>>  			return -1;
>>>>  		}
>>>> -		delta_max = sysconf(_SC_AIO_PRIO_DELTA_MAX);
>>>>  		for (i = 0; i < map->aio.nr_cblocks; ++i) {
>>>>  			ret = perf_mmap__aio_alloc(map, i);
>>>>  			if (ret == -1) {
>>>> @@ -251,29 +245,16 @@ static int perf_mmap__aio_mmap(struct perf_mmap *map, struct mmap_params *mp)
>>>>  			ret = perf_mmap__aio_bind(map, i, map->cpu, mp->affinity);
>>>>  			if (ret == -1)
>>>>  				return -1;
>>>> -			/*
>>>> -			 * Use cblock.aio_fildes value different from -1
>>>> -			 * to denote started aio write operation on the
>>>> -			 * cblock so it requires explicit record__aio_sync()
>>>> -			 * call prior the cblock may be reused again.
>>>> -			 */
>>>> -			map->aio.cblocks[i].aio_fildes = -1;
>>>> -			/*
>>>> -			 * Allocate cblocks with priority delta to have
>>>> -			 * faster aio write system calls because queued requests
>>>> -			 * are kept in separate per-prio queues and adding
>>>> -			 * a new request will iterate thru shorter per-prio
>>>> -			 * list. Blocks with numbers higher than
>>>> -			 *  _SC_AIO_PRIO_DELTA_MAX go with priority 0.
>>>> -			 */
>>>> -			prio = delta_max - i;
>>>> -			map->aio.cblocks[i].aio_reqprio = prio >= 0 ? prio : 0;
>>>>  		}
>>>> +		if (init_blocks)
>>>> +			ret = perf_mmap__aio_mmap_blocks(map);
>>>>  	}
>>>>  
>>>> -	return 0;
>>>> +	return ret;
>>>>  }
>>>
>>> SNIP
>>>
>>> it seems like little refactoring happened in here (up and down) for
>>> aio code, which is not explained and I'm unable to follow it.. please
>>> separate this in simple change
>>
>> AIO buffer management has been moved out of the HAVE_AIO_SUPPORT define 
>> so it can be used for compression in the serial streaming case. It will 
>> be revisited after the other issues are addressed.
> 
> as I said earlier, please separate this from aio

Ok.

Thanks,
Alexey


> 
> jirka
> 


* Re: [PATCH v2 3/4] perf record: enable runtime trace compression
  2019-02-21  9:43       ` Jiri Olsa
@ 2019-02-21 11:30         ` Alexey Budankov
  0 siblings, 0 replies; 67+ messages in thread
From: Alexey Budankov @ 2019-02-21 11:30 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


On 21.02.2019 12:43, Jiri Olsa wrote:
> On Wed, Feb 20, 2019 at 06:13:02PM +0300, Alexey Budankov wrote:
>>
>> On 12.02.2019 16:08, Jiri Olsa wrote:
>>> On Mon, Feb 11, 2019 at 11:23:40PM +0300, Alexey Budankov wrote:
>>>
>>> SNIP
>>>
>>>> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
>>>> index 18fb9c8cbf9c..5d406eebd058 100644
>>>> --- a/tools/perf/util/session.c
>>>> +++ b/tools/perf/util/session.c
>>>> @@ -29,6 +29,112 @@
>>>>  #include "stat.h"
>>>>  #include "arch/common.h"
>>>>  
>>>
>>> I think this should not be in session.c but separated in new object
>>
>> Do you mean the new file?
> 
> yes

tools/perf/util/compress.{c,h}

Thanks,
Alexey

> 
> jirka
> 


* Re: [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options
  2019-02-21 11:23         ` Alexey Budankov
@ 2019-02-25 15:26           ` Alexey Budankov
  0 siblings, 0 replies; 67+ messages in thread
From: Alexey Budankov @ 2019-02-25 15:26 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


On 21.02.2019 14:23, Alexey Budankov wrote:
> 
> On 21.02.2019 12:47, Jiri Olsa wrote:
>> On Wed, Feb 20, 2019 at 05:15:13PM +0300, Alexey Budankov wrote:
>>>
>>> On 12.02.2019 16:08, Jiri Olsa wrote:
>>>> On Mon, Feb 11, 2019 at 11:22:38PM +0300, Alexey Budankov wrote:
>>>>
>>>> SNIP
>>>>
>>>>> -	if (rec->opts.nr_cblocks > nr_cblocks_max)
>>>>> -		rec->opts.nr_cblocks = nr_cblocks_max;
>>>>> -	if (verbose > 0)
>>>>> -		pr_info("nr_cblocks: %d\n", rec->opts.nr_cblocks);
>>>>> +	if (rec->opts.comp_level > 22)
>>>>> +		rec->opts.comp_level = 0;
>>>>> +	if (record__comp_enabled(rec) && !rec->opts.nr_cblocks) {
>>>>> +		 /*
>>>>> +		  * Allocate aio.data[0] buffer for compression.
>>>>> +		  */
>>>>> +		rec->opts.nr_cblocks = -1;
>>>>> +	}
>>>>
>>>> I assume you're using aio.data[0] as a buffer for non-aio case
>>>> as compression buffer.. if that's the case, please dont do that
>>>> and use separate buffer for that
>>>
>>> Do you want it to be one more field in the mmap struct, or 
>>> could the aio-related fields still be reworked?
>>
>> I don't like using the aio buffers for a generic feature,
>> so please add a new pointer for the needed buffer
> 
> Ok.

Addressed.

> 
> Thanks,
> Alexey
> 
>>
>> jirka
>>
> 


* Re: [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options
  2019-02-21 11:24         ` Alexey Budankov
@ 2019-02-25 15:27           ` Alexey Budankov
  0 siblings, 0 replies; 67+ messages in thread
From: Alexey Budankov @ 2019-02-25 15:27 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


On 21.02.2019 14:24, Alexey Budankov wrote:
> 
> On 21.02.2019 12:49, Jiri Olsa wrote:
>> On Wed, Feb 20, 2019 at 06:24:30PM +0300, Alexey Budankov wrote:
>>>
>>> On 12.02.2019 16:08, Jiri Olsa wrote:
>>>> On Mon, Feb 11, 2019 at 11:22:38PM +0300, Alexey Budankov wrote:
>>>>
>>>> SNIP
>>>>
>>>>> +static int perf_mmap__aio_mmap_blocks(struct perf_mmap *map);
>>>>> +
>>>>>  static int perf_mmap__aio_mmap(struct perf_mmap *map, struct mmap_params *mp)
>>>>>  {
>>>>> -	int delta_max, i, prio, ret;
>>>>> +	int i, ret = 0, init_blocks = 1;
>>>>>  
>>>>>  	map->aio.nr_cblocks = mp->nr_cblocks;
>>>>> +	if (map->aio.nr_cblocks == -1) {
>>>>> +		map->aio.nr_cblocks = 1;
>>>>> +		init_blocks = 0;
>>>>> +	}
>>>>> +
>>>>>  	if (map->aio.nr_cblocks) {
>>>>> -		map->aio.aiocb = calloc(map->aio.nr_cblocks, sizeof(struct aiocb *));
>>>>> -		if (!map->aio.aiocb) {
>>>>> -			pr_debug2("failed to allocate aiocb for data buffer, error %m\n");
>>>>> -			return -1;
>>>>> -		}
>>>>> -		map->aio.cblocks = calloc(map->aio.nr_cblocks, sizeof(struct aiocb));
>>>>> -		if (!map->aio.cblocks) {
>>>>> -			pr_debug2("failed to allocate cblocks for data buffer, error %m\n");
>>>>> -			return -1;
>>>>> -		}
>>>>>  		map->aio.data = calloc(map->aio.nr_cblocks, sizeof(void *));
>>>>>  		if (!map->aio.data) {
>>>>>  			pr_debug2("failed to allocate data buffer, error %m\n");
>>>>>  			return -1;
>>>>>  		}
>>>>> -		delta_max = sysconf(_SC_AIO_PRIO_DELTA_MAX);
>>>>>  		for (i = 0; i < map->aio.nr_cblocks; ++i) {
>>>>>  			ret = perf_mmap__aio_alloc(map, i);
>>>>>  			if (ret == -1) {
>>>>> @@ -251,29 +245,16 @@ static int perf_mmap__aio_mmap(struct perf_mmap *map, struct mmap_params *mp)
>>>>>  			ret = perf_mmap__aio_bind(map, i, map->cpu, mp->affinity);
>>>>>  			if (ret == -1)
>>>>>  				return -1;
>>>>> -			/*
>>>>> -			 * Use cblock.aio_fildes value different from -1
>>>>> -			 * to denote started aio write operation on the
>>>>> -			 * cblock so it requires explicit record__aio_sync()
>>>>> -			 * call prior the cblock may be reused again.
>>>>> -			 */
>>>>> -			map->aio.cblocks[i].aio_fildes = -1;
>>>>> -			/*
>>>>> -			 * Allocate cblocks with priority delta to have
>>>>> -			 * faster aio write system calls because queued requests
>>>>> -			 * are kept in separate per-prio queues and adding
>>>>> -			 * a new request will iterate thru shorter per-prio
>>>>> -			 * list. Blocks with numbers higher than
>>>>> -			 *  _SC_AIO_PRIO_DELTA_MAX go with priority 0.
>>>>> -			 */
>>>>> -			prio = delta_max - i;
>>>>> -			map->aio.cblocks[i].aio_reqprio = prio >= 0 ? prio : 0;
>>>>>  		}
>>>>> +		if (init_blocks)
>>>>> +			ret = perf_mmap__aio_mmap_blocks(map);
>>>>>  	}
>>>>>  
>>>>> -	return 0;
>>>>> +	return ret;
>>>>>  }
>>>>
>>>> SNIP
>>>>
>>>> it seems like little refactoring happened in here (up and down) for
>>>> aio code, which is not explained and I'm unable to follow it.. please
>>>> separate this in simple change
>>>
>>> AIO buffer management has been moved out of the HAVE_AIO_SUPPORT define 
>>> so it can be used for compression in the serial streaming case. It will 
>>> be revisited after the other issues are addressed.
>>
>> as I said earlier, please separate this from aio
> 
> Ok.

Addressed.

> 
> Thanks,
> Alexey
> 
> 
>>
>> jirka
>>
> 


* Re: [PATCH v2 3/4] perf record: enable runtime trace compression
  2019-02-20 15:09     ` Alexey Budankov
@ 2019-02-25 15:27       ` Alexey Budankov
  0 siblings, 0 replies; 67+ messages in thread
From: Alexey Budankov @ 2019-02-25 15:27 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


On 20.02.2019 18:09, Alexey Budankov wrote:
> 
> On 12.02.2019 16:09, Jiri Olsa wrote:
>> On Mon, Feb 11, 2019 at 11:23:40PM +0300, Alexey Budankov wrote:
>>
>> SNIP
>>
>>> +	compress_fn = (record__comp_enabled(rec) ?
>>> +		perf_session__zstd_compress : perf_session__zstd_copy);
>>> +
>>>  	if (record__aio_enabled(rec))
>>>  		off = record__aio_get_pos(trace_fd);
>>>  
>>> @@ -799,11 +805,21 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
>>>  				map->flush = MMAP_FLUSH_DEFAULT;
>>>  			}
>>>  			if (!record__aio_enabled(rec)) {
>>> -				if (perf_mmap__push(map, rec, record__pushfn) != 0) {
>>> -					if (sync)
>>> -						map->flush = flush;
>>> -					rc = -1;
>>> -					goto out;
>>> +				if (!record__comp_enabled(rec)) {
>>> +					if (perf_mmap__push(map, rec, record__pushfn) != 0) {
>>> +						if (sync)
>>> +							map->flush = flush;
>>> +						rc = -1;
>>> +						goto out;
>>> +					}
>>> +				} else {
>>> +					if (perf_mmap__pack(map, rec, record__pushfn,
>>> +							compress_fn, session) != 0) {
>>> +						if (sync)
>>> +							map->flush = flush;
>>> +						rc = -1;
>>> +						goto out;
>>> +					}
>>
>> I really don't like this.. the non-compression flow needs to stay untouched
>>
>> why don't you overload the record__pushfn function for compression
>> processing and stash the data in there?
> 
> Yes, this becomes complicated and overloading could be an option to simplify it.

Redesigned compress() to be a parameter of __push() and __aio_push() so 
this __comp_enabled() branch can be avoided. Overloading __pushfn() 
doesn't fit the aio case.

> 
> Thanks,
> Alexey
> 
>>
>> thanks,
>> jirka
>>
> 


* Re: [PATCH v2 4/4] perf report: support record trace file decompression
  2019-02-20 15:19     ` Alexey Budankov
@ 2019-02-25 15:28       ` Alexey Budankov
  0 siblings, 0 replies; 67+ messages in thread
From: Alexey Budankov @ 2019-02-25 15:28 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


On 20.02.2019 18:19, Alexey Budankov wrote:
> 
> On 12.02.2019 16:08, Jiri Olsa wrote:
>> On Mon, Feb 11, 2019 at 11:25:00PM +0300, Alexey Budankov wrote:
>>
>> SNIP
>>
>>>  size_t perf_session__zstd_copy(void *to __maybe_unused,
>>> @@ -533,6 +646,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
>>>  		tool->time_conv = process_event_op2_stub;
>>>  	if (tool->feature == NULL)
>>>  		tool->feature = process_event_op2_stub;
>>> +	if (tool->compressed == NULL)
>>> +		tool->compressed = perf_session__process_compressed_event;
>>>  }
>>>  
>>>  static void swap_sample_id_all(union perf_event *event, void *data)
>>> @@ -1469,7 +1584,8 @@ static s64 perf_session__process_user_event(struct perf_session *session,
>>>  	int fd = perf_data__fd(session->data);
>>>  	int err;
>>>  
>>> -	dump_event(session->evlist, event, file_offset, &sample);
>>> +	if (event->header.type != PERF_RECORD_COMPRESSED)
>>> +		dump_event(session->evlist, event, file_offset, &sample);
>>>  
>>>  	/* These events are processed right away */
>>>  	switch (event->header.type) {
>>> @@ -1522,6 +1638,11 @@ static s64 perf_session__process_user_event(struct perf_session *session,
>>>  		return tool->time_conv(session, event);
>>>  	case PERF_RECORD_HEADER_FEATURE:
>>>  		return tool->feature(session, event);
>>> +	case PERF_RECORD_COMPRESSED:
>>> +		err = tool->compressed(session, event, file_offset);
>>> +		if (err)
>>> +			dump_event(session->evlist, event, file_offset, &sample);
>>
>> I'm not sure about having compressed callback at all, but maybe
>> it could be usefull for inject, to get the compressed data..?
> 
> This code implements handling through callbacks, so the new part is 
> designed the same way. The inject flow will probably benefit from 
> this approach; it needs additional testing.

Inject flow is now enabled too.

> 
> Thanks,
> Alexey
> 
> 
>>
>> I assume inject is working now and it will get you the uncompressed
>> perf.data?
>>
>> jirka
>>
> 


* Re: [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options
  2019-02-20 14:14     ` Alexey Budankov
@ 2019-02-25 15:30       ` Alexey Budankov
  0 siblings, 0 replies; 67+ messages in thread
From: Alexey Budankov @ 2019-02-25 15:30 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


On 20.02.2019 17:14, Alexey Budankov wrote:
> 
> On 12.02.2019 16:09, Jiri Olsa wrote:
>> On Mon, Feb 11, 2019 at 11:22:38PM +0300, Alexey Budankov wrote:
>>
>> SNIP
>>
>>> @@ -1147,6 +1193,10 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>>>  	fd = perf_data__fd(data);
>>>  	rec->session = session;
>>>  
>>> +	rec->opts.comp_level = 0;
>>> +	session->header.env.comp_level = rec->opts.comp_level;
>>> +	session->header.env.comp_type = PERF_COMP_NONE;
>>> +
>>>  	record__init_features(rec);
>>>  
>>>  	if (rec->opts.use_clockid && rec->opts.clockid_res_ns)
>>> @@ -1176,6 +1226,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>>>  		err = -1;
>>>  		goto out_child;
>>>  	}
>>> +	session->header.env.comp_mmap_len = session->evlist->mmap_len;
>>
>> so the comp_mmap_len is the max length of the compressed packet?
> 
> comp_mmap_len is the size of the buffer needed to hold one compressed 
> chunk of data after decompression.
> 
>>
>> any idea if this value might have some impact on the processing speed?
> 
> It increases memory consumption at the loading and processing stages.
> 
>>
>> I see you mentioned the size reduction, could you also meassure
>> the record overhead?
> 
> Let's get back to this after the code review.

Overhead numbers are provided in v3.

> 
> Thanks,
> Alexey
> 
>>
>> thanks,
>> jirka
>>
> 


* Re: [PATCH v2 4/4] perf report: support record trace file decompression
       [not found]       ` <0132ec08-e28b-4102-5053-8f8e21e7fd44@linux.intel.com>
@ 2019-02-27 10:56         ` Alexey Budankov
  2019-02-27 11:17           ` Jiri Olsa
  0 siblings, 1 reply; 67+ messages in thread
From: Alexey Budankov @ 2019-02-27 10:56 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel

Hi,

On 25.02.2019 19:04, Alexey Budankov wrote:
> 
> On 20.02.2019 17:48, Alexey Budankov wrote:
>>
>> On 12.02.2019 16:09, Jiri Olsa wrote:
>>> On Mon, Feb 11, 2019 at 11:25:00PM +0300, Alexey Budankov wrote:
>>>>
>>>> PERF_RECORD_COMPRESSED records are decompressed from trace file into
>>>> a linked list of mmaped memory regions using streaming Zstandard API.
>>>> After that the regions are loaded fetching uncompressed events. When
>>>> dumping raw trace (e.g., perf report -D --header) file offsets of
>>>> events from compressed records are set to zero.
>>>>
>>>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
>>>> ---
>>>> Changes in v2:
>>>> - moved compression/decompression code to session layer
>>>
>>> could you please add some automated test for this?
>>>
>>> I think it could be part of the sample synthesize test 

Do you mean this one?

tools/perf/tests/sample-parsing.c

Thanks,
Alexey

>>> or even new script test under tests/shell.. or anything
>>> else really.. just to see that we have some clean
>>> interface for this that we can test
>>
>> This is possible. Let's complete the code review first.
>>
>> Thanks,
>> Alexey
>>
>>>
>>> thanks,
>>> jirka
>>>
>>


* Re: [PATCH v2 4/4] perf report: support record trace file decompression
  2019-02-27 10:56         ` Alexey Budankov
@ 2019-02-27 11:17           ` Jiri Olsa
  0 siblings, 0 replies; 67+ messages in thread
From: Jiri Olsa @ 2019-02-27 11:17 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel

On Wed, Feb 27, 2019 at 01:56:09PM +0300, Alexey Budankov wrote:
> Hi,
> 
> On 25.02.2019 19:04, Alexey Budankov wrote:
> > 
> > On 20.02.2019 17:48, Alexey Budankov wrote:
> >>
> >> On 12.02.2019 16:09, Jiri Olsa wrote:
> >>> On Mon, Feb 11, 2019 at 11:25:00PM +0300, Alexey Budankov wrote:
> >>>>
> >>>> PERF_RECORD_COMPRESSED records are decompressed from trace file into
> >>>> a linked list of mmaped memory regions using streaming Zstandard API.
> >>>> After that the regions are loaded fetching uncompressed events. When
> >>>> dumping raw trace (e.g., perf report -D --header) file offsets of
> >>>> events from compressed records are set to zero.
> >>>>
> >>>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
> >>>> ---
> >>>> Changes in v2:
> >>>> - moved compression/decompression code to session layer
> >>>
> >>> could you please add some automated test for this?
> >>>
> >>> I think it could be part of the sample synthesize test 
> 
> Do you mean this one?
> 
> tools/perf/tests/sample-parsing.c

yeah, but on second look it does not look suitable;
more likely a new shell test under tests/shell?

basically any test that would exercise the compression
and decompression of events

thanks,
jirka


* Re: [PATCH v2 0/4] perf: enable compression of record mode trace to save storage space
  2019-02-11 13:52 ` Jiri Olsa
@ 2019-02-11 13:59   ` Alexey Budankov
  0 siblings, 0 replies; 67+ messages in thread
From: Alexey Budankov @ 2019-02-11 13:59 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, Jiri Olsa,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


On 11.02.2019 16:52, Jiri Olsa wrote:
> On Mon, Jan 28, 2019 at 10:02:49AM +0300, Alexey Budankov wrote:
>>
<SNIP>
>> The patch set is for Arnaldo's perf/core repository.
> 
> hi,
> got here late and can't apply this anymore.. which revision
> (commit id) is this based on? perf/core is moving real fast
> these days ;-)
> 
> based on the errors below I suspect you need to rebase anyway :-\

Yes, I will resend shortly.

Thanks,
Alexey

> 
> sry for late response
> 
> jirka
> 


* Re: [PATCH v2 0/4] perf: enable compression of record mode trace to save storage space
  2019-01-28  7:02 Alexey Budankov
@ 2019-02-11 13:52 ` Jiri Olsa
  2019-02-11 13:59   ` Alexey Budankov
  0 siblings, 1 reply; 67+ messages in thread
From: Jiri Olsa @ 2019-02-11 13:52 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, Jiri Olsa,
	Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel

On Mon, Jan 28, 2019 at 10:02:49AM +0300, Alexey Budankov wrote:
> 
> The patch set implements runtime trace compression for record mode and 
> trace file decompression for report mode. Zstandard API [1] is used for 
> compression/decompression of data that come from perf_events kernel 
> data buffers.
> 
> The implemented -z,--compression_level=n option provides ~3-5x average 
> trace file size reduction on a variety of tested workloads, which saves 
> user storage space on larger server systems where the trace file size 
> can easily reach several tens or even hundreds of GiBs, especially when 
> profiling with stacks for later DWARF unwinding, context-switch tracing, etc.
> 
>   $ tools/perf/perf record -z 1 -e cycles -- matrix.gcc
> 
> The --mmap-flush option can be used to avoid compressing every single 
> byte of data and to increase the compression ratio while lowering the 
> tool's runtime overhead.
> 
> The compression functionality can be disabled on the make command line 
> using the NO_LIBZSTD define, and the Zstandard source location can be 
> overridden via the LIBZSTD_DIR define:
> 
>   $ make -C tools/perf NO_LIBZSTD=1 clean all
>   $ make -C tools/perf LIBZSTD_DIR=/path/to/zstd-1.3.7 clean all
> 
> The patch set is for Arnaldo's perf/core repository.

hi,
got here late and can't apply this anymore.. which revision
(commit id) is this based on? perf/core is moving real fast
these days ;-)

based on the errors below I suspect you need to rebase anyway :-\

sry for late response

jirka


> 
> ---
> Alexey Budankov (4):
>   feature: realize libzstd check, LIBZSTD_DIR and NO_LIBZSTD defines
>   perf record: implement -z=<level> and --mmap-flush=<thres> options

got this one failing when applied:

patching file tools/perf/Documentation/perf-record.txt
Hunk #1 succeeded at 459 with fuzz 2 (offset 19 lines).
patching file tools/perf/builtin-record.c
Hunk #1 succeeded at 292 (offset 5 lines).
Hunk #2 succeeded at 336 (offset 5 lines).
Hunk #3 FAILED at 565.
Hunk #4 succeeded at 765 with fuzz 1 (offset 10 lines).
Hunk #5 FAILED at 778.
Hunk #6 succeeded at 806 (offset 11 lines).
Hunk #7 succeeded at 839 (offset 11 lines).
Hunk #8 succeeded at 873 (offset 11 lines).
Hunk #9 succeeded at 1185 (offset 11 lines).
Hunk #10 succeeded at 1218 (offset 11 lines).
Hunk #11 succeeded at 1354 (offset 11 lines).
Hunk #12 succeeded at 1455 (offset 11 lines).
Hunk #13 succeeded at 1866 (offset 26 lines).
Hunk #14 FAILED at 2006.
Hunk #15 succeeded at 2230 with fuzz 2 (offset 32 lines).
3 out of 15 hunks FAILED -- saving rejects to file tools/perf/builtin-record.c.rej
patching file tools/perf/perf.h
Hunk #1 FAILED at 84.
1 out of 1 hunk FAILED -- saving rejects to file tools/perf/perf.h.rej
patching file tools/perf/util/env.h
patching file tools/perf/util/event.c
Hunk #1 succeeded at 68 (offset 2 lines).
patching file tools/perf/util/event.h
Hunk #2 succeeded at 627 (offset 1 line).
Hunk #3 succeeded at 665 (offset 1 line).
patching file tools/perf/util/evlist.c
Hunk #1 FAILED at 1022.
Hunk #2 FAILED at 1032.
Hunk #3 FAILED at 1064.
3 out of 3 hunks FAILED -- saving rejects to file tools/perf/util/evlist.c.rej
patching file tools/perf/util/evlist.h
Hunk #1 FAILED at 165.
1 out of 1 hunk FAILED -- saving rejects to file tools/perf/util/evlist.h.rej
patching file tools/perf/util/header.c
patching file tools/perf/util/header.h
patching file tools/perf/util/mmap.c
Hunk #1 FAILED at 154.
Hunk #2 succeeded at 282 with fuzz 1 (offset 69 lines).
Hunk #3 succeeded at 359 (offset 69 lines).
Hunk #4 succeeded at 434 (offset 97 lines).
Hunk #5 succeeded at 484 (offset 97 lines).
1 out of 5 hunks FAILED -- saving rejects to file tools/perf/util/mmap.c.rej
patching file tools/perf/util/mmap.h
Hunk #1 FAILED at 30.
Hunk #2 FAILED at 69.
2 out of 2 hunks FAILED -- saving rejects to file tools/perf/util/mmap.h.rej
patching file tools/perf/util/session.h
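The "offset", "fuzz", and "FAILED" lines above are patch(1)'s hunk-matching diagnostics. A minimal scratch-directory sketch (hypothetical file names, standard diff/patch tools) reproducing the benign "offset" case:

```shell
# patch(1) reports "offset N lines" when a hunk's context still matches,
# just not at the line number recorded in the diff; it reports FAILED and
# writes a .rej file when the context cannot be found at all.
set -e
dir=$(mktemp -d); cd "$dir"
printf 'alpha\nbeta\ngamma\n' > old.txt
printf 'alpha\nbeta\nGAMMA\n' > new.txt
diff -u old.txt new.txt > fix.patch || true   # diff exits 1 when files differ
# Simulate an upstream rebase that shifted the code down by two lines:
printf 'intro1\nintro2\nalpha\nbeta\ngamma\n' > target.txt
patch target.txt < fix.patch   # prints e.g. "Hunk #1 succeeded ... (offset 2 lines)."
tail -n 1 target.txt           # the hunk's change landed despite the shift
```

The FAILED hunks in builtin-record.c, perf.h, evlist.c/h and mmap.h are the cases where even fuzzy matching gave up, hence the rebase in this posting.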

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 0/4] perf: enable compression of record mode trace to save storage space
  2019-01-29 10:45 ` Arnaldo Carvalho de Melo
  2019-01-29 10:53   ` Arnaldo Carvalho de Melo
@ 2019-01-29 17:25   ` Andi Kleen
  1 sibling, 0 replies; 67+ messages in thread
From: Andi Kleen @ 2019-01-29 17:25 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Alexey Budankov, Ingo Molnar, Peter Zijlstra, Jiri Olsa,
	Namhyung Kim, Alexander Shishkin, linux-kernel

On Tue, Jan 29, 2019 at 11:45:43AM +0100, Arnaldo Carvalho de Melo wrote:
> Em Mon, Jan 28, 2019 at 09:40:28AM +0300, Alexey Budankov escreveu:
> > The patch set implements runtime trace compression for record mode and 
> > trace file decompression for report mode. Zstandard API [1] is used for 
> > compression/decompression of data that come from perf_events kernel 
> 
> Interesting, wasn't aware of this zstd library, I wonder if we can add
> it and switch the other compression libraries we link against, so that
> we're not adding one more library to the dep list of perf but removing
> some instead, do you think this would be possible?

Sadly we can't because perf needs the existing libraries to uncompress
modules, and they might be compressed with zlib or bzip2. zstd cannot
uncompress those formats.
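A quick way to see why one decompressor cannot transparently read another's output: each container starts with its own magic bytes, and a decoder needs both to recognize the container and to have the matching backend. A sketch using gzip (chosen only because it is ubiquitous):

```shell
# Inspect the container magic of a gzip stream; gzip streams start with
# 1f 8b, zstd frames with 28 b5 2f fd, xz with fd 37 7a 58 5a, bzip2
# with 42 5a 68 ("BZh").
magic=$(printf 'hello\n' | gzip -c | od -An -tx1 | head -n 1)
echo "gzip magic bytes:$magic"
```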

-Andi


* Re: [PATCH v2 0/4] perf: enable compression of record mode trace to save storage space
  2019-01-29 12:13       ` Arnaldo Carvalho de Melo
@ 2019-01-29 16:39         ` Alexey Budankov
  0 siblings, 0 replies; 67+ messages in thread
From: Alexey Budankov @ 2019-01-29 16:39 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, Namhyung Kim,
	Alexander Shishkin, Andi Kleen, linux-kernel

On 29.01.2019 15:13, Arnaldo Carvalho de Melo wrote:
> Em Tue, Jan 29, 2019 at 02:39:00PM +0300, Alexey Budankov escreveu:
>> Hi,
>> On 29.01.2019 13:53, Arnaldo Carvalho de Melo wrote:
>>> Em Tue, Jan 29, 2019 at 11:45:43AM +0100, Arnaldo Carvalho de Melo escreveu:
>>>> Em Mon, Jan 28, 2019 at 09:40:28AM +0300, Alexey Budankov escreveu:
>>>>> The patch set implements runtime trace compression for record mode and 
>>>>> trace file decompression for report mode. Zstandard API [1] is used for 
>>>>> compression/decompression of data that come from perf_events kernel 
>>>>
>>>> Interesting, wasn't aware of this zstd library, I wonder if we can add
>>>> it and switch the other compression libraries we link against, so that
>>>> we're not adding one more library to the dep list of perf but removing
>>>> some instead, do you think this would be possible?
>>
>> Replacing the compression APIs already incorporated was not evaluated or 
>> tested in the scope of this patch set. However, according to the numbers in 
>> the docs and the numbers we got during testing, the Zstd API outperforms the 
>> existing compression libraries both in terms of speed and in terms of 
>> compression ratio (at least libz). Backward compatibility needs to be taken
>> into account so that old perf files can still be opened by newer perf tool versions.
> 
> Right, I'm not talking in the scope of this patch; it's just that while
> looking at it, I noticed that we're adding yet another compression
> library, and its description seemed to imply it would support the other
> compression formats, which I've learned is not the case, so nevermind.
> 
> I'm not talking about using just zstd, as what we mostly do with the
> compression libraries is to decompress, not compress; for instance, we
> need to uncompress kernel modules to get to their symbols, do annotation
> with them, etc.

Well, yes, having a single compression/decompression API implementation that 
would handle different compression formats and scenarios sounds reasonable 
from a support cost perspective.

It looks like the Zstandard library could provide such capabilities. 
The library is well supported at the moment, so fixes and extensions are 
released in a timely manner.

-Alexey

> 
> - Arnaldo
>  
>> -Alexey
>>
>>>>
>>>>   $ ldd ~/bin/perf | wc -l
>>>>   30
>>>>   $ ldd ~/bin/perf | grep z
>>>> 	liblzma.so.5 => /lib64/liblzma.so.5 (0x00007f3dcc356000)
>>>> 	libz.so.1 => /lib64/libz.so.1 (0x00007f3dcb2aa000)
>>>> 	libbz2.so.1 => /lib64/libbz2.so.1 (0x00007f3dcb218000)
>>>>   $
>>>>
>>>> Humm, from the github page it says:
>>>>
>>>> -----
>>>> The project is provided as an open-source dual BSD and GPLv2 licensed C
>>>> library, and a command line utility producing and decoding .zst, .gz,
>>>> .xz and .lz4 files. Should your project require another programming
>>>> language, a list of known ports and bindings is provided on Zstandard
>>>> homepage.
>>>> -----
>>>>
>>>> So it would cover just liblzma and libz, right?
>>>
>>> Nevermind;
>>>
>>> [acme@quaco perf]$ zstdcat ~/git/perf/perf-5.0.0-rc2.tar.xz
>>> zstd: /home/acme/git/perf/perf-5.0.0-rc2.tar.xz: xz/lzma file cannot be uncompressed (zstd compiled without HAVE_LZMA) -- ignored
>>>
>>> So it handles those formats, _if_ linked with those libraries, duh.
>>>
>>> - Arnaldo
>>>
> 


* Re: [PATCH v2 0/4] perf: enable compression of record mode trace to save storage space
  2019-01-29 11:39     ` Alexey Budankov
@ 2019-01-29 12:13       ` Arnaldo Carvalho de Melo
  2019-01-29 16:39         ` Alexey Budankov
  0 siblings, 1 reply; 67+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-01-29 12:13 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, Namhyung Kim,
	Alexander Shishkin, Andi Kleen, linux-kernel

Em Tue, Jan 29, 2019 at 02:39:00PM +0300, Alexey Budankov escreveu:
> Hi,
> On 29.01.2019 13:53, Arnaldo Carvalho de Melo wrote:
> > Em Tue, Jan 29, 2019 at 11:45:43AM +0100, Arnaldo Carvalho de Melo escreveu:
> >> Em Mon, Jan 28, 2019 at 09:40:28AM +0300, Alexey Budankov escreveu:
> >>> The patch set implements runtime trace compression for record mode and 
> >>> trace file decompression for report mode. Zstandard API [1] is used for 
> >>> compression/decompression of data that come from perf_events kernel 
> >>
> >> Interesting, wasn't aware of this zstd library, I wonder if we can add
> >> it and switch the other compression libraries we link against, so that
> >> we're not adding one more library to the dep list of perf but removing
> >> some instead, do you think this would be possible?
> 
> Replacing the compression APIs already incorporated was not evaluated or 
> tested in the scope of this patch set. However, according to the numbers in 
> the docs and the numbers we got during testing, the Zstd API outperforms the 
> existing compression libraries both in terms of speed and in terms of 
> compression ratio (at least libz). Backward compatibility needs to be taken
> into account so that old perf files can still be opened by newer perf tool versions.

Right, I'm not talking in the scope of this patch; it's just that while
looking at it, I noticed that we're adding yet another compression
library, and its description seemed to imply it would support the other
compression formats, which I've learned is not the case, so nevermind.

I'm not talking about using just zstd, as what we mostly do with the
compression libraries is to decompress, not compress; for instance, we
need to uncompress kernel modules to get to their symbols, do annotation
with them, etc.

- Arnaldo
 
> -Alexey
> 
> >>
> >>   $ ldd ~/bin/perf | wc -l
> >>   30
> >>   $ ldd ~/bin/perf | grep z
> >> 	liblzma.so.5 => /lib64/liblzma.so.5 (0x00007f3dcc356000)
> >> 	libz.so.1 => /lib64/libz.so.1 (0x00007f3dcb2aa000)
> >> 	libbz2.so.1 => /lib64/libbz2.so.1 (0x00007f3dcb218000)
> >>   $
> >>
> >> Humm, from the github page it says:
> >>
> >> -----
> >> The project is provided as an open-source dual BSD and GPLv2 licensed C
> >> library, and a command line utility producing and decoding .zst, .gz,
> >> .xz and .lz4 files. Should your project require another programming
> >> language, a list of known ports and bindings is provided on Zstandard
> >> homepage.
> >> -----
> >>
> >> So it would cover just liblzma and libz, right?
> > 
> > Nevermind;
> > 
> > [acme@quaco perf]$ zstdcat ~/git/perf/perf-5.0.0-rc2.tar.xz
> > zstd: /home/acme/git/perf/perf-5.0.0-rc2.tar.xz: xz/lzma file cannot be uncompressed (zstd compiled without HAVE_LZMA) -- ignored
> > 
> > So it handles those formats, _if_ linked with those libraries, duh.
> > 
> > - Arnaldo
> > 

-- 

- Arnaldo


* Re: [PATCH v2 0/4] perf: enable compression of record mode trace to save storage space
  2019-01-29 10:53   ` Arnaldo Carvalho de Melo
@ 2019-01-29 11:39     ` Alexey Budankov
  2019-01-29 12:13       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 67+ messages in thread
From: Alexey Budankov @ 2019-01-29 11:39 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, Namhyung Kim,
	Alexander Shishkin, Andi Kleen, linux-kernel

Hi,
On 29.01.2019 13:53, Arnaldo Carvalho de Melo wrote:
> Em Tue, Jan 29, 2019 at 11:45:43AM +0100, Arnaldo Carvalho de Melo escreveu:
>> Em Mon, Jan 28, 2019 at 09:40:28AM +0300, Alexey Budankov escreveu:
>>> The patch set implements runtime trace compression for record mode and 
>>> trace file decompression for report mode. Zstandard API [1] is used for 
>>> compression/decompression of data that come from perf_events kernel 
>>
>> Interesting, wasn't aware of this zstd library, I wonder if we can add
>> it and switch the other compression libraries we link against, so that
>> we're not adding one more library to the dep list of perf but removing
>> some instead, do you think this would be possible?

Replacing the compression APIs already incorporated was not evaluated or 
tested in the scope of this patch set. However, according to the numbers in 
the docs and the numbers we got during testing, the Zstd API outperforms the 
existing compression libraries both in terms of speed and in terms of 
compression ratio (at least libz). Backward compatibility needs to be taken
into account so that old perf files can still be opened by newer perf tool versions.

-Alexey

>>
>>   $ ldd ~/bin/perf | wc -l
>>   30
>>   $ ldd ~/bin/perf | grep z
>> 	liblzma.so.5 => /lib64/liblzma.so.5 (0x00007f3dcc356000)
>> 	libz.so.1 => /lib64/libz.so.1 (0x00007f3dcb2aa000)
>> 	libbz2.so.1 => /lib64/libbz2.so.1 (0x00007f3dcb218000)
>>   $
>>
>> Humm, from the github page it says:
>>
>> -----
>> The project is provided as an open-source dual BSD and GPLv2 licensed C
>> library, and a command line utility producing and decoding .zst, .gz,
>> .xz and .lz4 files. Should your project require another programming
>> language, a list of known ports and bindings is provided on Zstandard
>> homepage.
>> -----
>>
>> So it would cover just liblzma and libz, right?
> 
> Nevermind;
> 
> [acme@quaco perf]$ zstdcat ~/git/perf/perf-5.0.0-rc2.tar.xz
> zstd: /home/acme/git/perf/perf-5.0.0-rc2.tar.xz: xz/lzma file cannot be uncompressed (zstd compiled without HAVE_LZMA) -- ignored
> 
> So it handles those formats, _if_ linked with those libraries, duh.
> 
> - Arnaldo
> 


* Re: [PATCH v2 0/4] perf: enable compression of record mode trace to save storage space
  2019-01-29 10:45 ` Arnaldo Carvalho de Melo
@ 2019-01-29 10:53   ` Arnaldo Carvalho de Melo
  2019-01-29 11:39     ` Alexey Budankov
  2019-01-29 17:25   ` Andi Kleen
  1 sibling, 1 reply; 67+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-01-29 10:53 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, Namhyung Kim,
	Alexander Shishkin, Andi Kleen, linux-kernel

Em Tue, Jan 29, 2019 at 11:45:43AM +0100, Arnaldo Carvalho de Melo escreveu:
> Em Mon, Jan 28, 2019 at 09:40:28AM +0300, Alexey Budankov escreveu:
> > The patch set implements runtime trace compression for record mode and 
> > trace file decompression for report mode. Zstandard API [1] is used for 
> > compression/decompression of data that come from perf_events kernel 
> 
> Interesting, wasn't aware of this zstd library, I wonder if we can add
> it and switch the other compression libraries we link against, so that
> we're not adding one more library to the dep list of perf but removing
> some instead, do you think this would be possible?
> 
>   $ ldd ~/bin/perf | wc -l
>   30
>   $ ldd ~/bin/perf | grep z
> 	liblzma.so.5 => /lib64/liblzma.so.5 (0x00007f3dcc356000)
> 	libz.so.1 => /lib64/libz.so.1 (0x00007f3dcb2aa000)
> 	libbz2.so.1 => /lib64/libbz2.so.1 (0x00007f3dcb218000)
>   $
> 
> Humm, from the github page it says:
> 
> -----
> The project is provided as an open-source dual BSD and GPLv2 licensed C
> library, and a command line utility producing and decoding .zst, .gz,
> .xz and .lz4 files. Should your project require another programming
> language, a list of known ports and bindings is provided on Zstandard
> homepage.
> -----
> 
> So it would cover just liblzma and libz, right?

Nevermind;

[acme@quaco perf]$ zstdcat ~/git/perf/perf-5.0.0-rc2.tar.xz
zstd: /home/acme/git/perf/perf-5.0.0-rc2.tar.xz: xz/lzma file cannot be uncompressed (zstd compiled without HAVE_LZMA) -- ignored

So it handles those formats, _if_ linked with those libraries, duh.

- Arnaldo


* Re: [PATCH v2 0/4] perf: enable compression of record mode trace to save storage space
       [not found] <b834df0e-26b5-3f8c-7a43-18f675fb7434@linux.intel.com>
@ 2019-01-29 10:45 ` Arnaldo Carvalho de Melo
  2019-01-29 10:53   ` Arnaldo Carvalho de Melo
  2019-01-29 17:25   ` Andi Kleen
  0 siblings, 2 replies; 67+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-01-29 10:45 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, Namhyung Kim,
	Alexander Shishkin, Andi Kleen, linux-kernel

Em Mon, Jan 28, 2019 at 09:40:28AM +0300, Alexey Budankov escreveu:
> The patch set implements runtime trace compression for record mode and 
> trace file decompression for report mode. Zstandard API [1] is used for 
> compression/decompression of data that come from perf_events kernel 

Interesting, wasn't aware of this zstd library, I wonder if we can add
it and switch the other compression libraries we link against, so that
we're not adding one more library to the dep list of perf but removing
some instead, do you think this would be possible?

  $ ldd ~/bin/perf | wc -l
  30
  $ ldd ~/bin/perf | grep z
	liblzma.so.5 => /lib64/liblzma.so.5 (0x00007f3dcc356000)
	libz.so.1 => /lib64/libz.so.1 (0x00007f3dcb2aa000)
	libbz2.so.1 => /lib64/libbz2.so.1 (0x00007f3dcb218000)
  $

Humm, from the github page it says:

-----
The project is provided as an open-source dual BSD and GPLv2 licensed C
library, and a command line utility producing and decoding .zst, .gz,
.xz and .lz4 files. Should your project require another programming
language, a list of known ports and bindings is provided on Zstandard
homepage.
-----

So it would cover just liblzma and libz, right?

- Arnaldo


* [PATCH v2 0/4] perf: enable compression of record mode trace to save storage space
@ 2019-01-28  7:02 Alexey Budankov
  2019-02-11 13:52 ` Jiri Olsa
  0 siblings, 1 reply; 67+ messages in thread
From: Alexey Budankov @ 2019-01-28  7:02 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra
  Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel


The patch set implements runtime trace compression for record mode and 
trace file decompression for report mode. Zstandard API [1] is used for 
compression/decompression of data that come from perf_events kernel 
data buffers.

The implemented -z,--compression_level=n option provides ~3-5x average trace 
file size reduction on a variety of tested workloads, which saves user 
storage space on larger server systems where the trace file size can easily 
reach several tens or even hundreds of GiBs, especially when profiling with 
stacks for later DWARF unwinding, context-switch tracing, and so on.

  $ tools/perf/perf record -z 1 -e cycles -- matrix.gcc

The --mmap-flush option can be used to avoid compressing every single byte 
of data and to increase the compression ratio, at the same time lowering the 
tool's runtime overhead.
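The batching effect behind that option can be sketched with a scratch shell experiment (hypothetical file names; gzip stands in for zstd here only because it is more commonly installed): many tiny independently compressed chunks carry per-chunk framing overhead and a cold dictionary, so larger batches compress better.

```shell
set -e
dir=$(mktemp -d); cd "$dir"
seq 1 20000 > trace.dat                # stand-in for perf_events buffer data
gzip -c trace.dat > batched.gz         # one large batch per flush
split -b 256 trace.dat part.           # emulate flushing every few hundred bytes
: > chunked.gz
for f in part.*; do gzip -c "$f" >> chunked.gz; done
batched=$(wc -c < batched.gz | tr -d ' ')
chunked=$(wc -c < chunked.gz | tr -d ' ')
echo "batched=$batched bytes, chunked=$chunked bytes"
if [ "$batched" -lt "$chunked" ]; then
        echo "larger batches compress better"
fi
```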

The compression functionality can be disabled from the make command line 
using the NO_LIBZSTD define, and the location of the Zstandard sources can 
be overridden using the LIBZSTD_DIR define:

  $ make -C tools/perf NO_LIBZSTD=1 clean all
  $ make -C tools/perf LIBZSTD_DIR=/path/to/zstd-1.3.7 clean all

The patch set is for Arnaldo's perf/core repository.

---
Alexey Budankov (4):
  feature: realize libzstd check, LIBZSTD_DIR and NO_LIBZSTD defines
  perf record: implement -z=<level> and --mmap-flush=<thres> options
  perf record: enable runtime trace compression
  perf report: support record trace file decompression

 tools/build/Makefile.feature             |   6 +-
 tools/build/feature/Makefile             |   6 +-
 tools/build/feature/test-all.c           |   5 +
 tools/build/feature/test-libzstd.c       |  12 +
 tools/perf/Documentation/perf-record.txt |   9 +
 tools/perf/Makefile.config               |  20 ++
 tools/perf/Makefile.perf                 |   3 +
 tools/perf/builtin-record.c              | 167 +++++++++++---
 tools/perf/builtin-report.c              |   5 +-
 tools/perf/perf.h                        |   2 +
 tools/perf/util/env.h                    |  10 +
 tools/perf/util/event.c                  |   1 +
 tools/perf/util/event.h                  |   7 +
 tools/perf/util/evlist.c                 |   6 +-
 tools/perf/util/evlist.h                 |   2 +-
 tools/perf/util/header.c                 |  45 +++-
 tools/perf/util/header.h                 |   1 +
 tools/perf/util/mmap.c                   | 173 ++++++++++-----
 tools/perf/util/mmap.h                   |  31 ++-
 tools/perf/util/session.c                | 271 ++++++++++++++++++++++-
 tools/perf/util/session.h                |  26 +++
 tools/perf/util/tool.h                   |   2 +
 22 files changed, 695 insertions(+), 115 deletions(-)
 create mode 100644 tools/build/feature/test-libzstd.c

---
Changes in v2:
- moved compression/decompression code to session layer
- enabled allocation aio data buffers for compression
- enabled trace compression for serial trace streaming

---
[1] https://github.com/facebook/zstd

---
Examples:

  $ make -C tools/perf NO_LIBZSTD=1 clean all
  $ make -C tools/perf LIBZSTD_DIR=/path/to/zstd-1.3.7 clean all

  $ tools/perf/perf record -z 1 -e cycles -- matrix.gcc
  Addr of buf1 = 0x7fc266d52010
  Offs of buf1 = 0x7fc266d52180
  Addr of buf2 = 0x7fc264d51010
  Offs of buf2 = 0x7fc264d511c0
  Addr of buf3 = 0x7fc262d50010
  Offs of buf3 = 0x7fc262d50100
  Addr of buf4 = 0x7fc260d4f010
  Offs of buf4 = 0x7fc260d4f140
  Threads #: 8 Pthreads
  Matrix size: 2048
  Using multiply kernel: multiply1
  Execution time = 31.471 seconds
  [ perf record: Woken up 120 times to write data ]
  [ perf record: Compressed 38.118 MB to 7.084 MB, ratio is 5.381 ]
  [ perf record: Captured and wrote 7.100 MB perf.data (999192 samples) ]

  $ tools/perf/perf report -D --header
  # ========
  # captured on    : Sat Jan 26 11:49:55 2019
  # header version : 1
  # data offset    : 296
  # data size      : 7444119
  # feat offset    : 7444415
  # hostname : nntvtune39
  # os release : 4.19.15-300.fc29.x86_64
  # perf version : 4.13.rc5.g3cfa299
  # arch : x86_64
  # nrcpus online : 8
  # nrcpus avail : 8
  # cpudesc : Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
  # cpuid : GenuineIntel,6,94,3
  # total memory : 16153184 kB
  # cmdline : /root/abudanko/kernel/acme/tools/perf/perf record -z 1 -e cycles -- ../../matrix/linux/matrix.gcc 
  # event : name = cycles, , id = { 2171, 2172, 2173, 2174, 2175, 2176, 2177, 2178 }, size = 112, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|PERIOD, read_format =>
  # CPU_TOPOLOGY info available, use -I to display
  # NUMA_TOPOLOGY info available, use -I to display
  # pmu mappings: intel_pt = 8, software = 1, power = 11, uprobe = 7, uncore_imc = 12, cpu = 4, cstate_core = 18, uncore_cbox_2 = 15, breakpoint = 5, uncore_cbox_0 = 13, tracepoint = 2>
  # CACHE info available, use -I to display
  # time of first sample : 230574.239204
  # time of last sample : 230605.735403
  # sample duration :  31496.200 ms
  # MEM_TOPOLOGY info available, use -I to display
  # compressed : Zstd, level = 1, ratio = 5
  # missing features: TRACING_DATA BRANCH_STACK GROUP_DESC AUXTRACE STAT CLOCKID 
  # ========
  #

  0x128 [0x20]: event: 79
  .
  . ... raw event: size 32 bytes
  .  0000:  4f 00 00 00 00 00 20 00 1f 00 00 00 00 00 00 00  O..... .........
  .  0010:  11 a6 ef 1f 00 00 00 00 e7 16 81 83 f5 ff ff ff  ................

  0 0x128 [0x20]: PERF_RECORD_TIME_CONV: unhandled!

  0x148 [0x50]: event: 1
  .
  . ... raw event: size 80 bytes
  .  0000:  01 00 00 00 01 00 50 00 ff ff ff ff 00 00 00 00  ......P.........
  .  0010:  00 00 00 89 ff ff ff ff 00 10 31 37 00 00 00 00  ..........17....
  .  0020:  00 00 00 89 ff ff ff ff 5b 6b 65 72 6e 65 6c 2e  ........[kernel.
  .  0030:  6b 61 6c 6c 73 79 6d 73 5d 5f 74 65 78 74 00 00  kallsyms]_text..
  .  0040:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

  0 0x148 [0x50]: PERF_RECORD_MMAP -1/0: [0xffffffff89000000(0x37311000) @ 0xffffffff89000000]: x [kernel.kallsyms]_text

  ...

  0x6375d [0x8]: event: 68
  .
  . ... raw event: size 8 bytes
  .  0000:  44 00 00 00 00 00 08 00                          D.......        

  0 0x6375d [0x8]: PERF_RECORD_FINISHED_ROUND

  0 [0x28]: event: 9
  .
  . ... raw event: size 40 bytes
  .  0000:  09 00 00 00 01 00 28 00 76 78 06 89 ff ff ff ff  ......(.vx......
  .  0010:  d4 1d 00 00 d4 1d 00 00 26 43 9f bf b4 d1 00 00  ........&C......
  .  0020:  01 00 00 00 00 00 00 00                          ........        

  230574239204134 0 [0x28]: PERF_RECORD_SAMPLE(IP, 0x1): 7636/7636: 0xffffffff89067876 period: 1 addr: 0
   ... thread: perf:7636
   ...... dso: /proc/kcore

  0 [0x30]: event: 3
  .
  . ... raw event: size 48 bytes
  .  0000:  03 00 00 00 00 20 30 00 d4 1d 00 00 d4 1d 00 00  ..... 0.........
  .  0010:  6d 61 74 72 69 78 2e 67 63 63 00 00 00 00 00 00  matrix.gcc......
  .  0020:  d4 1d 00 00 d4 1d 00 00 34 4a 9f bf b4 d1 00 00  ........4J......

  230574239205940 0 [0x30]: PERF_RECORD_COMM exec: matrix.gcc:7636/7636

  0 [0x28]: event: 9
  .
  . ... raw event: size 40 bytes
  .  0000:  09 00 00 00 01 00 28 00 76 78 06 89 ff ff ff ff  ......(.vx......
  .  0010:  d4 1d 00 00 d4 1d 00 00 1f af 9f bf b4 d1 00 00  ................
  .  0020:  3f 0c 00 00 00 00 00 00                          ?.......        

  230574239231775 0 [0x28]: PERF_RECORD_SAMPLE(IP, 0x1): 7636/7636: 0xffffffff89067876 period: 3135 addr: 0
   ... thread: matrix.gcc:7636
   ...... dso: /proc/kcore


  Aggregated stats:
             TOTAL events:    1001434
              MMAP events:        100
              LOST events:          0
              COMM events:          2
              EXIT events:          9
          THROTTLE events:          0
        UNTHROTTLE events:          0
              FORK events:          8
              READ events:          0
            SAMPLE events:     999192
             MMAP2 events:          7
               AUX events:          0
      ITRACE_START events:          0
      LOST_SAMPLES events:          0
            SWITCH events:          0
   SWITCH_CPU_WIDE events:          0
        NAMESPACES events:          0
           KSYMBOL events:          0
         BPF_EVENT events:          0
              ATTR events:          0
        EVENT_TYPE events:          0
      TRACING_DATA events:          0
          BUILD_ID events:          0
    FINISHED_ROUND events:        319
          ID_INDEX events:          0
     AUXTRACE_INFO events:          0
          AUXTRACE events:          0
    AUXTRACE_ERROR events:          0
        THREAD_MAP events:          1
           CPU_MAP events:          1
       STAT_CONFIG events:          0
              STAT events:          0
        STAT_ROUND events:          0
      EVENT_UPDATE events:          0
         TIME_CONV events:          1
           FEATURE events:          0
        COMPRESSED events:       1794
  cycles stats:
             TOTAL events:     999192
              MMAP events:          0
              LOST events:          0
              COMM events:          0
              EXIT events:          0
          THROTTLE events:          0
        UNTHROTTLE events:          0
              FORK events:          0
              READ events:          0
            SAMPLE events:     999192
             MMAP2 events:          0
               AUX events:          0
      ITRACE_START events:          0
      LOST_SAMPLES events:          0
            SWITCH events:          0
   SWITCH_CPU_WIDE events:          0
        NAMESPACES events:          0
           KSYMBOL events:          0
         BPF_EVENT events:          0
              ATTR events:          0
        EVENT_TYPE events:          0
      TRACING_DATA events:          0
          BUILD_ID events:          0
    FINISHED_ROUND events:          0
          ID_INDEX events:          0
     AUXTRACE_INFO events:          0
          AUXTRACE events:          0
    AUXTRACE_ERROR events:          0
        THREAD_MAP events:          0
           CPU_MAP events:          0
       STAT_CONFIG events:          0
              STAT events:          0
        STAT_ROUND events:          0
      EVENT_UPDATE events:          0
         TIME_CONV events:          0
           FEATURE events:          0
        COMPRESSED events:          0
---


end of thread, other threads:[~2019-02-27 11:17 UTC | newest]

Thread overview: 67+ messages
2019-02-11 20:17 [PATCH v2 0/4] perf: enable compression of record mode trace to save storage space Alexey Budankov
2019-02-11 20:21 ` [PATCH v2 1/4] feature: realize libzstd check, LIBZSTD_DIR and NO_LIBZSTD defines Alexey Budankov
2019-02-11 20:22 ` [PATCH v2 2/4] perf record: implement -z=<level> and --mmap-flush=<thres> options Alexey Budankov
2019-02-12 13:08   ` Jiri Olsa
2019-02-20 14:13     ` Alexey Budankov
2019-02-12 13:08   ` Jiri Olsa
2019-02-20 14:13     ` Alexey Budankov
2019-02-12 13:08   ` Jiri Olsa
2019-02-20 15:24     ` Alexey Budankov
2019-02-21  9:49       ` Jiri Olsa
2019-02-21 11:24         ` Alexey Budankov
2019-02-25 15:27           ` Alexey Budankov
2019-02-12 13:08   ` Jiri Olsa
2019-02-20 14:15     ` Alexey Budankov
2019-02-21  9:47       ` Jiri Olsa
2019-02-21 11:23         ` Alexey Budankov
2019-02-25 15:26           ` Alexey Budankov
2019-02-12 13:08   ` Jiri Olsa
2019-02-20 14:13     ` Alexey Budankov
2019-02-12 13:09   ` Jiri Olsa
2019-02-20 15:19     ` Alexey Budankov
2019-02-12 13:09   ` Jiri Olsa
2019-02-20 14:25     ` Alexey Budankov
2019-02-12 13:09   ` Jiri Olsa
2019-02-20 14:14     ` Alexey Budankov
2019-02-25 15:30       ` Alexey Budankov
2019-02-11 20:23 ` [PATCH v2 3/4] perf record: enable runtime trace compression Alexey Budankov
2019-02-12 13:08   ` Jiri Olsa
2019-02-20 15:13     ` Alexey Budankov
2019-02-21  9:43       ` Jiri Olsa
2019-02-21 11:30         ` Alexey Budankov
2019-02-12 13:08   ` Jiri Olsa
2019-02-20 14:53     ` Alexey Budankov
2019-02-21  9:43       ` Jiri Olsa
2019-02-21 11:18         ` Alexey Budankov
2019-02-12 13:09   ` Jiri Olsa
2019-02-20 15:09     ` Alexey Budankov
2019-02-25 15:27       ` Alexey Budankov
2019-02-12 13:09   ` Jiri Olsa
2019-02-20 15:11     ` Alexey Budankov
2019-02-12 13:09   ` Jiri Olsa
2019-02-20 15:03     ` Alexey Budankov
2019-02-12 13:09   ` Jiri Olsa
2019-02-20 15:06     ` Alexey Budankov
2019-02-11 20:25 ` [PATCH v2 4/4] perf report: support record trace file decompression Alexey Budankov
2019-02-12 13:08   ` Jiri Olsa
2019-02-20 15:19     ` Alexey Budankov
2019-02-25 15:28       ` Alexey Budankov
2019-02-12 13:09   ` Jiri Olsa
2019-02-20 14:48     ` Alexey Budankov
     [not found]       ` <0132ec08-e28b-4102-5053-8f8e21e7fd44@linux.intel.com>
2019-02-27 10:56         ` Alexey Budankov
2019-02-27 11:17           ` Jiri Olsa
2019-02-12 13:09   ` Jiri Olsa
2019-02-20 14:46     ` Alexey Budankov
2019-02-12 13:09   ` Jiri Olsa
2019-02-20 14:44     ` Alexey Budankov
2019-02-12 12:27 ` [PATCH v2 0/4] perf: enable compression of record mode trace to save storage space Arnaldo Carvalho de Melo
2019-02-12 14:06   ` Alexey Budankov
     [not found] <b834df0e-26b5-3f8c-7a43-18f675fb7434@linux.intel.com>
2019-01-29 10:45 ` Arnaldo Carvalho de Melo
2019-01-29 10:53   ` Arnaldo Carvalho de Melo
2019-01-29 11:39     ` Alexey Budankov
2019-01-29 12:13       ` Arnaldo Carvalho de Melo
2019-01-29 16:39         ` Alexey Budankov
2019-01-29 17:25   ` Andi Kleen
  -- strict thread matches above, loose matches on Subject: below --
2019-01-28  7:02 Alexey Budankov
2019-02-11 13:52 ` Jiri Olsa
2019-02-11 13:59   ` Alexey Budankov

This is a public inbox; see mirroring instructions
for how to clone and mirror all data and code used for this inbox,
as well as URLs for NNTP newsgroup(s).