* [PATCH v1 1/4] feature: build libzstd feature check, LIBZSTD_DIR and NO_LIBZSTD defines
2018-12-24 13:21 [PATCH v1 0/4] perf: enable compression of record mode trace to save storage space Alexey Budankov
@ 2018-12-24 13:35 ` Alexey Budankov
2018-12-24 13:45 ` [PATCH v1 2/4] perf record: introduce z, mmap-flush options and PERF_RECORD_COMPRESSED record Alexey Budankov
` (3 subsequent siblings)
4 siblings, 0 replies; 11+ messages in thread
From: Alexey Budankov @ 2018-12-24 13:35 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra
Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
Add a libzstd build feature check plus LIBZSTD_DIR and NO_LIBZSTD defines, so that
the Zstandard library location can be overridden and the feature can be disabled
from the command line:
$ make -C tools/perf LIBZSTD_DIR=/root/abudanko/zstd-1.3.7 clean all
$ make -C tools/perf NO_LIBZSTD=1 clean all
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
tools/build/Makefile.feature | 6 ++++--
tools/build/feature/Makefile | 6 +++++-
tools/build/feature/test-all.c | 5 +++++
tools/build/feature/test-libzstd.c | 12 ++++++++++++
tools/perf/Makefile.config | 20 ++++++++++++++++++++
tools/perf/Makefile.perf | 3 +++
6 files changed, 49 insertions(+), 3 deletions(-)
create mode 100644 tools/build/feature/test-libzstd.c
diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature
index 5467c6bf9ceb..25088c8f05b2 100644
--- a/tools/build/Makefile.feature
+++ b/tools/build/Makefile.feature
@@ -71,7 +71,8 @@ FEATURE_TESTS_BASIC := \
sdt \
setns \
libopencsd \
- libaio
+ libaio \
+ libzstd
# FEATURE_TESTS_BASIC + FEATURE_TESTS_EXTRA is the complete list
# of all feature tests
@@ -118,7 +119,8 @@ FEATURE_DISPLAY ?= \
lzma \
get_cpuid \
bpf \
- libaio
+ libaio \
+ libzstd
# Set FEATURE_CHECK_(C|LD)FLAGS-all for all FEATURE_TESTS features.
# If in the future we need per-feature checks/flags for features not
diff --git a/tools/build/feature/Makefile b/tools/build/feature/Makefile
index 7ceb4441b627..4b8244ee65ce 100644
--- a/tools/build/feature/Makefile
+++ b/tools/build/feature/Makefile
@@ -62,7 +62,8 @@ FILES= \
test-clang.bin \
test-llvm.bin \
test-llvm-version.bin \
- test-libaio.bin
+ test-libaio.bin \
+ test-libzstd.bin
FILES := $(addprefix $(OUTPUT),$(FILES))
@@ -301,6 +302,9 @@ $(OUTPUT)test-clang.bin:
$(OUTPUT)test-libaio.bin:
$(BUILD) -lrt
+$(OUTPUT)test-libzstd.bin:
+ $(BUILD) -lzstd
+
###############################
clean:
diff --git a/tools/build/feature/test-all.c b/tools/build/feature/test-all.c
index 20cdaa4fc112..5af329b6ffef 100644
--- a/tools/build/feature/test-all.c
+++ b/tools/build/feature/test-all.c
@@ -178,6 +178,10 @@
# include "test-libaio.c"
#undef main
+#define main main_test_libzstd
+# include "test-libzstd.c"
+#undef main
+
int main(int argc, char *argv[])
{
main_test_libpython();
@@ -219,6 +223,7 @@ int main(int argc, char *argv[])
main_test_setns();
main_test_libopencsd();
main_test_libaio();
+ main_test_libzstd();
return 0;
}
diff --git a/tools/build/feature/test-libzstd.c b/tools/build/feature/test-libzstd.c
new file mode 100644
index 000000000000..55268c01b84d
--- /dev/null
+++ b/tools/build/feature/test-libzstd.c
@@ -0,0 +1,12 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <zstd.h>
+
+int main(void)
+{
+ ZSTD_CStream *cstream;
+
+ cstream = ZSTD_createCStream();
+ ZSTD_freeCStream(cstream);
+
+ return 0;
+}
diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index b441c88cafa1..1dccd776a4aa 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -145,6 +145,13 @@ endif
FEATURE_CHECK_CFLAGS-libbabeltrace := $(LIBBABELTRACE_CFLAGS)
FEATURE_CHECK_LDFLAGS-libbabeltrace := $(LIBBABELTRACE_LDFLAGS) -lbabeltrace-ctf
+ifdef LIBZSTD_DIR
+ LIBZSTD_CFLAGS := -I$(LIBZSTD_DIR)/lib
+ LIBZSTD_LDFLAGS := -L$(LIBZSTD_DIR)/lib
+endif
+FEATURE_CHECK_CFLAGS-libzstd := $(LIBZSTD_CFLAGS)
+FEATURE_CHECK_LDFLAGS-libzstd := $(LIBZSTD_LDFLAGS)
+
FEATURE_CHECK_CFLAGS-bpf = -I. -I$(srctree)/tools/include -I$(srctree)/tools/arch/$(SRCARCH)/include/uapi -I$(srctree)/tools/include/uapi
# include ARCH specific config
-include $(src-perf)/arch/$(SRCARCH)/Makefile
@@ -770,6 +777,19 @@ ifndef NO_LZMA
endif
endif
+ifndef NO_LIBZSTD
+ ifeq ($(feature-libzstd), 1)
+ CFLAGS += -DHAVE_ZSTD_SUPPORT
+ CFLAGS += $(LIBZSTD_CFLAGS)
+ LDFLAGS += $(LIBZSTD_LDFLAGS)
+ EXTLIBS += -lzstd
+ $(call detected,CONFIG_ZSTD)
+ else
+    msg := $(warning No libzstd found, disabling trace compression; please install libzstd-dev[el] and/or set LIBZSTD_DIR);
+ NO_LIBZSTD := 1
+ endif
+endif
+
ifndef NO_BACKTRACE
ifeq ($(feature-backtrace), 1)
CFLAGS += -DHAVE_BACKTRACE_SUPPORT
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index bd23e3f30895..dcac562e1d00 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -108,6 +108,9 @@ include ../scripts/utilities.mak
# streaming for record mode. Currently Posix AIO trace streaming is
# supported only when linking with glibc.
#
+# Define NO_LIBZSTD if you do not want support of Zstandard based runtime
+# trace compression in record mode.
+#
# As per kernel Makefile, avoid funny character set dependencies
unexport LC_ALL
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v1 2/4] perf record: introduce z, mmap-flush options and PERF_RECORD_COMPRESSED record
2018-12-24 13:21 [PATCH v1 0/4] perf: enable compression of record mode trace to save storage space Alexey Budankov
2018-12-24 13:35 ` [PATCH v1 1/4] feature: build libzstd feature check, LIBZSTD_DIR and NO_LIBZSTD defines Alexey Budankov
@ 2018-12-24 13:45 ` Alexey Budankov
2019-01-09 16:58 ` Jiri Olsa
2018-12-24 13:46 ` [PATCH v1 3/4] perf record: enable runtime trace compression Alexey Budankov
` (2 subsequent siblings)
4 siblings, 1 reply; 11+ messages in thread
From: Alexey Budankov @ 2018-12-24 13:45 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra
Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
Introduce the -z/--compression-level=n and --mmap-flush options, and a PERF_RECORD_COMPRESSED
event record that contains compressed parts of the kernel mmap buffer data.
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
tools/perf/Documentation/perf-record.txt | 11 +++
tools/perf/builtin-record.c | 97 ++++++++++++++++++++----
tools/perf/perf.h | 2 +
tools/perf/util/env.h | 10 +++
tools/perf/util/event.c | 1 +
tools/perf/util/event.h | 7 ++
tools/perf/util/evlist.c | 6 +-
tools/perf/util/evlist.h | 2 +-
tools/perf/util/header.c | 47 +++++++++++-
tools/perf/util/header.h | 1 +
tools/perf/util/mmap.c | 4 +-
tools/perf/util/mmap.h | 3 +-
12 files changed, 169 insertions(+), 22 deletions(-)
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index d232b13ea713..b849dfdefefe 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -440,6 +440,17 @@ Use <n> control blocks in asynchronous (Posix AIO) trace writing mode (default:
Asynchronous mode is supported only when linking Perf tool with libc library
providing implementation for Posix AIO API.
+-z::
+--compression-level=n::
+Produce a compressed trace file to save storage space, using the specified level n (default: 0,
+best speed: 1, best compression: 22). Compression can be activated in asynchronous trace
+writing mode (--aio) only.
+
+--mmap-flush=n::
+Minimal number of bytes that must accumulate in the mmap buffer before it is flushed to the
+trace file (default: 1). When compression mode (-z) is enabled it is recommended to set
+--mmap-flush to 4096 or more. The maximal allowed value is a quarter of the kernel mmap buffer size.
+
--all-kernel::
Configure all used events to run in kernel space.
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 882285fb9f64..cb0b880281d7 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -81,6 +81,8 @@ struct record {
bool timestamp_boundary;
struct switch_output switch_output;
unsigned long long samples;
+ u64 bytes_transferred;
+ u64 bytes_compressed;
};
static volatile int auxtrace_record__snapshot_started;
@@ -286,13 +288,17 @@ static int record__aio_parse(const struct option *opt,
if (unset) {
opts->nr_cblocks = 0;
- } else {
- if (str)
- opts->nr_cblocks = strtol(str, NULL, 0);
- if (!opts->nr_cblocks)
- opts->nr_cblocks = nr_cblocks_default;
+ return 0;
}
+ if (str)
+ opts->nr_cblocks = strtol(str, NULL, 0);
+ if (!opts->nr_cblocks)
+ opts->nr_cblocks = nr_cblocks_default;
+
+ if (opts->nr_cblocks > nr_cblocks_max)
+ opts->nr_cblocks = nr_cblocks_max;
+
return 0;
}
#else /* HAVE_AIO_SUPPORT */
@@ -328,6 +334,30 @@ static int record__aio_enabled(struct record *rec)
return rec->opts.nr_cblocks > 0;
}
+#define MMAP_FLUSH_DEFAULT 1
+
+static int record__mmap_flush_parse(const struct option *opt,
+ const char *str,
+ int unset)
+{
+ int mmap_len;
+ struct record_opts *opts = (struct record_opts *)opt->value;
+
+ if (unset)
+ return 0;
+
+ if (str)
+ opts->mmap_flush = strtol(str, NULL, 0);
+ if (!opts->mmap_flush)
+ opts->mmap_flush = MMAP_FLUSH_DEFAULT;
+
+ mmap_len = perf_evlist__mmap_size(opts->mmap_pages);
+ if (opts->mmap_flush > mmap_len / 4 )
+ opts->mmap_flush = mmap_len / 4;
+
+ return 0;
+}
+
static int process_synthesized_event(struct perf_tool *tool,
union perf_event *event,
struct perf_sample *sample __maybe_unused,
@@ -533,7 +563,8 @@ static int record__mmap_evlist(struct record *rec,
if (perf_evlist__mmap_ex(evlist, opts->mmap_pages,
opts->auxtrace_mmap_pages,
- opts->auxtrace_snapshot_mode, opts->nr_cblocks) < 0) {
+ opts->auxtrace_snapshot_mode,
+ opts->nr_cblocks, opts->mmap_flush) < 0) {
if (errno == EPERM) {
pr_err("Permission error mapping pages.\n"
"Consider increasing "
@@ -723,7 +754,7 @@ static struct perf_event_header finished_round_event = {
};
static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evlist,
- bool overwrite)
+ bool overwrite, bool sync)
{
u64 bytes_written = rec->bytes_written;
int i;
@@ -746,11 +777,18 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
off = record__aio_get_pos(trace_fd);
for (i = 0; i < evlist->nr_mmaps; i++) {
+ u64 flush;
struct perf_mmap *map = &maps[i];
if (map->base) {
+ if (sync) {
+ flush = map->flush;
+ map->flush = MMAP_FLUSH_DEFAULT;
+ }
if (!record__aio_enabled(rec)) {
if (perf_mmap__push(map, rec, record__pushfn) != 0) {
+ if (sync)
+ map->flush = flush;
rc = -1;
goto out;
}
@@ -763,10 +801,14 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
idx = record__aio_sync(map, false);
if (perf_mmap__aio_push(map, rec, idx, record__aio_pushfn, &off) != 0) {
record__aio_set_pos(trace_fd, off);
+ if (sync)
+ map->flush = flush;
rc = -1;
goto out;
}
}
+ if (sync)
+ map->flush = flush;
}
if (map->auxtrace_mmap.base && !rec->opts.auxtrace_snapshot_mode &&
@@ -792,15 +834,15 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
return rc;
}
-static int record__mmap_read_all(struct record *rec)
+static int record__mmap_read_all(struct record *rec, bool sync)
{
int err;
- err = record__mmap_read_evlist(rec, rec->evlist, false);
+ err = record__mmap_read_evlist(rec, rec->evlist, false, sync);
if (err)
return err;
- return record__mmap_read_evlist(rec, rec->evlist, true);
+ return record__mmap_read_evlist(rec, rec->evlist, true, sync);
}
static void record__init_features(struct record *rec)
@@ -826,6 +868,9 @@ static void record__init_features(struct record *rec)
if (!(rec->opts.use_clockid && rec->opts.clockid_res_ns))
perf_header__clear_feat(&session->header, HEADER_CLOCKID);
+ if (!(rec->opts.comp_level && rec->opts.nr_cblocks))
+ perf_header__clear_feat(&session->header, HEADER_COMPRESSED);
+
perf_header__clear_feat(&session->header, HEADER_STAT);
}
@@ -1130,6 +1175,10 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
fd = perf_data__fd(data);
rec->session = session;
+ session->header.env.comp_type = PERF_COMP_NONE;
+ rec->opts.comp_level = 0;
+ session->header.env.comp_level = rec->opts.comp_level;
+
record__init_features(rec);
if (rec->opts.use_clockid && rec->opts.clockid_res_ns)
@@ -1159,6 +1208,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
err = -1;
goto out_child;
}
+ session->header.env.comp_mmap_len = session->evlist->mmap_len;
err = bpf__apply_obj_config();
if (err) {
@@ -1294,7 +1344,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
if (trigger_is_hit(&switch_output_trigger) || done || draining)
perf_evlist__toggle_bkw_mmap(rec->evlist, BKW_MMAP_DATA_PENDING);
- if (record__mmap_read_all(rec) < 0) {
+ if (record__mmap_read_all(rec, false) < 0) {
trigger_error(&auxtrace_snapshot_trigger);
trigger_error(&switch_output_trigger);
err = -1;
@@ -1395,8 +1445,16 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
record__synthesize_workload(rec, true);
out_child:
+ record__mmap_read_all(rec, true);
record__aio_mmap_read_sync(rec);
+ if (!quiet && rec->bytes_transferred && rec->bytes_compressed) {
+ float ratio = (float)rec->bytes_transferred/(float)rec->bytes_compressed;
+ session->header.env.comp_ratio = ratio + 0.5;
+ fprintf(stderr, "[ perf record: Compressed %.3f MB to %.3f MB, ratio is %.3f ]\n",
+ rec->bytes_transferred / 1024.0 / 1024.0, rec->bytes_compressed / 1024.0 / 1024.0, ratio);
+ }
+
if (forks) {
int exit_status;
@@ -1782,6 +1840,7 @@ static struct record record = {
.uses_mmap = true,
.default_per_cpu = true,
},
+ .mmap_flush = MMAP_FLUSH_DEFAULT,
},
.tool = {
.sample = process_sample_event,
@@ -1945,7 +2004,12 @@ static struct option __record_options[] = {
OPT_CALLBACK_OPTARG(0, "aio", &record.opts,
&nr_cblocks_default, "n", "Use <n> control blocks in asynchronous trace writing mode (default: 1, max: 4)",
record__aio_parse),
+ OPT_UINTEGER('z', "compression-level", &record.opts.comp_level,
+ "Produce compressed trace file (default: 0, best speed: 1, best compression: 22)"),
#endif
+ OPT_CALLBACK(0, "mmap-flush", &record.opts, "num",
+ "Minimal number of bytes in mmap buffer that is flushed to trace file (default: 1)",
+ record__mmap_flush_parse),
OPT_END()
};
@@ -2138,10 +2202,13 @@ int cmd_record(int argc, const char **argv)
goto out;
}
- if (rec->opts.nr_cblocks > nr_cblocks_max)
- rec->opts.nr_cblocks = nr_cblocks_max;
- if (verbose > 0)
- pr_info("nr_cblocks: %d\n", rec->opts.nr_cblocks);
+ pr_debug("nr_cblocks: %d\n", rec->opts.nr_cblocks);
+
+ if (rec->opts.comp_level > 22)
+ rec->opts.comp_level = 0;
+ pr_debug("Compression level: %d\n", rec->opts.comp_level);
+
+ pr_debug("mmap flush (B): %d\n", rec->opts.mmap_flush);
err = __cmd_record(&record, argc, argv);
out:
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 388c6dd128b8..0352b5a5b9d5 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -83,6 +83,8 @@ struct record_opts {
clockid_t clockid;
u64 clockid_res_ns;
int nr_cblocks;
+ unsigned int comp_level;
+ int mmap_flush;
};
struct option;
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index d01b8355f4ca..3d1ab2ccc128 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -64,6 +64,16 @@ struct perf_env {
struct memory_node *memory_nodes;
unsigned long long memory_bsize;
u64 clockid_res_ns;
+ u32 comp_type;
+ u32 comp_level;
+ u32 comp_ratio;
+ u32 comp_mmap_len;
+};
+
+enum perf_compress_type {
+ PERF_COMP_NONE = 0,
+ PERF_COMP_ZSTD,
+ PERF_COMP_EOF
};
extern struct perf_env perf_env;
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 937a5a4f71cc..20730ba2a08b 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -62,6 +62,7 @@ static const char *perf_event__names[] = {
[PERF_RECORD_EVENT_UPDATE] = "EVENT_UPDATE",
[PERF_RECORD_TIME_CONV] = "TIME_CONV",
[PERF_RECORD_HEADER_FEATURE] = "FEATURE",
+ [PERF_RECORD_COMPRESSED] = "COMPRESSED",
};
static const char *perf_ns__names[] = {
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index eb95f3384958..03960cfbe8d3 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -249,6 +249,7 @@ enum perf_user_event_type { /* above any possible kernel type */
PERF_RECORD_EVENT_UPDATE = 78,
PERF_RECORD_TIME_CONV = 79,
PERF_RECORD_HEADER_FEATURE = 80,
+ PERF_RECORD_COMPRESSED = 81,
PERF_RECORD_HEADER_MAX
};
@@ -620,6 +621,11 @@ struct feature_event {
char data[];
};
+struct compressed_event {
+ struct perf_event_header header;
+ char data[];
+};
+
union perf_event {
struct perf_event_header header;
struct mmap_event mmap;
@@ -651,6 +657,7 @@ union perf_event {
struct stat_round_event stat_round;
struct time_conv_event time_conv;
struct feature_event feat;
+ struct compressed_event pack;
};
void perf_event__print_totals(void);
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 8c902276d4b4..c82d4fd32dcf 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1022,7 +1022,7 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
*/
int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
unsigned int auxtrace_pages,
- bool auxtrace_overwrite, int nr_cblocks)
+ bool auxtrace_overwrite, int nr_cblocks, int flush)
{
struct perf_evsel *evsel;
const struct cpu_map *cpus = evlist->cpus;
@@ -1032,7 +1032,7 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
* Its value is decided by evsel's write_backward.
* So &mp should not be passed through const pointer.
*/
- struct mmap_params mp = { .nr_cblocks = nr_cblocks };
+ struct mmap_params mp = { .nr_cblocks = nr_cblocks, .flush = flush };
if (!evlist->mmap)
evlist->mmap = perf_evlist__alloc_mmap(evlist, false);
@@ -1064,7 +1064,7 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages)
{
- return perf_evlist__mmap_ex(evlist, pages, 0, false, 0);
+ return perf_evlist__mmap_ex(evlist, pages, 0, false, 0, 1);
}
int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target)
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 868294491194..33af704e55a2 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -162,7 +162,7 @@ unsigned long perf_event_mlock_kb_in_pages(void);
int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
unsigned int auxtrace_pages,
- bool auxtrace_overwrite, int nr_cblocks);
+ bool auxtrace_overwrite, int nr_cblocks, int flush);
int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages);
void perf_evlist__munmap(struct perf_evlist *evlist);
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index dec6d218c31c..37ab460b6f06 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -1463,6 +1463,23 @@ static int write_mem_topology(struct feat_fd *ff __maybe_unused,
return ret;
}
+static int write_compressed(struct feat_fd *ff __maybe_unused,
+ struct perf_evlist *evlist __maybe_unused)
+{
+ int ret;
+ u64 compression_info = ((u64)ff->ph->env.comp_type << 32) |
+ ff->ph->env.comp_level;
+
+ ret = do_write(ff, &compression_info, sizeof(compression_info));
+ if (ret)
+ return ret;
+
+ compression_info = ((u64)ff->ph->env.comp_ratio << 32) |
+ ff->ph->env.comp_mmap_len;
+
+ return do_write(ff, &compression_info, sizeof(compression_info));
+}
+
static void print_hostname(struct feat_fd *ff, FILE *fp)
{
fprintf(fp, "# hostname : %s\n", ff->ph->env.hostname);
@@ -1750,6 +1767,13 @@ static void print_cache(struct feat_fd *ff, FILE *fp __maybe_unused)
}
}
+static void print_compressed(struct feat_fd *ff, FILE *fp)
+{
+ fprintf(fp, "# compressed : %s, level = %d, ratio = %d\n",
+ ff->ph->env.comp_type == PERF_COMP_ZSTD ? "Zstd" : "Unknown",
+ ff->ph->env.comp_level, ff->ph->env.comp_ratio);
+}
+
static void print_pmu_mappings(struct feat_fd *ff, FILE *fp)
{
const char *delimiter = "# pmu mappings: ";
@@ -2592,6 +2616,26 @@ static int process_clockid(struct feat_fd *ff,
return 0;
}
+static int process_compressed(struct feat_fd *ff,
+ void *data __maybe_unused)
+{
+ u64 compression_info;
+
+ if (do_read_u64(ff, &compression_info))
+ return -1;
+
+ ff->ph->env.comp_type = (compression_info >> 32) & 0xffffffffULL;
+ ff->ph->env.comp_level = compression_info & 0xffffffffULL;
+
+ if (do_read_u64(ff, &compression_info))
+ return -1;
+
+ ff->ph->env.comp_ratio = (compression_info >> 32) & 0xffffffffULL;
+ ff->ph->env.comp_mmap_len = compression_info & 0xffffffffULL;
+
+ return 0;
+}
+
struct feature_ops {
int (*write)(struct feat_fd *ff, struct perf_evlist *evlist);
void (*print)(struct feat_fd *ff, FILE *fp);
@@ -2651,7 +2695,8 @@ static const struct feature_ops feat_ops[HEADER_LAST_FEATURE] = {
FEAT_OPN(CACHE, cache, true),
FEAT_OPR(SAMPLE_TIME, sample_time, false),
FEAT_OPR(MEM_TOPOLOGY, mem_topology, true),
- FEAT_OPR(CLOCKID, clockid, false)
+ FEAT_OPR(CLOCKID, clockid, false),
+ FEAT_OPR(COMPRESSED, compressed, false)
};
struct header_print_data {
diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
index 0d553ddca0a3..ee867075dc64 100644
--- a/tools/perf/util/header.h
+++ b/tools/perf/util/header.h
@@ -39,6 +39,7 @@ enum {
HEADER_SAMPLE_TIME,
HEADER_MEM_TOPOLOGY,
HEADER_CLOCKID,
+ HEADER_COMPRESSED,
HEADER_LAST_FEATURE,
HEADER_FEAT_BITS = 256,
};
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index 8fc39311a30d..5e71b0183e33 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -347,6 +347,8 @@ int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd, int c
&mp->auxtrace_mp, map->base, fd))
return -1;
+ map->flush = mp->flush;
+
return perf_mmap__aio_mmap(map, mp);
}
@@ -395,7 +397,7 @@ static int __perf_mmap__read_init(struct perf_mmap *md)
md->start = md->overwrite ? head : old;
md->end = md->overwrite ? old : head;
- if (md->start == md->end)
+ if ((md->end - md->start) < md->flush)
return -EAGAIN;
size = md->end - md->start;
diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index aeb6942fdb00..afbfb8b58d45 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -30,6 +30,7 @@ struct perf_mmap {
bool overwrite;
struct auxtrace_mmap auxtrace_mmap;
char event_copy[PERF_SAMPLE_MAX_SIZE] __aligned(8);
+ u64 flush;
#ifdef HAVE_AIO_SUPPORT
struct {
void **data;
@@ -69,7 +70,7 @@ enum bkw_mmap_state {
};
struct mmap_params {
- int prot, mask, nr_cblocks;
+ int prot, mask, nr_cblocks, flush;
struct auxtrace_mmap_params auxtrace_mp;
};
* Re: [PATCH v1 2/4] perf record: introduce z, mmap-flush options and PERF_RECORD_COMPRESSED record
2018-12-24 13:45 ` [PATCH v1 2/4] perf record: introduce z, mmap-flush options and PERF_RECORD_COMPRESSED record Alexey Budankov
@ 2019-01-09 16:58 ` Jiri Olsa
2019-01-14 8:46 ` Alexey Budankov
0 siblings, 1 reply; 11+ messages in thread
From: Jiri Olsa @ 2019-01-09 16:58 UTC (permalink / raw)
To: Alexey Budankov
Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
On Mon, Dec 24, 2018 at 04:45:21PM +0300, Alexey Budankov wrote:
>
> Introduce --compression_level=n, --mmap-flush options and PERF_RECORD_COMPRESSED
> event record that contains compressed parts of mmap kernel buffer data.
>
> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
> ---
> tools/perf/Documentation/perf-record.txt | 11 +++
> tools/perf/builtin-record.c | 97 ++++++++++++++++++++----
> tools/perf/perf.h | 2 +
> tools/perf/util/env.h | 10 +++
> tools/perf/util/event.c | 1 +
> tools/perf/util/event.h | 7 ++
> tools/perf/util/evlist.c | 6 +-
> tools/perf/util/evlist.h | 2 +-
> tools/perf/util/header.c | 47 +++++++++++-
> tools/perf/util/header.h | 1 +
> tools/perf/util/mmap.c | 4 +-
> tools/perf/util/mmap.h | 3 +-
> 12 files changed, 169 insertions(+), 22 deletions(-)
also I'm getting here similar error (like for the affinity patchset)
[jolsa@krava perf]$ git am /tmp/comp
Applying: feature: build libzstd feature check, LIBZSTD_DIR and NO_LIBZSTD defines
Applying: perf record: introduce z, mmap-flush options and PERF_RECORD_COMPRESSED record
error: corrupt patch at line 526
Patch failed at 0002 perf record: introduce z, mmap-flush options and PERF_RECORD_COMPRESSED record
again the raw patch applies correctly:
[jolsa@krava perf]$ patch -p3 < /tmp/comp2
patching file Documentation/perf-record.txt
patching file builtin-record.c
patching file perf.h
patching file util/env.h
patching file util/event.c
patching file util/event.h
patching file util/evlist.c
patching file util/evlist.h
patching file util/header.c
patching file util/header.h
patching file util/mmap.c
patching file util/mmap.h
jirka
* Re: [PATCH v1 2/4] perf record: introduce z, mmap-flush options and PERF_RECORD_COMPRESSED record
2019-01-09 16:58 ` Jiri Olsa
@ 2019-01-14 8:46 ` Alexey Budankov
0 siblings, 0 replies; 11+ messages in thread
From: Alexey Budankov @ 2019-01-14 8:46 UTC (permalink / raw)
To: Jiri Olsa
Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
Hi,
On 09.01.2019 19:58, Jiri Olsa wrote:
> On Mon, Dec 24, 2018 at 04:45:21PM +0300, Alexey Budankov wrote:
>>
>> Introduce --compression_level=n, --mmap-flush options and PERF_RECORD_COMPRESSED
>> event record that contains compressed parts of mmap kernel buffer data.
>>
>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
>> ---
>> tools/perf/Documentation/perf-record.txt | 11 +++
>> tools/perf/builtin-record.c | 97 ++++++++++++++++++++----
>> tools/perf/perf.h | 2 +
>> tools/perf/util/env.h | 10 +++
>> tools/perf/util/event.c | 1 +
>> tools/perf/util/event.h | 7 ++
>> tools/perf/util/evlist.c | 6 +-
>> tools/perf/util/evlist.h | 2 +-
>> tools/perf/util/header.c | 47 +++++++++++-
>> tools/perf/util/header.h | 1 +
>> tools/perf/util/mmap.c | 4 +-
>> tools/perf/util/mmap.h | 3 +-
>> 12 files changed, 169 insertions(+), 22 deletions(-)
>
> also I'm getting here similar error (like for the affinity patchset)
Really weird. Hopefully it can be resolved in the next version of the patch set.
Thanks,
Alexey
>
> [jolsa@krava perf]$ git am /tmp/comp
> Applying: feature: build libzstd feature check, LIBZSTD_DIR and NO_LIBZSTD defines
> Applying: perf record: introduce z, mmap-flush options and PERF_RECORD_COMPRESSED record
> error: corrupt patch at line 526
> Patch failed at 0002 perf record: introduce z, mmap-flush options and PERF_RECORD_COMPRESSED record
>
> again the raw patch applies correctly:
>
> [jolsa@krava perf]$ patch -p3 < /tmp/comp2
> patching file Documentation/perf-record.txt
> patching file builtin-record.c
> patching file perf.h
> patching file util/env.h
> patching file util/event.c
> patching file util/event.h
> patching file util/evlist.c
> patching file util/evlist.h
> patching file util/header.c
> patching file util/header.h
> patching file util/mmap.c
> patching file util/mmap.h
>
>
> jirka
>
* [PATCH v1 3/4] perf record: enable runtime trace compression
2018-12-24 13:21 [PATCH v1 0/4] perf: enable compression of record mode trace to save storage space Alexey Budankov
2018-12-24 13:35 ` [PATCH v1 1/4] feature: build libzstd feature check, LIBZSTD_DIR and NO_LIBZSTD defines Alexey Budankov
2018-12-24 13:45 ` [PATCH v1 2/4] perf record: introduce z, mmap-flush options and PERF_RECORD_COMPRESSED record Alexey Budankov
@ 2018-12-24 13:46 ` Alexey Budankov
2018-12-24 14:00 ` [PATCH v1 4/4] perf report: support record trace file decompression Alexey Budankov
2019-01-09 17:28 ` [PATCH v1 0/4] perf: enable compression of record mode trace to save storage space Jiri Olsa
4 siblings, 0 replies; 11+ messages in thread
From: Alexey Budankov @ 2018-12-24 13:46 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra
Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
Compression is implemented using the Zstandard API and operates on the AIO buffers,
so the plain memcpy() is replaced by the compression call. If the API call fails for
some reason, copying falls back to memcpy(). Data chunks are split and packed into
PERF_RECORD_COMPRESSED records of at most 64KB each. The --mmap-flush option value
can be used to avoid compressing every single byte of data and thus to increase the
compression ratio.
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
tools/perf/builtin-record.c | 122 ++++++++++++++++++++++++++++++++++--
tools/perf/util/mmap.c | 13 ++--
tools/perf/util/mmap.h | 2 +
3 files changed, 127 insertions(+), 10 deletions(-)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index cb0b880281d7..0ef1878967f8 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -53,6 +53,9 @@
#include <sys/mman.h>
#include <sys/wait.h>
#include <linux/time64.h>
+#ifdef HAVE_ZSTD_SUPPORT
+#include <zstd.h>
+#endif
struct switch_output {
bool enabled;
@@ -83,6 +86,9 @@ struct record {
unsigned long long samples;
u64 bytes_transferred;
u64 bytes_compressed;
+#ifdef HAVE_ZSTD_SUPPORT
+ ZSTD_CStream *zstd_cstream;
+#endif
};
static volatile int auxtrace_record__snapshot_started;
@@ -358,6 +364,109 @@ static int record__mmap_flush_parse(const struct option *opt,
return 0;
}
+#ifdef HAVE_ZSTD_SUPPORT
+static int record__zstd_init(struct record *rec)
+{
+ size_t ret;
+
+ if (rec->opts.comp_level == 0)
+ return 0;
+
+ rec->zstd_cstream = ZSTD_createCStream();
+ if (rec->zstd_cstream == NULL) {
+ pr_err("Couldn't create compression stream, disabling trace compression\n");
+ return -1;
+ }
+
+ ret = ZSTD_initCStream(rec->zstd_cstream, rec->opts.comp_level);
+ if (ZSTD_isError(ret)) {
+ pr_err("Failed to initialize compression stream: %s\n", ZSTD_getErrorName(ret));
+ return -1;
+ }
+
+ return 0;
+}
+
+static int record__zstd_fini(struct record *rec)
+{
+ if (rec->zstd_cstream) {
+ ZSTD_freeCStream(rec->zstd_cstream);
+ rec->zstd_cstream = NULL;
+ }
+
+ return 0;
+}
+
+static size_t record__zstd_compress(void *to, void *dst, size_t dst_size,
+ void *src, size_t src_size)
+{
+ void *dst_head = dst;
+ struct record *rec = to;
+ size_t ret, size, compressed = 0;
+ struct compressed_event *event = NULL;
+ /* maximum payload of a single record: u16 header.size max (2^16 - 1) minus the header */
+ const size_t max_data_size = (1 << 8 * sizeof(event->header.size)) -
+ 1 - sizeof(struct compressed_event);
+ ZSTD_inBuffer input = { src, src_size, 0 };
+ ZSTD_outBuffer output;
+
+ if (rec->opts.comp_level == 0) {
+ memcpy(dst_head, src, src_size);
+ return src_size;
+ }
+
+ while (input.pos < input.size) {
+ event = dst;
+
+ event->header.type = PERF_RECORD_COMPRESSED;
+ event->header.size = size = sizeof(struct compressed_event);
+ compressed += size;
+ dst += size;
+ dst_size -= size;
+
+ output = (ZSTD_outBuffer){ dst, (dst_size > max_data_size) ?
+ max_data_size : dst_size, 0 };
+ ret = ZSTD_compressStream(rec->zstd_cstream, &output, &input);
+ ZSTD_flushStream(rec->zstd_cstream, &output);
+ if (ZSTD_isError(ret)) {
+ pr_err("failed to compress %ld bytes: %s\n",
+ (long)src_size, ZSTD_getErrorName(ret));
+ memcpy(dst_head, src, src_size);
+ return src_size;
+ }
+ size = output.pos;
+
+ event->header.size += size;
+ compressed += size;
+ dst += size;
+ dst_size -= size;
+ }
+
+ rec->bytes_transferred += src_size;
+ rec->bytes_compressed += compressed;
+
+ return compressed;
+}
+#else /* !HAVE_ZSTD_SUPPORT */
+static int record__zstd_init(struct record *rec __maybe_unused)
+{
+ return -1;
+}
+
+static int record__zstd_fini(struct record *rec __maybe_unused)
+{
+ return 0;
+}
+
+static size_t record__zstd_compress(void *to __maybe_unused,
+ void *dst, size_t dst_size __maybe_unused,
+ void *src, size_t src_size)
+{
+ memcpy(dst, src, src_size);
+ return src_size;
+}
+#endif
+
static int process_synthesized_event(struct perf_tool *tool,
union perf_event *event,
struct perf_sample *sample __maybe_unused,
@@ -799,7 +908,8 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
* becomes available after previous aio write request.
*/
idx = record__aio_sync(map, false);
- if (perf_mmap__aio_push(map, rec, idx, record__aio_pushfn, &off) != 0) {
+ if (perf_mmap__aio_push(map, rec, idx,
+ record__zstd_compress, record__aio_pushfn, &off) != 0) {
record__aio_set_pos(trace_fd, off);
if (sync)
map->flush = flush;
@@ -1175,8 +1285,12 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
fd = perf_data__fd(data);
rec->session = session;
- session->header.env.comp_type = PERF_COMP_NONE;
- rec->opts.comp_level = 0;
+ if (record__zstd_init(rec) == 0) {
+ session->header.env.comp_type = PERF_COMP_ZSTD;
+ } else {
+ session->header.env.comp_type = PERF_COMP_NONE;
+ rec->opts.comp_level = 0;
+ }
session->header.env.comp_level = rec->opts.comp_level;
record__init_features(rec);
@@ -1447,7 +1561,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
out_child:
record__mmap_read_all(rec, true);
record__aio_mmap_read_sync(rec);
-
+ record__zstd_fini(rec);
if (!quiet && rec->bytes_transferred && rec->bytes_compressed) {
float ratio = (float)rec->bytes_transferred/(float)rec->bytes_compressed;
session->header.env.comp_ratio = ratio + 0.5;
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index 5e71b0183e33..58a71ca77df5 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -218,14 +218,16 @@ static void perf_mmap__aio_munmap(struct perf_mmap *map)
}
int perf_mmap__aio_push(struct perf_mmap *md, void *to, int idx,
+ size_t compress(void *to, void *dst, size_t dst_size, void *src, size_t src_size),
int push(void *to, struct aiocb *cblock, void *buf, size_t size, off_t off),
off_t *off)
{
u64 head = perf_mmap__read_head(md);
unsigned char *data = md->base + page_size;
- unsigned long size, size0 = 0;
+ size_t size, size0 = 0, size1 = 0;
void *buf;
int rc = 0;
+ size_t mmap_len = perf_mmap__mmap_len(md);
rc = perf_mmap__read_init(md);
if (rc < 0)
@@ -254,14 +256,13 @@ int perf_mmap__aio_push(struct perf_mmap *md, void *to, int idx,
buf = &data[md->start & md->mask];
size = md->mask + 1 - (md->start & md->mask);
md->start += size;
- memcpy(md->aio.data[idx], buf, size);
- size0 = size;
+ size0 = compress(to, md->aio.data[idx], mmap_len, buf, size);
}
buf = &data[md->start & md->mask];
size = md->end - md->start;
md->start += size;
- memcpy(md->aio.data[idx] + size0, buf, size);
+ size1 = compress(to, md->aio.data[idx] + size0, mmap_len - size0, buf, size);
/*
* Increment md->refcount to guard md->data[idx] buffer
@@ -277,9 +278,9 @@ int perf_mmap__aio_push(struct perf_mmap *md, void *to, int idx,
md->prev = head;
perf_mmap__consume(md);
- rc = push(to, &md->aio.cblocks[idx], md->aio.data[idx], size0 + size, *off);
+ rc = push(to, &md->aio.cblocks[idx], md->aio.data[idx], size0 + size1, *off);
if (!rc) {
- *off += size0 + size;
+ *off += size0 + size1;
} else {
/*
* Decrement md->refcount back if aio write
diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index afbfb8b58d45..0b3b8b46410a 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -100,10 +100,12 @@ int perf_mmap__push(struct perf_mmap *md, void *to,
int push(struct perf_mmap *map, void *to, void *buf, size_t size));
#ifdef HAVE_AIO_SUPPORT
int perf_mmap__aio_push(struct perf_mmap *md, void *to, int idx,
+ size_t compress(void *to, void *dst, size_t dst_size, void *src, size_t src_size),
int push(void *to, struct aiocb *cblock, void *buf, size_t size, off_t off),
off_t *off);
#else
static inline int perf_mmap__aio_push(struct perf_mmap *md __maybe_unused, void *to __maybe_unused, int idx __maybe_unused,
+ size_t compress(void *to, void *dst, size_t dst_size, void *src, size_t src_size) __maybe_unused,
int push(void *to, struct aiocb *cblock, void *buf, size_t size, off_t off) __maybe_unused,
off_t *off __maybe_unused)
{
^ permalink raw reply related [flat|nested] 11+ messages in thread
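The two compress() calls in perf_mmap__aio_push() above handle ring-buffer wrap-around: when the start..end region crosses the end of the mmapped ring, it is drained as two segments, and the compressed sizes size0 and size1 are summed before the aio push. A minimal standalone sketch of that split logic, with a hypothetical copy-only stand-in for the compress callback (i.e. the comp_level == 0 path), might look like:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the compress() callback taken by
 * perf_mmap__aio_push(): it just copies, like the comp_level == 0 path. */
static size_t copy_only(void *to, void *dst, size_t dst_size,
                        void *src, size_t src_size)
{
	(void)to;
	assert(src_size <= dst_size);
	memcpy(dst, src, src_size);
	return src_size;
}

/* Sketch of the two-segment drain: a ring of size mask+1, with the
 * start..end region possibly wrapping past the end of the ring. */
static size_t drain_ring(unsigned char *ring, size_t mask,
                         size_t start, size_t end,
                         unsigned char *out, size_t out_len)
{
	size_t size0 = 0, size1 = 0, size;

	if ((start & mask) + (end - start) > mask + 1) {
		/* wrapped: first segment runs to the end of the ring */
		size = mask + 1 - (start & mask);
		size0 = copy_only(NULL, out, out_len, &ring[start & mask], size);
		start += size;
	}
	size = end - start;
	size1 = copy_only(NULL, out + size0, out_len - size0,
			  &ring[start & mask], size);
	return size0 + size1;
}
```

The real patch feeds each segment through ZSTD_compressStream() instead of memcpy(), which is why the output sizes can differ from the input sizes and must be accumulated separately.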
* [PATCH v1 4/4] perf report: support record trace file decompression
2018-12-24 13:21 [PATCH v1 0/4] perf: enable compression of record mode trace to save storage space Alexey Budankov
` (2 preceding siblings ...)
2018-12-24 13:46 ` [PATCH v1 3/4] perf record: enable runtime trace compression Alexey Budankov
@ 2018-12-24 14:00 ` Alexey Budankov
2019-01-09 17:28 ` [PATCH v1 0/4] perf: enable compression of record mode trace to save storage space Jiri Olsa
4 siblings, 0 replies; 11+ messages in thread
From: Alexey Budankov @ 2018-12-24 14:00 UTC
To: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra
Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
PERF_RECORD_COMPRESSED records are decompressed from the trace file into a
linked list of mmapped memory regions using the Zstandard API. The regions
are then traversed to fetch the uncompressed events. When dumping a raw
trace (e.g. perf report -D), the file offsets of events coming from
compressed records are set to zero.
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
tools/perf/builtin-report.c | 151 +++++++++++++++++++++++++++++++++++-
tools/perf/util/machine.c | 4 +
tools/perf/util/session.c | 59 +++++++++++++-
tools/perf/util/session.h | 16 ++++
tools/perf/util/tool.h | 2 +
5 files changed, 230 insertions(+), 2 deletions(-)
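The diff below keeps a linked list of decompressed regions and, in process_compressed_event(), carries the unconsumed tail of the previous region (size - head bytes, typically a partially fetched event) into the head of the new one. A standalone sketch of that bookkeeping, with hypothetical names (struct chunk here mirrors the patch's struct decomp):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the patch's struct decomp: 'head' is how far
 * event fetching has consumed the buffer, so size - head bytes are a
 * partial event that must be carried into the next buffer. */
struct chunk {
	size_t head;
	size_t size;
	unsigned char data[64];
};

/* Start a new chunk, carrying over the unconsumed tail of prev (if any). */
static void chunk_begin(struct chunk *next, const struct chunk *prev)
{
	size_t rem = 0;

	if (prev) {
		rem = prev->size - prev->head;
		memcpy(next->data, prev->data + prev->head, rem);
	}
	next->head = 0;
	next->size = rem;
}

/* Append freshly decompressed bytes after the carried-over tail. */
static void chunk_append(struct chunk *c, const void *buf, size_t len)
{
	memcpy(c->data + c->size, buf, len);
	c->size += len;
}
```

In the patch the "freshly decompressed bytes" come from ZSTD_decompressStream() and the chunk memory is mmapped at comp_mmap_len size; this sketch only shows the carry-over arithmetic.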
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 4958095be4fc..1c45e674743d 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -52,7 +52,10 @@
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
-#include <linux/mman.h>
+#include <sys/mman.h>
+#ifdef HAVE_ZSTD_SUPPORT
+#include <zstd.h>
+#endif
struct report {
struct perf_tool tool;
@@ -118,6 +121,94 @@ static int report__config(const char *var, const char *value, void *cb)
return 0;
}
+#ifdef HAVE_ZSTD_SUPPORT
+static int report__zstd_init(struct perf_session *session)
+{
+ size_t ret;
+
+ session->zstd_dstream = ZSTD_createDStream();
+ if (session->zstd_dstream == NULL)
+ {
+ pr_err("Couldn't create decompression stream, disabling trace decompression\n");
+ return -1;
+ }
+
+ ret = ZSTD_initDStream(session->zstd_dstream);
+ if (ZSTD_isError(ret))
+ {
+ pr_err("Failed to initialize decompression stream: %s\n", ZSTD_getErrorName(ret));
+ return -1;
+ }
+
+ return 0;
+}
+
+static int report__zstd_fini(struct perf_session *session)
+{
+ struct decomp *next = session->decomp, *decomp;
+ size_t decomp_len = session->header.env.comp_mmap_len;
+
+ if (session->zstd_dstream) {
+ ZSTD_freeDStream(session->zstd_dstream);
+ session->zstd_dstream = NULL;
+ }
+
+ do {
+ decomp = next;
+ if (decomp == NULL)
+ break;
+ next = decomp->next;
+ munmap(decomp, decomp_len + sizeof(struct decomp));
+ } while (1);
+
+ return 0;
+}
+
+static size_t report__zstd_decompress(struct perf_session *session,
+ void *src, size_t src_size,
+ void *dst, size_t dst_size)
+{
+ size_t ret;
+ ZSTD_inBuffer input = { src, src_size, 0 };
+ ZSTD_outBuffer output = { dst, dst_size, 0 };
+
+ if (session->zstd_dstream == NULL)
+ return 0;
+
+ while (input.pos < input.size) {
+ ret = ZSTD_decompressStream(session->zstd_dstream, &output, &input);
+ if (ZSTD_isError(ret))
+ {
+ pr_err("failed to decompress (B): %zd -> %zd : %s\n",
+ src_size, output.size, ZSTD_getErrorName(ret));
+ break;
+ }
+ output.dst = dst + output.pos;
+ output.size = dst_size - output.pos;
+ }
+
+ return output.pos;
+}
+
+#else /* !HAVE_ZSTD_SUPPORT */
+static int report__zstd_init(struct perf_session *session __maybe_unused)
+{
+ return -1;
+}
+
+static int report__zstd_fini(struct perf_session *session __maybe_unused)
+{
+ return 0;
+}
+
+static size_t report__zstd_decompress(struct perf_session *session __maybe_unused,
+ void *src __maybe_unused, size_t src_size __maybe_unused,
+ void *dst __maybe_unused, size_t dst_size __maybe_unused)
+{
+ return 0;
+}
+#endif
+
static int hist_iter__report_callback(struct hist_entry_iter *iter,
struct addr_location *al, bool single,
void *arg)
@@ -225,6 +316,57 @@ static int process_feature_event(struct perf_session *session,
return 0;
}
+static int process_compressed_event(struct perf_session *session,
+ union perf_event *event, u64 file_offset)
+{
+ void *src;
+ size_t decomp_size, src_size;
+ u64 decomp_last_rem = 0;
+ size_t decomp_len = session->header.env.comp_mmap_len;
+ struct decomp *decomp, *decomp_last = session->decomp_last;
+
+ decomp = mmap(NULL, sizeof(struct decomp) + decomp_len, PROT_READ|PROT_WRITE,
+ MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
+ if (decomp == MAP_FAILED) {
+ pr_err("Couldn't allocate memory for decompression\n");
+ return -1;
+ }
+
+ decomp->file_pos = file_offset;
+ decomp->head = 0;
+
+ if (decomp_last) {
+ decomp_last_rem = decomp_last->size - decomp_last->head;
+ memcpy(decomp->data, &(decomp_last->data[decomp_last->head]), decomp_last_rem);
+ decomp->size = decomp_last_rem;
+ }
+
+ src = (void*)event + sizeof(struct compressed_event);
+ src_size = event->pack.header.size - sizeof(struct compressed_event);
+
+ decomp_size = report__zstd_decompress(session, src, src_size,
+ &(decomp->data[decomp_last_rem]), decomp_len - decomp_last_rem);
+ if (!decomp_size) {
+ munmap(decomp, sizeof(struct decomp) + decomp_len);
+ pr_err("Couldn't decompress data\n");
+ return -1;
+ }
+
+ decomp->size += decomp_size;
+
+ if (session->decomp == NULL) {
+ session->decomp = decomp;
+ session->decomp_last = decomp;
+ } else {
+ session->decomp_last->next = decomp;
+ session->decomp_last = decomp;
+ }
+
+ pr_debug("decomp (B): %zd to %zd\n", src_size, decomp_size);
+
+ return 0;
+}
+
static int process_sample_event(struct perf_tool *tool,
union perf_event *event,
struct perf_sample *sample,
@@ -983,6 +1125,7 @@ int cmd_report(int argc, const char **argv)
.auxtrace = perf_event__process_auxtrace,
.event_update = perf_event__process_event_update,
.feature = process_feature_event,
+ .compressed = process_compressed_event,
.ordered_events = true,
.ordering_requires_timestamps = true,
},
@@ -1205,6 +1348,10 @@ int cmd_report(int argc, const char **argv)
report.session = session;
+ if (session->header.env.comp_type == PERF_COMP_ZSTD &&
+ session->header.env.comp_level)
+ report__zstd_init(session);
+
has_br_stack = perf_header__has_feat(&session->header,
HEADER_BRANCH_STACK);
@@ -1409,6 +1556,8 @@ int cmd_report(int argc, const char **argv)
error:
zfree(&report.ptime_range);
+ report__zstd_fini(session);
+
perf_session__delete(session);
return ret;
}
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 6fcb3bce0442..66d1ed7e7a80 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -972,6 +972,10 @@ int machine__map_x86_64_entry_trampolines(struct machine *machine,
continue;
dest_map = map_groups__find(kmaps, map->pgoff);
+ if (!dest_map) {
+ pr_debug("dest_map for %" PRIx64 " is NULL\n", map->pgoff);
+ continue;
+ }
if (dest_map != map)
map->pgoff = dest_map->map_ip(dest_map, map->pgoff);
found = true;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 78a067777144..be717ebcdb85 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -296,6 +296,13 @@ static int process_event_op2_stub(struct perf_session *session __maybe_unused,
return 0;
}
+static int process_event_op4_stub(struct perf_session *session __maybe_unused,
+ union perf_event *event __maybe_unused,
+ u64 data __maybe_unused)
+{
+ dump_printf(": unhandled!\n");
+ return 0;
+}
static
int process_event_thread_map_stub(struct perf_session *session __maybe_unused,
@@ -418,6 +425,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
tool->time_conv = process_event_op2_stub;
if (tool->feature == NULL)
tool->feature = process_event_op2_stub;
+ if (tool->compressed == NULL)
+ tool->compressed = process_event_op4_stub;
}
static void swap_sample_id_all(union perf_event *event, void *data)
@@ -1345,7 +1354,8 @@ static s64 perf_session__process_user_event(struct perf_session *session,
int fd = perf_data__fd(session->data);
int err;
- dump_event(session->evlist, event, file_offset, &sample);
+ if (event->header.type != PERF_RECORD_COMPRESSED)
+ dump_event(session->evlist, event, file_offset, &sample);
/* These events are processed right away */
switch (event->header.type) {
@@ -1398,6 +1408,11 @@ static s64 perf_session__process_user_event(struct perf_session *session,
return tool->time_conv(session, event);
case PERF_RECORD_HEADER_FEATURE:
return tool->feature(session, event);
+ case PERF_RECORD_COMPRESSED:
+ err = tool->compressed(session, event, file_offset);
+ if (err)
+ dump_event(session->evlist, event, file_offset, &sample);
+ return 0;
default:
return -EINVAL;
}
@@ -1673,6 +1688,8 @@ static int perf_session__flush_thread_stacks(struct perf_session *session)
volatile int session_done;
+static int __perf_session__process_decomp_events(struct perf_session *session);
+
static int __perf_session__process_pipe_events(struct perf_session *session)
{
struct ordered_events *oe = &session->ordered_events;
@@ -1753,6 +1770,10 @@ static int __perf_session__process_pipe_events(struct perf_session *session)
if (skip > 0)
head += skip;
+ err = __perf_session__process_decomp_events(session);
+ if (err)
+ goto out_err;
+
if (!session_done())
goto more;
done:
@@ -1801,6 +1822,38 @@ fetch_mmaped_event(struct perf_session *session,
return event;
}
+static int __perf_session__process_decomp_events(struct perf_session *session)
+{
+ s64 skip;
+ u64 size, file_pos = 0;
+ union perf_event *event;
+ struct decomp *decomp = session->decomp_last;
+
+ if (!decomp)
+ return 0;
+
+ while (decomp->head < decomp->size && !session_done()) {
+ event = fetch_mmaped_event(session, decomp->head, decomp->size, decomp->data);
+ if (!event)
+ break;
+
+ size = event->header.size;
+ if (size < sizeof(struct perf_event_header) ||
+ (skip = perf_session__process_event(session, event, file_pos)) < 0) {
+ pr_err("%#" PRIx64 " [%#x]: failed to process type: %d\n",
+ decomp->file_pos + decomp->head, event->header.size, event->header.type);
+ return -EINVAL;
+ }
+
+ if (skip)
+ size += skip;
+
+ decomp->head += size;
+ }
+
+ return 0;
+}
+
/*
* On 64bit we can mmap the data file in one go. No need for tiny mmap
* slices. On 32bit we use 32MB.
@@ -1904,6 +1957,10 @@ static int __perf_session__process_events(struct perf_session *session,
head += size;
file_pos += size;
+ err = __perf_session__process_decomp_events(session);
+ if (err)
+ goto out_err;
+
ui_progress__update(&prog, size);
if (session_done())
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index d96eccd7d27f..8ecda50efc6b 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -11,6 +11,9 @@
#include <linux/kernel.h>
#include <linux/rbtree.h>
#include <linux/perf_event.h>
+#ifdef HAVE_ZSTD_SUPPORT
+#include <zstd.h>
+#endif
struct ip_callchain;
struct symbol;
@@ -35,6 +38,19 @@ struct perf_session {
struct ordered_events ordered_events;
struct perf_data *data;
struct perf_tool *tool;
+ struct decomp *decomp;
+ struct decomp *decomp_last;
+#ifdef HAVE_ZSTD_SUPPORT
+ ZSTD_DStream *zstd_dstream;
+#endif
+};
+
+struct decomp {
+ struct decomp *next;
+ u64 file_pos;
+ u64 head;
+ size_t size;
+ char data[];
};
struct perf_tool;
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 56e4ca54020a..65ec84dfc5eb 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -28,6 +28,7 @@ typedef int (*event_attr_op)(struct perf_tool *tool,
typedef int (*event_op2)(struct perf_session *session, union perf_event *event);
typedef s64 (*event_op3)(struct perf_session *session, union perf_event *event);
+typedef int (*event_op4)(struct perf_session *session, union perf_event *event, u64 data);
typedef int (*event_oe)(struct perf_tool *tool, union perf_event *event,
struct ordered_events *oe);
@@ -69,6 +70,7 @@ struct perf_tool {
stat,
stat_round,
feature;
+ event_op4 compressed;
event_op3 auxtrace;
bool ordered_events;
bool ordering_requires_timestamps;
* Re: [PATCH v1 0/4] perf: enable compression of record mode trace to save storage space
From: Jiri Olsa @ 2019-01-09 17:28 UTC
To: Alexey Budankov
Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
On Mon, Dec 24, 2018 at 04:21:33PM +0300, Alexey Budankov wrote:
>
> The patch set implements runtime record trace compression accompanied by
> trace file decompression implemented in the tool report mode. Zstandard
> library API [1] is used for compression/decompression of data that come
> from perf_events kernel data buffers.
>
> The implemented -z,--compression_level=n option provides ~3-5x avg. trace file
> size reduction on the tested workloads, which significantly saves the user's
> storage space on larger server systems where trace file size can easily
> reach several tens or even hundreds of GiBs, especially when profiling
> with stacks for later dwarf unwinding, context-switch tracing, etc.
>
> The option is effective jointly with asynchronous trace writing because
> compression requires auxiliary memory buffers to operate on and memory
> buffers for asynchronous trace writing serve that purpose.
I don't like that it's for aio only, I can't really see why it's
a problem for normal data.. can't we just have one layer before and
stream the data to the compress function instead of the file (or aio
buffers).. and that compress function would spit out 64K-sized COMPRESSED
events, which would go to the file (or aio buffers)
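Jiri's suggestion above, a generic layer that frames arbitrary trace bytes into bounded-size COMPRESSED records before they hit either the file or the aio buffers, can be sketched standalone. The header layout, the record type value, and the sink callback below are made up for illustration, and a real implementation would compress the payload (e.g. via ZSTD_compressStream()) before framing; the sketch only shows why a u16 header.size field bounds each record's payload:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical framing layer: consume an arbitrary byte stream and emit
 * records whose total size fits a u16 size field, independent of whether
 * the sink writes to a file or to an aio buffer. */
#define HDR_SIZE      4u                     /* pretend header: type + u16 size */
#define MAX_RECORD    ((size_t)UINT16_MAX)
#define MAX_PAYLOAD   (MAX_RECORD - HDR_SIZE)

typedef void (*sink_fn)(const void *rec, size_t rec_size, void *ctx);

static size_t emit_records(const unsigned char *src, size_t src_size,
                           sink_fn sink, void *ctx)
{
	unsigned char rec[MAX_RECORD];
	size_t off = 0, nrecords = 0;

	while (off < src_size) {
		size_t payload = src_size - off;

		if (payload > MAX_PAYLOAD)
			payload = MAX_PAYLOAD;
		/* header: made-up record type 0x51 + little-endian u16 size */
		rec[0] = 0x51;
		rec[1] = 0;
		rec[2] = (unsigned char)((payload + HDR_SIZE) & 0xff);
		rec[3] = (unsigned char)(((payload + HDR_SIZE) >> 8) & 0xff);
		memcpy(rec + HDR_SIZE, src + off, payload);
		sink(rec, payload + HDR_SIZE, ctx);
		off += payload;
		nrecords++;
	}
	return nrecords;
}
```

With such a layer the serial (record__pushfn) and aio paths would both just be sinks, which is the decoupling the review asks for.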
the report side would process them (decompress) on the session layer
before the tool callbacks are called
jirka
* Re: [PATCH v1 0/4] perf: enable compression of record mode trace to save storage space
From: Alexey Budankov @ 2019-01-14 8:43 UTC
To: Jiri Olsa
Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
Hi,
On 09.01.2019 20:28, Jiri Olsa wrote:
> On Mon, Dec 24, 2018 at 04:21:33PM +0300, Alexey Budankov wrote:
>>
>> The patch set implements runtime record trace compression accompanied by
>> trace file decompression implemented in the tool report mode. Zstandard
>> library API [1] is used for compression/decompression of data that come
>> from perf_events kernel data buffers.
>>
>> The implemented -z,--compression_level=n option provides ~3-5x avg. trace file
>> size reduction on the tested workloads, which significantly saves the user's
>> storage space on larger server systems where trace file size can easily
>> reach several tens or even hundreds of GiBs, especially when profiling
>> with stacks for later dwarf unwinding, context-switch tracing, etc.
>>
>> The option is effective jointly with asynchronous trace writing because
>> compression requires auxiliary memory buffers to operate on and memory
>> buffers for asynchronous trace writing serve that purpose.
>
> I don't like that it's for aio only, I can't really see why it's
For serial streaming on CPU-bound workloads under full system utilization, it
can induce more runtime overhead and increase data loss, because the amount of
code on the performance-critical path grows; the size of the written data
shrinks, but still. Feeding kernel buffer content to a syscall from user
space is extended with an intermediate copy into user space memory, with
some math done on it in the middle.
> a problem for normal data.. can't we just have one layer before and
> stream the data to the compress function instead of the file (or aio
> buffers).. and that compress function would spit out 64K-sized COMPRESSED
> events, which would go to the file (or aio buffers)
It is already almost like that. Compression could be bridged using AIO
buffers but then still streamed to the file serially using record__pushfn(),
which would make some sense for moderate profiling cases on systems
without AIO support and the trace streaming based on it.
>
> the report side would process them (decompress) on the session layer
> before the tool callbacks are called
It is already pretty similar to that.
Thanks,
Alexey
>
> jirka
>
* Re: [PATCH v1 0/4] perf: enable compression of record mode trace to save storage space
From: Jiri Olsa @ 2019-01-14 11:03 UTC
To: Alexey Budankov
Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
On Mon, Jan 14, 2019 at 11:43:31AM +0300, Alexey Budankov wrote:
> Hi,
> On 09.01.2019 20:28, Jiri Olsa wrote:
> > On Mon, Dec 24, 2018 at 04:21:33PM +0300, Alexey Budankov wrote:
> >>
> >> The patch set implements runtime record trace compression accompanied by
> >> trace file decompression implemented in the tool report mode. Zstandard
> >> library API [1] is used for compression/decompression of data that come
> >> from perf_events kernel data buffers.
> >>
> >> The implemented -z,--compression_level=n option provides ~3-5x avg. trace file
> >> size reduction on the tested workloads, which significantly saves the user's
> >> storage space on larger server systems where trace file size can easily
> >> reach several tens or even hundreds of GiBs, especially when profiling
> >> with stacks for later dwarf unwinding, context-switch tracing, etc.
> >>
> >> The option is effective jointly with asynchronous trace writing because
> >> compression requires auxiliary memory buffers to operate on and memory
> >> buffers for asynchronous trace writing serve that purpose.
> >
> > I don't like that it's for aio only, I can't really see why it's
>
> For serial streaming on CPU-bound workloads under full system utilization, it
> can induce more runtime overhead and increase data loss, because the amount of
> code on the performance-critical path grows; the size of the written data
> shrinks, but still. Feeding kernel buffer content to a syscall from user
> space is extended with an intermediate copy into user space memory, with
> some math done on it in the middle.
>
> > a problem for normal data.. can't we just have one layer before and
> > stream the data to the compress function instead of the file (or aio
> > buffers).. and that compress function would spit out 64K-sized COMPRESSED
> > events, which would go to the file (or aio buffers)
>
> It is already almost like that. Compression could be bridged using AIO
> buffers but then still streamed to the file serially using record__pushfn(),
> which would make some sense for moderate profiling cases on systems
> without AIO support and the trace streaming based on it.
>
> >
> > the report side would process them (decompress) on the session layer
> > before the tool callbacks are called
>
> It is already pretty similar to that.
hum, AFAICS you do that in the report code, not on the session layer
jirka
* Re: [PATCH v1 0/4] perf: enable compression of record mode trace to save storage space
From: Alexey Budankov @ 2019-01-14 11:26 UTC
To: Jiri Olsa
Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
On 14.01.2019 14:03, Jiri Olsa wrote:
> On Mon, Jan 14, 2019 at 11:43:31AM +0300, Alexey Budankov wrote:
>> Hi,
>> On 09.01.2019 20:28, Jiri Olsa wrote:
>>> On Mon, Dec 24, 2018 at 04:21:33PM +0300, Alexey Budankov wrote:
>>>>
>>>> buffers for asynchronous trace writing serve that purpose.
<SNIP>
>>>
>>> I don't like that it's for aio only, I can't really see why it's
>>
>> For serial streaming on CPU-bound workloads under full system utilization, it
>> can induce more runtime overhead and increase data loss, because the amount of
>> code on the performance-critical path grows; the size of the written data
>> shrinks, but still. Feeding kernel buffer content to a syscall from user
>> space is extended with an intermediate copy into user space memory, with
>> some math done on it in the middle.
>>
>>> a problem for normal data.. can't we just have one layer before and
>>> stream the data to the compress function instead of the file (or aio
>>> buffers).. and that compress function would spit out 64K-sized COMPRESSED
>>> events, which would go to the file (or aio buffers)
>>
>> It is already almost like that. Compression could be bridged using AIO
>> buffers but then still streamed to the file serially using record__pushfn(),
>> which would make some sense for moderate profiling cases on systems
>> without AIO support and the trace streaming based on it.
>>
>>>
>>> the report side would process them (decompress) on the session layer
>>> before the tool callbacks are called
>>
>> It is already pretty similar to that.
>
> hum, AFAICS you do that in report code not in on the session layer
Correct. The decompressor and the handling of compressed data chunks could
be moved to session-related code.
Thanks,
Alexey
>
> jirka
>