* [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite)
@ 2016-02-26  9:31 Wang Nan
  2016-02-26  9:31 ` [PATCH 01/46] perf tools: Record text offset in dso to calculate objdump address Wang Nan
                   ` (45 more replies)
  0 siblings, 46 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:31 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan, linux-kernel

Hi Arnaldo,

The following changes since commit bb109acc4adeae425147ca87b84d312ea40f24f1:

  perf tools: Fix parsing of pmu events with empty list of modifiers (2016-02-25 10:56:21 -0300)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/pi3orama/linux.git tags/perf-core-for-acme

for you to fetch changes up to 29b1d94f369d1e7b16416e8944489430073be01d:

  perf tools: Don't warn about out of order event if write_backward is used (2016-02-26 09:11:20 +0000)

----------------------------------------------------------------
 - BPF related: 'perf trace' supports bpf-output events

 - perf overwrite support: Code improvements based on Jiri's suggestion

Signed-off-by: Wang Nan <wangnan0@huawei.com>

----------------------------------------------------------------
Wang Nan (46):
      perf tools: Record text offset in dso to calculate objdump address
      perf tools: Adjust symbol for shared objects
      perf config: Bring perf_default_config to the very beginning at main()
      perf trace: Improve error message when receive non-tracepoint events
      perf tools: Only set filter for tracepoints events
      perf trace: Call bpf__apply_obj_config in 'perf trace'
      perf trace: Print content of bpf-output event
      perf data: Support converting data from bpf_perf_event_output()
      perf data: Explicitly set byte order for integer types
      perf core: Introduce new ioctl options to pause and resume ring buffer
      perf core: Set event's default overflow_handler
      perf core: Prepare writing into ring buffer from end
      perf core: Add backward attribute to perf event
      perf core: Reduce perf event output overhead by new overflow handler
      perf tools: Only validate is_pos for tracking evsels
      perf tools: Print write_backward value in perf_event_attr__fprintf
      perf tools: Make ordered_events reusable
      perf record: Use WARN_ONCE to replace 'if' condition
      perf record: Extract synthesize code to record__synthesize()
      perf tools: Add perf_data_file__switch() helper
      perf record: Turns auxtrace_snapshot_enable into 3 states
      perf record: Introduce record__finish_output() to finish a perf.data
      perf record: Add '--timestamp-filename' option to append timestamp to output filename
      perf record: Split output into multiple files via '--switch-output'
      perf record: Force enable --timestamp-filename when --switch-output is provided
      perf record: Disable buildid cache options by default in switch output mode
      perf record: Re-synthesize tracking events after output switching
      perf record: Generate tracking events for process forked by perf
      perf record: Ensure return non-zero rc when mmap fail
      perf record: Prevent reading invalid data in record__mmap_read
      perf tools: Add evlist channel helpers
      perf tools: Automatically add new channel according to evlist
      perf tools: Operate multiple channels
      perf tools: Squash overwrite setting into channel
      perf record: Don't read from and poll overwrite channel
      perf record: Don't poll on overwrite channel
      perf tools: Detect avalibility of write_backward
      perf tools: Enable overwrite settings
      perf tools: Set write_backward attribut bit for overwrite events
      perf tools: Record fd into perf_mmap
      perf tools: Add API to pause a channel
      perf record: Toggle overwrite ring buffer for reading
      perf record: Rename variable to make code clear
      perf record: Read from backward ring buffer
      perf record: Allow generate tracking events at the end of output
      perf tools: Don't warn about out of order event if write_backward is used

 include/linux/perf_event.h         |  22 +-
 include/uapi/linux/perf_event.h    |   4 +-
 kernel/events/core.c               |  73 ++++-
 kernel/events/internal.h           |  11 +
 kernel/events/ring_buffer.c        |  63 +++-
 tools/perf/builtin-diff.c          |   2 -
 tools/perf/builtin-help.c          |   2 +-
 tools/perf/builtin-kmem.c          |   4 +-
 tools/perf/builtin-record.c        | 602 +++++++++++++++++++++++++++++++------
 tools/perf/builtin-report.c        |   2 +-
 tools/perf/builtin-top.c           |   4 +-
 tools/perf/builtin-trace.c         |  69 ++++-
 tools/perf/perf.c                  |   2 +
 tools/perf/perf.h                  |   2 +
 tools/perf/tests/llvm.c            |   8 -
 tools/perf/util/color.c            |   5 +-
 tools/perf/util/data-convert-bt.c  | 120 +++++++-
 tools/perf/util/data.c             |  38 +++
 tools/perf/util/data.h             |  11 +-
 tools/perf/util/dso.h              |   1 +
 tools/perf/util/evlist.c           | 342 ++++++++++++++++++---
 tools/perf/util/evlist.h           |  67 ++++-
 tools/perf/util/evsel.c            |  18 ++
 tools/perf/util/evsel.h            |   3 +
 tools/perf/util/help-unknown-cmd.c |   5 +-
 tools/perf/util/map.c              |  14 +
 tools/perf/util/ordered-events.c   |   9 +
 tools/perf/util/ordered-events.h   |   1 +
 tools/perf/util/parse-events.c     |  14 +
 tools/perf/util/parse-events.h     |   2 +
 tools/perf/util/parse-events.l     |   2 +
 tools/perf/util/record.c           |  11 +
 tools/perf/util/session.c          |  28 +-
 tools/perf/util/symbol-elf.c       |  25 +-
 34 files changed, 1385 insertions(+), 201 deletions(-)


* [PATCH 01/46] perf tools: Record text offset in dso to calculate objdump address
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
@ 2016-02-26  9:31 ` Wang Nan
  2016-03-24  7:37   ` [tip:perf/urgent] perf symbols: " tip-bot for Wang Nan
  2016-02-26  9:31 ` [PATCH 02/46] perf tools: Adjust symbol for shared objects Wang Nan
                   ` (44 subsequent siblings)
  45 siblings, 1 reply; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:31 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, Adrian Hunter, Cody P Schafer, He Kuang,
	Kirill Smelkov, Masami Hiramatsu, Namhyung Kim

This patch stores the offset of the '.text' section in the dso and
uses it to recalculate the address passed to objdump.

In most cases executable code lives in the '.text' section, so the
adjustment made to a symbol in dso__load_sym() (using
sym.st_value -= shdr.sh_addr - shdr.sh_offset) is equivalent to
'sym.st_value -= dso->text_offset'. Adding text_offset back therefore
yields the objdump address from the symbol address (rip). This does not
hold for the kernel and kernel modules, which can have multiple executable
sections with different offsets, so they are excluded.

After this patch, even when dso->adjust_symbols is set to true for shared
objects, map__rip_2objdump() and map__objdump_2mem() return the
correct result, so the behavior of 'perf annotate' does not change.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Cody P Schafer <dev@codyps.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kirill Smelkov <kirr@nexedi.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/util/dso.h        |  1 +
 tools/perf/util/map.c        | 14 ++++++++++++++
 tools/perf/util/symbol-elf.c | 12 ++++++------
 3 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index 45ec4d0..ef3dbc9 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -162,6 +162,7 @@ struct dso {
 	u8		 loaded;
 	u8		 rel;
 	u8		 build_id[BUILD_ID_SIZE];
+	u64		 text_offset;
 	const char	 *short_name;
 	const char	 *long_name;
 	u16		 long_name_len;
diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index 171b6d1..02c3186 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -431,6 +431,13 @@ u64 map__rip_2objdump(struct map *map, u64 rip)
 	if (map->dso->rel)
 		return rip - map->pgoff;
 
+	/*
+	 * kernel modules also have DSO_TYPE_USER in dso->kernel,
+	 * but all kernel modules are ET_REL, so won't get here.
+	 */
+	if (map->dso->kernel == DSO_TYPE_USER)
+		return rip + map->dso->text_offset;
+
 	return map->unmap_ip(map, rip) - map->reloc;
 }
 
@@ -454,6 +461,13 @@ u64 map__objdump_2mem(struct map *map, u64 ip)
 	if (map->dso->rel)
 		return map->unmap_ip(map, ip + map->pgoff);
 
+	/*
+	 * kernel modules also have DSO_TYPE_USER in dso->kernel,
+	 * but all kernel modules are ET_REL, so won't get here.
+	 */
+	if (map->dso->kernel == DSO_TYPE_USER)
+		return map->unmap_ip(map, ip - map->dso->text_offset);
+
 	return ip + map->reloc;
 }
 
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index b1dd68f..bc229a7 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -793,6 +793,7 @@ int dso__load_sym(struct dso *dso, struct map *map,
 	uint32_t idx;
 	GElf_Ehdr ehdr;
 	GElf_Shdr shdr;
+	GElf_Shdr tshdr;
 	Elf_Data *syms, *opddata = NULL;
 	GElf_Sym sym;
 	Elf_Scn *sec, *sec_strndx;
@@ -832,6 +833,9 @@ int dso__load_sym(struct dso *dso, struct map *map,
 	sec = syms_ss->symtab;
 	shdr = syms_ss->symshdr;
 
+	if (elf_section_by_name(elf, &ehdr, &tshdr, ".text", NULL))
+		dso->text_offset = tshdr.sh_addr - tshdr.sh_offset;
+
 	if (runtime_ss->opdsec)
 		opddata = elf_rawdata(runtime_ss->opdsec, NULL);
 
@@ -880,12 +884,8 @@ int dso__load_sym(struct dso *dso, struct map *map,
 	 * Handle any relocation of vdso necessary because older kernels
 	 * attempted to prelink vdso to its virtual address.
 	 */
-	if (dso__is_vdso(dso)) {
-		GElf_Shdr tshdr;
-
-		if (elf_section_by_name(elf, &ehdr, &tshdr, ".text", NULL))
-			map->reloc = map->start - tshdr.sh_addr + tshdr.sh_offset;
-	}
+	if (dso__is_vdso(dso))
+		map->reloc = map->start - dso->text_offset;
 
 	dso->adjust_symbols = runtime_ss->adjust_symbols || ref_reloc(kmap);
 	/*
-- 
1.8.3.4


* [PATCH 02/46] perf tools: Adjust symbol for shared objects
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
  2016-02-26  9:31 ` [PATCH 01/46] perf tools: Record text offset in dso to calculate objdump address Wang Nan
@ 2016-02-26  9:31 ` Wang Nan
  2016-02-26  9:31 ` [PATCH 03/46] perf config: Bring perf_default_config to the very beginning at main() Wang Nan
                   ` (43 subsequent siblings)
  45 siblings, 0 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:31 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, Adrian Hunter, Cody P Schafer, He Kuang,
	Kirill Smelkov, Masami Hiramatsu, Namhyung Kim

He Kuang reported in [1] that perf fails to resolve the correct symbol on
Android. The problem can also be reproduced on an ordinary x86_64
platform; the steps are described in detail at the end of this
commit message.

The cause is that symbol adjustment is missing for ordinary shared
objects. In most cases skipping the adjustment is harmless. However, the
result is wrong when the '.text' section has a different 'address' and 'offset'.
I checked all shared objects on my working platform: only wine dll objects and
debug objects (in .debug) have this problem. However, it is common on Android.
For example:

 $ readelf -S ./libsurfaceflinger.so | grep \.text
   [10] .text             PROGBITS         0000000000029030  00012030

This patch enables symbol adjustment for dynamic objects so that the symbol
addresses obtained from elfutils are adjusted correctly.

Nearly all types of ELF file now need symbol adjustment, so make
ss->adjust_symbols default to true.

Steps to reproduce the problem:

 $ cat ./Makefile
PWD := $(shell pwd)
LDFLAGS += "-Wl,-rpath=$(PWD)"
CFLAGS += -g
main: main.c libbuggy.so
libbuggy.so: buggy.c
	gcc -g -shared -fPIC -Wl,-Ttext-segment=0x200000 $< -o $@
clean:
	rm -rf main libbuggy.so *.o

 $ cat ./buggy.c
 int fib(int x)
 {
     return (x == 0) ? 1 : (x == 1) ? 1 : fib(x - 1) + fib(x - 2);
 }

 $ cat ./main.c
 #include <stdio.h>

 extern int fib(int x);
 int main()
 {
     int i;

     for (i = 0; i < 40; i++)
         printf("%d\n", fib(i));
     return 0;
 }

 $ make
 $ perf record ./main
 ...
 $ perf report --stdio
 # Overhead  Command  Shared Object      Symbol
 # ........  .......  .................  ...............................
 #
     14.97%  main     libbuggy.so        [.] 0x000000000000066c
      8.68%  main     libbuggy.so        [.] 0x00000000000006aa
      8.52%  main     libbuggy.so        [.] fib@plt
      7.95%  main     libbuggy.so        [.] 0x0000000000000664
      5.94%  main     libbuggy.so        [.] 0x00000000000006a9
      5.35%  main     libbuggy.so        [.] 0x0000000000000678
 ...

The correct result should be (after this patch):

 # Overhead  Command  Shared Object      Symbol
 # ........  .......  .................  ...............................
 #
     91.47%  main     libbuggy.so        [.] fib
      8.52%  main     libbuggy.so        [.] fib@plt
      0.00%  main     [kernel.kallsyms]  [k] kmem_cache_free

[1] http://lkml.kernel.org/g/1452567507-54013-1-git-send-email-hekuang@huawei.com

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Cody P Schafer <dev@codyps.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kirill Smelkov <kirr@nexedi.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/util/symbol-elf.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index bc229a7..3f9d679 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -709,17 +709,10 @@ int symsrc__init(struct symsrc *ss, struct dso *dso, const char *name,
 	if (ss->opdshdr.sh_type != SHT_PROGBITS)
 		ss->opdsec = NULL;
 
-	if (dso->kernel == DSO_TYPE_USER) {
-		GElf_Shdr shdr;
-		ss->adjust_symbols = (ehdr.e_type == ET_EXEC ||
-				ehdr.e_type == ET_REL ||
-				dso__is_vdso(dso) ||
-				elf_section_by_name(elf, &ehdr, &shdr,
-						     ".gnu.prelink_undo",
-						     NULL) != NULL);
-	} else {
+	if (dso->kernel == DSO_TYPE_USER)
+		ss->adjust_symbols = true;
+	else
 		ss->adjust_symbols = elf__needs_adjust_symbols(ehdr);
-	}
 
 	ss->name   = strdup(name);
 	if (!ss->name) {
-- 
1.8.3.4


* [PATCH 03/46] perf config: Bring perf_default_config to the very beginning at main()
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
  2016-02-26  9:31 ` [PATCH 01/46] perf tools: Record text offset in dso to calculate objdump address Wang Nan
  2016-02-26  9:31 ` [PATCH 02/46] perf tools: Adjust symbol for shared objects Wang Nan
@ 2016-02-26  9:31 ` Wang Nan
  2016-02-27  9:44   ` [tip:perf/core] " tip-bot for Wang Nan
  2016-02-26  9:31 ` [PATCH 04/46] perf trace: Improve error message when receive non-tracepoint events Wang Nan
                   ` (42 subsequent siblings)
  45 siblings, 1 reply; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:31 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, Masami Hiramatsu, Namhyung Kim

Before this patch, each subcommand calls perf_config() by itself, setting
the default configuration together with subcommand-specific options. A
subcommand that has no options of its own still needs to call
'perf_config(perf_default_config, NULL)' to ensure .perfconfig is
loaded.

This patch moves perf_config(perf_default_config, NULL) to the
very beginning of main(), so subcommands no longer need to handle
default options.

After this patch, 'llvm.clang-path' works for 'perf trace'.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/builtin-diff.c          | 2 --
 tools/perf/builtin-help.c          | 2 +-
 tools/perf/builtin-kmem.c          | 4 ++--
 tools/perf/builtin-report.c        | 2 +-
 tools/perf/builtin-top.c           | 4 ++--
 tools/perf/perf.c                  | 2 ++
 tools/perf/tests/llvm.c            | 8 --------
 tools/perf/util/color.c            | 5 +++--
 tools/perf/util/data-convert-bt.c  | 2 +-
 tools/perf/util/help-unknown-cmd.c | 5 +++--
 10 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 36ccc2b..4d72359 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -1264,8 +1264,6 @@ int cmd_diff(int argc, const char **argv, const char *prefix __maybe_unused)
 	if (ret < 0)
 		return ret;
 
-	perf_config(perf_default_config, NULL);
-
 	argc = parse_options(argc, argv, options, diff_usage, 0);
 
 	if (symbol__init(NULL) < 0)
diff --git a/tools/perf/builtin-help.c b/tools/perf/builtin-help.c
index f4dd2b4..49d55e2 100644
--- a/tools/perf/builtin-help.c
+++ b/tools/perf/builtin-help.c
@@ -272,7 +272,7 @@ static int perf_help_config(const char *var, const char *value, void *cb)
 	if (!prefixcmp(var, "man."))
 		return add_man_viewer_info(var, value);
 
-	return perf_default_config(var, value, cb);
+	return 0;
 }
 
 static struct cmdnames main_cmds, other_cmds;
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index 1180105..4d3340c 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -1834,7 +1834,7 @@ static int __cmd_record(int argc, const char **argv)
 	return cmd_record(i, rec_argv, NULL);
 }
 
-static int kmem_config(const char *var, const char *value, void *cb)
+static int kmem_config(const char *var, const char *value, void *cb __maybe_unused)
 {
 	if (!strcmp(var, "kmem.default")) {
 		if (!strcmp(value, "slab"))
@@ -1847,7 +1847,7 @@ static int kmem_config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
-	return perf_default_config(var, value, cb);
+	return 0;
 }
 
 int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index f4d8244..7eea49f 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -90,7 +90,7 @@ static int report__config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
-	return perf_default_config(var, value, cb);
+	return 0;
 }
 
 static int hist_iter__report_callback(struct hist_entry_iter *iter,
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index b86b623..94af190 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1065,7 +1065,7 @@ parse_callchain_opt(const struct option *opt, const char *arg, int unset)
 	return parse_callchain_top_opt(arg);
 }
 
-static int perf_top_config(const char *var, const char *value, void *cb)
+static int perf_top_config(const char *var, const char *value, void *cb __maybe_unused)
 {
 	if (!strcmp(var, "top.call-graph"))
 		var = "call-graph.record-mode"; /* fall-through */
@@ -1074,7 +1074,7 @@ static int perf_top_config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
-	return perf_default_config(var, value, cb);
+	return 0;
 }
 
 static int
diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index f6321194..aaee0a7 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -548,6 +548,8 @@ int main(int argc, const char **argv)
 
 	srandom(time(NULL));
 
+	perf_config(perf_default_config, NULL);
+
 	/* get debugfs/tracefs mount point from /proc/mounts */
 	tracing_path_mount();
 
diff --git a/tools/perf/tests/llvm.c b/tools/perf/tests/llvm.c
index 70edcdf..cff564f 100644
--- a/tools/perf/tests/llvm.c
+++ b/tools/perf/tests/llvm.c
@@ -6,12 +6,6 @@
 #include "tests.h"
 #include "debug.h"
 
-static int perf_config_cb(const char *var, const char *val,
-			  void *arg __maybe_unused)
-{
-	return perf_default_config(var, val, arg);
-}
-
 #ifdef HAVE_LIBBPF_SUPPORT
 static int test__bpf_parsing(void *obj_buf, size_t obj_buf_sz)
 {
@@ -77,8 +71,6 @@ test_llvm__fetch_bpf_obj(void **p_obj_buf,
 	if (should_load_fail)
 		*should_load_fail = bpf_source_table[idx].should_load_fail;
 
-	perf_config(perf_config_cb, NULL);
-
 	/*
 	 * Skip this test if user's .perfconfig doesn't set [llvm] section
 	 * and clang is not found in $PATH, and this is not perf test -v
diff --git a/tools/perf/util/color.c b/tools/perf/util/color.c
index e5fb88b..43e84aa 100644
--- a/tools/perf/util/color.c
+++ b/tools/perf/util/color.c
@@ -32,14 +32,15 @@ int perf_config_colorbool(const char *var, const char *value, int stdout_is_tty)
 	return 0;
 }
 
-int perf_color_default_config(const char *var, const char *value, void *cb)
+int perf_color_default_config(const char *var, const char *value,
+			      void *cb __maybe_unused)
 {
 	if (!strcmp(var, "color.ui")) {
 		perf_use_color_default = perf_config_colorbool(var, value, -1);
 		return 0;
 	}
 
-	return perf_default_config(var, value, cb);
+	return 0;
 }
 
 static int __color_vsnprintf(char *bf, size_t size, const char *color,
diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
index b722e57..6729f4d 100644
--- a/tools/perf/util/data-convert-bt.c
+++ b/tools/perf/util/data-convert-bt.c
@@ -1117,7 +1117,7 @@ static int convert__config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
-	return perf_default_config(var, value, cb);
+	return 0;
 }
 
 int bt_convert__perf2ctf(const char *input, const char *path, bool force)
diff --git a/tools/perf/util/help-unknown-cmd.c b/tools/perf/util/help-unknown-cmd.c
index dc1e41c..43a98a4 100644
--- a/tools/perf/util/help-unknown-cmd.c
+++ b/tools/perf/util/help-unknown-cmd.c
@@ -6,7 +6,8 @@
 static int autocorrect;
 static struct cmdnames aliases;
 
-static int perf_unknown_cmd_config(const char *var, const char *value, void *cb)
+static int perf_unknown_cmd_config(const char *var, const char *value,
+				   void *cb __maybe_unused)
 {
 	if (!strcmp(var, "help.autocorrect"))
 		autocorrect = perf_config_int(var,value);
@@ -14,7 +15,7 @@ static int perf_unknown_cmd_config(const char *var, const char *value, void *cb)
 	if (!prefixcmp(var, "alias."))
 		add_cmdname(&aliases, var + 6, strlen(var + 6));
 
-	return perf_default_config(var, value, cb);
+	return 0;
 }
 
 static int levenshtein_compare(const void *p1, const void *p2)
-- 
1.8.3.4


* [PATCH 04/46] perf trace: Improve error message when receive non-tracepoint events
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (2 preceding siblings ...)
  2016-02-26  9:31 ` [PATCH 03/46] perf config: Bring perf_default_config to the very beginning at main() Wang Nan
@ 2016-02-26  9:31 ` Wang Nan
  2016-02-26  9:31 ` [PATCH 05/46] perf tools: Only set filter for tracepoints events Wang Nan
                   ` (41 subsequent siblings)
  45 siblings, 0 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:31 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan, linux-kernel

Before this patch, a confusing error message is printed when a
non-tracepoint event is passed to 'perf trace':

 # perf trace -a  --ev cycles sleep 1
 Failed to set filter "common_pid != 27500" on event cycles with 22 (Invalid argument)

This is because 'perf trace' accepts any valid event during command-line
parsing, but since it needs to set a filter on each event, only
tracepoints can actually be passed with '--ev'.

This patch validates the evlist and reports the error earlier:

 # ./perf trace -a  --ev cycles sleep 1
 Only support tracepoint events!

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/builtin-trace.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 20916dd..9b6e7c4 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -2525,6 +2525,17 @@ out_enomem:
 	goto out;
 }
 
+static int validate_evlist(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel;
+
+	evlist__for_each(evlist, evsel) {
+		if (evsel->attr.type != PERF_TYPE_TRACEPOINT)
+			return -EINVAL;
+	}
+	return 0;
+}
+
 static int trace__run(struct trace *trace, int argc, const char **argv)
 {
 	struct perf_evlist *evlist = trace->evlist;
@@ -3111,6 +3122,11 @@ int cmd_trace(int argc, const char **argv, const char *prefix __maybe_unused)
 	argc = parse_options_subcommand(argc, argv, trace_options, trace_subcommands,
 				 trace_usage, PARSE_OPT_STOP_AT_NON_OPTION);
 
+	if (validate_evlist(trace.evlist)) {
+		pr_err("Only support tracepoint events!\n");
+		return -EINVAL;
+	}
+
 	if (trace.trace_pgfaults) {
 		trace.opts.sample_address = true;
 		trace.opts.sample_time = true;
-- 
1.8.3.4


* [PATCH 05/46] perf tools: Only set filter for tracepoints events
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (3 preceding siblings ...)
  2016-02-26  9:31 ` [PATCH 04/46] perf trace: Improve error message when receive non-tracepoint events Wang Nan
@ 2016-02-26  9:31 ` Wang Nan
  2016-02-27  9:45   ` [tip:perf/core] " tip-bot for Wang Nan
  2016-02-26  9:31 ` [PATCH 06/46] perf trace: Call bpf__apply_obj_config in 'perf trace' Wang Nan
                   ` (40 subsequent siblings)
  45 siblings, 1 reply; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:31 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan, linux-kernel

perf_evlist__set_filter() tries to set the filter on every evsel linked in
the evlist. However, since filters can only be applied to tracepoints,
it is better to check the type of each evsel before calling
perf_evsel__set_filter().

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/util/evlist.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index c42e196..86a0383 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1223,6 +1223,9 @@ int perf_evlist__set_filter(struct perf_evlist *evlist, const char *filter)
 	int err = 0;
 
 	evlist__for_each(evlist, evsel) {
+		if (evsel->attr.type != PERF_TYPE_TRACEPOINT)
+			continue;
+
 		err = perf_evsel__set_filter(evsel, filter);
 		if (err)
 			break;
-- 
1.8.3.4


* [PATCH 06/46] perf trace: Call bpf__apply_obj_config in 'perf trace'
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (4 preceding siblings ...)
  2016-02-26  9:31 ` [PATCH 05/46] perf tools: Only set filter for tracepoints events Wang Nan
@ 2016-02-26  9:31 ` Wang Nan
  2016-02-27  9:45   ` [tip:perf/core] " tip-bot for Wang Nan
  2016-02-26  9:31 ` [PATCH 07/46] perf trace: Print content of bpf-output event Wang Nan
                   ` (39 subsequent siblings)
  45 siblings, 1 reply; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:31 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan, linux-kernel

Without this patch, BPF map configuration is not applied.

A command like this:
 # ./perf trace --ev bpf-output/no-inherit,name=evt/ \
                --ev ./test_bpf_trace.c/map:channel.event=evt/ \
                usleep 100000

loads the BPF file without error, but since map:channel.event=evt is not
applied, the bpf-output event does not work.

This patch allows 'perf trace' to load and run BPF scripts.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/builtin-trace.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 9b6e7c4..32392c4 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -33,6 +33,7 @@
 #include "util/stat.h"
 #include "trace-event.h"
 #include "util/parse-events.h"
+#include "util/bpf-loader.h"
 
 #include <libaudit.h>
 #include <stdlib.h>
@@ -2597,6 +2598,16 @@ static int trace__run(struct trace *trace, int argc, const char **argv)
 	if (err < 0)
 		goto out_error_open;
 
+	err = bpf__apply_obj_config();
+	if (err) {
+		char errbuf[BUFSIZ];
+
+		bpf__strerror_apply_obj_config(err, errbuf, sizeof(errbuf));
+		pr_err("ERROR: Apply config to BPF failed: %s\n",
+			 errbuf);
+		goto out_error_open;
+	}
+
 	/*
 	 * Better not use !target__has_task() here because we need to cover the
 	 * case where no threads were specified in the command line, but a
-- 
1.8.3.4


* [PATCH 07/46] perf trace: Print content of bpf-output event
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (5 preceding siblings ...)
  2016-02-26  9:31 ` [PATCH 06/46] perf trace: Call bpf__apply_obj_config in 'perf trace' Wang Nan
@ 2016-02-26  9:31 ` Wang Nan
  2016-02-27  9:45   ` [tip:perf/core] " tip-bot for Wang Nan
  2016-02-26  9:31 ` [PATCH 08/46] perf data: Support converting data from bpf_perf_event_output() Wang Nan
                   ` (38 subsequent siblings)
  45 siblings, 1 reply; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:31 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan, linux-kernel

With this patch, the content of BPF output events is printed by
'perf trace'. For example:

 # ./perf trace -a --ev bpf-output/no-inherit,name=evt/ \
                   --ev ./test_bpf_trace.c/map:channel.event=evt/ \
                   usleep 100000
  ...
    1.787 ( 0.004 ms): usleep/3832 nanosleep(rqtp: 0x7ffc78b18980                                        ) ...
    1.787 (         ): evt:Raise a BPF event!..)
    1.788 (         ): perf_bpf_probe:func_begin:(ffffffff810e97d0))
  ...
  101.866 (87.038 ms): gmain/1654 poll(ufds: 0x7f57a80008c0, nfds: 2, timeout_msecs: 1000               ) ...
  101.866 (         ): evt:Raise a BPF event!..)
  101.867 (         ): perf_bpf_probe:func_end:(ffffffff810e97d0 <- ffffffff81796173))
  101.869 (100.087 ms): usleep/3832  ... [continued]: nanosleep()) = 0
  ...

 (The extra ')' at the end of several lines is a separate problem,
  unrelated to this commit.)

Where test_bpf_trace.c is:
 /************************ BEGIN **************************/
 #include <uapi/linux/bpf.h>
 struct bpf_map_def {
        unsigned int type;
        unsigned int key_size;
        unsigned int value_size;
        unsigned int max_entries;
 };
 #define SEC(NAME) __attribute__((section(NAME), used))
 static u64 (*ktime_get_ns)(void) =
        (void *)BPF_FUNC_ktime_get_ns;
 static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
        (void *)BPF_FUNC_trace_printk;
 static int (*get_smp_processor_id)(void) =
        (void *)BPF_FUNC_get_smp_processor_id;
 static int (*perf_event_output)(void *, struct bpf_map_def *, int, void *, unsigned long) =
        (void *)BPF_FUNC_perf_event_output;

 struct bpf_map_def SEC("maps") channel = {
        .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
        .key_size = sizeof(int),
        .value_size = sizeof(u32),
        .max_entries = __NR_CPUS__,
 };

 static inline int __attribute__((always_inline))
 func(void *ctx, int type)
 {
        char output_str[] = "Raise a BPF event!";
        char err_str[] = "BAD %d\n";
        int err;

        err = perf_event_output(ctx, &channel, get_smp_processor_id(),
                                &output_str, sizeof(output_str));
        if (err)
                trace_printk(err_str, sizeof(err_str), err);
        return 1;
 }
 SEC("func_begin=sys_nanosleep")
 int func_begin(void *ctx) {return func(ctx, 1);}
 SEC("func_end=sys_nanosleep%return")
 int func_end(void *ctx) { return func(ctx, 2);}
 char _license[] SEC("license") = "GPL";
 int _version SEC("version") = LINUX_VERSION_CODE;
 /************************* END ***************************/
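
The printer added by this patch emits one character per raw byte and
substitutes '.' for anything that is not printable. A minimal
user-space sketch of that rule follows; render_printable() is a
hypothetical helper used only for illustration and is not part of the
patch:

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the patch): copy len raw bytes into
 * out, replacing every non-printable byte with '.', then NUL-terminate.
 * out must have room for len + 1 bytes.
 */
void render_printable(const unsigned char *raw, size_t len, char *out)
{
	size_t i;

	for (i = 0; i < len; i++)
		out[i] = isprint(raw[i]) ? (char)raw[i] : '.';
	out[len] = '\0';
}
```

Given a 20-byte payload holding "Raise a BPF event!" followed by two
zero bytes, the helper produces "Raise a BPF event!..", which is the
shape of the 'evt:' lines in the output above. The 20-byte payload
size is an assumption made for the illustration.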

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/builtin-trace.c | 48 +++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 43 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 32392c4..b7984ad 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -2178,6 +2178,37 @@ out_dump:
 	return 0;
 }
 
+static void bpf_output__printer(enum binary_printer_ops op,
+				unsigned int val, void *extra)
+{
+	FILE *output = extra;
+	unsigned char ch = (unsigned char)val;
+
+	switch (op) {
+	case BINARY_PRINT_CHAR_DATA:
+		fprintf(output, "%c", isprint(ch) ? ch : '.');
+		break;
+	case BINARY_PRINT_DATA_BEGIN:
+	case BINARY_PRINT_LINE_BEGIN:
+	case BINARY_PRINT_ADDR:
+	case BINARY_PRINT_NUM_DATA:
+	case BINARY_PRINT_NUM_PAD:
+	case BINARY_PRINT_SEP:
+	case BINARY_PRINT_CHAR_PAD:
+	case BINARY_PRINT_LINE_END:
+	case BINARY_PRINT_DATA_END:
+	default:
+		break;
+	}
+}
+
+static void bpf_output__fprintf(struct trace *trace,
+				struct perf_sample *sample)
+{
+	print_binary(sample->raw_data, sample->raw_size, 8,
+		     bpf_output__printer, trace->output);
+}
+
 static int trace__event_handler(struct trace *trace, struct perf_evsel *evsel,
 				union perf_event *event __maybe_unused,
 				struct perf_sample *sample)
@@ -2190,7 +2221,9 @@ static int trace__event_handler(struct trace *trace, struct perf_evsel *evsel,
 
 	fprintf(trace->output, "%s:", evsel->name);
 
-	if (evsel->tp_format) {
+	if (perf_evsel__is_bpf_output(evsel)) {
+		bpf_output__fprintf(trace, sample);
+	} else if (evsel->tp_format) {
 		event_format__fprintf(evsel->tp_format, sample->cpu,
 				      sample->raw_data, sample->raw_size,
 				      trace->output);
@@ -2526,11 +2559,15 @@ out_enomem:
 	goto out;
 }
 
-static int validate_evlist(struct perf_evlist *evlist)
+static int validate_evlist(struct perf_evlist *evlist, bool *has_bpf_output)
 {
 	struct perf_evsel *evsel;
 
 	evlist__for_each(evlist, evsel) {
+		if (perf_evsel__is_bpf_output(evsel)) {
+			*has_bpf_output = true;
+			continue;
+		}
 		if (evsel->attr.type != PERF_TYPE_TRACEPOINT)
 			return -EINVAL;
 	}
@@ -3118,6 +3155,7 @@ int cmd_trace(int argc, const char **argv, const char *prefix __maybe_unused)
 	const char * const trace_subcommands[] = { "record", NULL };
 	int err;
 	char bf[BUFSIZ];
+	bool has_bpf_output = false;
 
 	signal(SIGSEGV, sighandler_dump_stack);
 	signal(SIGFPE, sighandler_dump_stack);
@@ -3133,12 +3171,12 @@ int cmd_trace(int argc, const char **argv, const char *prefix __maybe_unused)
 	argc = parse_options_subcommand(argc, argv, trace_options, trace_subcommands,
 				 trace_usage, PARSE_OPT_STOP_AT_NON_OPTION);
 
-	if (validate_evlist(trace.evlist)) {
-		pr_err("Only support tracepoint events!\n");
+	if (validate_evlist(trace.evlist, &has_bpf_output)) {
+		pr_err("Only support tracepoint and bpf-output events!\n");
 		return -EINVAL;
 	}
 
-	if (trace.trace_pgfaults) {
+	if (trace.trace_pgfaults || has_bpf_output) {
 		trace.opts.sample_address = true;
 		trace.opts.sample_time = true;
 	}
-- 
1.8.3.4

* [PATCH 08/46] perf data: Support converting data from bpf_perf_event_output()
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (6 preceding siblings ...)
  2016-02-26  9:31 ` [PATCH 07/46] perf trace: Print content of bpf-output event Wang Nan
@ 2016-02-26  9:31 ` Wang Nan
  2016-03-05  8:15   ` [tip:perf/core] " tip-bot for Wang Nan
  2016-02-26  9:31 ` [PATCH 09/46] perf data: Explicitly set byte order for integer types Wang Nan
                   ` (37 subsequent siblings)
  45 siblings, 1 reply; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:31 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, Brendan Gregg, David S. Miller, Masami Hiramatsu,
	Namhyung Kim

bpf_perf_event_output() outputs data through sample->raw_data. This
patch adds support for converting that data into CTF. A Python script
can then be used to process the output data from BPF programs.

Test result:

 # cat ./test_bpf_output_2.c
 /************************ BEGIN **************************/
 #include <uapi/linux/bpf.h>
 struct bpf_map_def {
 	unsigned int type;
 	unsigned int key_size;
 	unsigned int value_size;
 	unsigned int max_entries;
 };
 #define SEC(NAME) __attribute__((section(NAME), used))
 static u64 (*ktime_get_ns)(void) =
 	(void *)BPF_FUNC_ktime_get_ns;
 static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
 	(void *)BPF_FUNC_trace_printk;
 static int (*get_smp_processor_id)(void) =
 	(void *)BPF_FUNC_get_smp_processor_id;
 static int (*perf_event_output)(void *, struct bpf_map_def *, int, void *, unsigned long) =
 	(void *)BPF_FUNC_perf_event_output;

 struct bpf_map_def SEC("maps") channel = {
 	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
 	.key_size = sizeof(int),
 	.value_size = sizeof(u32),
 	.max_entries = __NR_CPUS__,
 };

 static inline int __attribute__((always_inline))
 func(void *ctx, int type)
 {
 	struct {
 		u64 ktime;
 		int type;
 	} __attribute__((packed)) output_data;
 	char error_data[] = "Error: failed to output\n";
 	int err;

 	output_data.type = type;
 	output_data.ktime = ktime_get_ns();
 	err = perf_event_output(ctx, &channel, get_smp_processor_id(),
 				&output_data, sizeof(output_data));
 	if (err)
 		trace_printk(error_data, sizeof(error_data));
 	return 0;
 }
 SEC("func_begin=sys_nanosleep")
 int func_begin(void *ctx) {return func(ctx, 1);}
 SEC("func_end=sys_nanosleep%return")
 int func_end(void *ctx) { return func(ctx, 2);}
 char _license[] SEC("license") = "GPL";
 int _version SEC("version") = LINUX_VERSION_CODE;
 /************************* END ***************************/

 # ./perf record -e bpf-output/no-inherit,name=evt/ \
                 -e ./test_bpf_output_2.c/map:channel.event=evt/ \
                 usleep 100000
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.012 MB perf.data (2 samples) ]

 # ./perf script
          usleep 14942 92503.198504: evt:  ffffffff810e0ba1 sys_nanosleep (/lib/modules/4.3.0....
          usleep 14942 92503.298562: evt:  ffffffff810585e9 kretprobe_trampoline_holder (/lib....

 # ./perf data convert --to-ctf ./out.ctf
 [ perf data convert: Converted 'perf.data' into CTF data './out.ctf' ]
 [ perf data convert: Converted and wrote 0.000 MB (2 samples) ]

 # babeltrace ./out.ctf
 [01:41:43.198504134] (+?.?????????) evt: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF810E0BA1, perf_tid = 14942, perf_pid = 14942, perf_id = 1044, raw_len = 3, raw_data = [ [0] = 0x32C0C07B, [1] = 0x5421, [2] = 0x1 ] }
 [01:41:43.298562257] (+0.100058123) evt: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF810585E9, perf_tid = 14942, perf_pid = 14942, perf_id = 1044, raw_len = 3, raw_data = [ [0] = 0x38B77FAA, [1] = 0x5421, [2] = 0x2 ] }

 # cat ./test_bpf_output_2.py
 from babeltrace import TraceCollection
 tc = TraceCollection()
 tc.add_trace('./out.ctf', 'ctf')
 d = {1:[], 2:[]}
 for event in tc.events:
     if not event.name.startswith('evt'):
         continue
     raw_data = event['raw_data']
     (time, type) = ((raw_data[0] + (raw_data[1] << 32)), raw_data[2])
     d[type].append(time)
 print(list(map(lambda i: d[2][i] - d[1][i], range(len(d[1])))))

 # python3 ./test_bpf_output_2.py
 [100056879]
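
The same post-processing can be sketched in C. The converter stores the
payload as raw_size / sizeof(u32) 32-bit words, so the packed
{ u64 ktime; int type; } structure arrives as three words, with the low
half of ktime first on a little-endian producer. raw_words() and
raw_ktime() are hypothetical helpers used only for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Number of whole 32-bit words in a payload of raw_size bytes.
 * *leftover receives the trailing bytes that do not fill a word.
 */
size_t raw_words(size_t raw_size, size_t *leftover)
{
	*leftover = raw_size % sizeof(uint32_t);
	return raw_size / sizeof(uint32_t);
}

/* Rebuild the 64-bit ktime from raw_data[0] (low) and raw_data[1] (high). */
uint64_t raw_ktime(const uint32_t *raw_data)
{
	return (uint64_t)raw_data[0] | ((uint64_t)raw_data[1] << 32);
}
```

Feeding it the two raw_data arrays from the babeltrace output above
gives a difference of 100056879 ns, matching the Python result.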

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/util/data-convert-bt.c | 112 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 111 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
index 6729f4d..1f608a6 100644
--- a/tools/perf/util/data-convert-bt.c
+++ b/tools/perf/util/data-convert-bt.c
@@ -352,6 +352,84 @@ static int add_tracepoint_values(struct ctf_writer *cw,
 	return ret;
 }
 
+static int
+add_bpf_output_values(struct bt_ctf_event_class *event_class,
+		      struct bt_ctf_event *event,
+		      struct perf_sample *sample)
+{
+	struct bt_ctf_field_type *len_type, *seq_type;
+	struct bt_ctf_field *len_field, *seq_field;
+	unsigned int raw_size = sample->raw_size;
+	unsigned int nr_elements = raw_size / sizeof(u32);
+	unsigned int i;
+	int ret;
+
+	if (nr_elements * sizeof(u32) != raw_size)
+		pr_warning("Incorrect raw_size (%u) in bpf output event, skip %zu bytes\n",
+			   raw_size, raw_size - nr_elements * sizeof(u32));
+
+	len_type = bt_ctf_event_class_get_field_by_name(event_class, "raw_len");
+	len_field = bt_ctf_field_create(len_type);
+	if (!len_field) {
+		pr_err("failed to create 'raw_len' for bpf output event\n");
+		ret = -1;
+		goto put_len_type;
+	}
+
+	ret = bt_ctf_field_unsigned_integer_set_value(len_field, nr_elements);
+	if (ret) {
+		pr_err("failed to set field value for raw_len\n");
+		goto put_len_field;
+	}
+	ret = bt_ctf_event_set_payload(event, "raw_len", len_field);
+	if (ret) {
+		pr_err("failed to set payload to raw_len\n");
+		goto put_len_field;
+	}
+
+	seq_type = bt_ctf_event_class_get_field_by_name(event_class, "raw_data");
+	seq_field = bt_ctf_field_create(seq_type);
+	if (!seq_field) {
+		pr_err("failed to create 'raw_data' for bpf output event\n");
+		ret = -1;
+		goto put_seq_type;
+	}
+
+	ret = bt_ctf_field_sequence_set_length(seq_field, len_field);
+	if (ret) {
+		pr_err("failed to set length of 'raw_data'\n");
+		goto put_seq_field;
+	}
+
+	for (i = 0; i < nr_elements; i++) {
+		struct bt_ctf_field *elem_field =
+			bt_ctf_field_sequence_get_field(seq_field, i);
+
+		ret = bt_ctf_field_unsigned_integer_set_value(elem_field,
+				((u32 *)(sample->raw_data))[i]);
+
+		bt_ctf_field_put(elem_field);
+		if (ret) {
+			pr_err("failed to set raw_data[%d]\n", i);
+			goto put_seq_field;
+		}
+	}
+
+	ret = bt_ctf_event_set_payload(event, "raw_data", seq_field);
+	if (ret)
+		pr_err("failed to set payload for raw_data\n");
+
+put_seq_field:
+	bt_ctf_field_put(seq_field);
+put_seq_type:
+	bt_ctf_field_type_put(seq_type);
+put_len_field:
+	bt_ctf_field_put(len_field);
+put_len_type:
+	bt_ctf_field_type_put(len_type);
+	return ret;
+}
+
 static int add_generic_values(struct ctf_writer *cw,
 			      struct bt_ctf_event *event,
 			      struct perf_evsel *evsel,
@@ -597,6 +675,12 @@ static int process_sample_event(struct perf_tool *tool,
 			return -1;
 	}
 
+	if (perf_evsel__is_bpf_output(evsel)) {
+		ret = add_bpf_output_values(event_class, event, sample);
+		if (ret)
+			return -1;
+	}
+
 	cs = ctf_stream(cw, get_sample_cpu(cw, sample, evsel));
 	if (cs) {
 		if (is_flush_needed(cs))
@@ -744,6 +828,25 @@ static int add_tracepoint_types(struct ctf_writer *cw,
 	return ret;
 }
 
+static int add_bpf_output_types(struct ctf_writer *cw,
+				struct bt_ctf_event_class *class)
+{
+	struct bt_ctf_field_type *len_type = cw->data.u32;
+	struct bt_ctf_field_type *seq_base_type = cw->data.u32_hex;
+	struct bt_ctf_field_type *seq_type;
+	int ret;
+
+	ret = bt_ctf_event_class_add_field(class, len_type, "raw_len");
+	if (ret)
+		return ret;
+
+	seq_type = bt_ctf_field_type_sequence_create(seq_base_type, "raw_len");
+	if (!seq_type)
+		return -1;
+
+	return bt_ctf_event_class_add_field(class, seq_type, "raw_data");
+}
+
 static int add_generic_types(struct ctf_writer *cw, struct perf_evsel *evsel,
 			     struct bt_ctf_event_class *event_class)
 {
@@ -755,7 +858,8 @@ static int add_generic_types(struct ctf_writer *cw, struct perf_evsel *evsel,
 	 *                              ctf event header
 	 *   PERF_SAMPLE_READ         - TODO
 	 *   PERF_SAMPLE_CALLCHAIN    - TODO
-	 *   PERF_SAMPLE_RAW          - tracepoint fields are handled separately
+	 *   PERF_SAMPLE_RAW          - tracepoint fields and BPF output
+	 *                              are handled separately
 	 *   PERF_SAMPLE_BRANCH_STACK - TODO
 	 *   PERF_SAMPLE_REGS_USER    - TODO
 	 *   PERF_SAMPLE_STACK_USER   - TODO
@@ -824,6 +928,12 @@ static int add_event(struct ctf_writer *cw, struct perf_evsel *evsel)
 			goto err;
 	}
 
+	if (perf_evsel__is_bpf_output(evsel)) {
+		ret = add_bpf_output_types(cw, event_class);
+		if (ret)
+			goto err;
+	}
+
 	ret = bt_ctf_stream_class_add_event_class(cw->stream_class, event_class);
 	if (ret) {
 		pr("Failed to add event class into stream.\n");
-- 
1.8.3.4

* [PATCH 09/46] perf data: Explicitly set byte order for integer types
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (7 preceding siblings ...)
  2016-02-26  9:31 ` [PATCH 08/46] perf data: Support converting data from bpf_perf_event_output() Wang Nan
@ 2016-02-26  9:31 ` Wang Nan
  2016-03-05  8:15   ` [tip:perf/core] " tip-bot for Wang Nan
  2016-02-26  9:31 ` [PATCH 10/46] perf core: Introduce new ioctl options to pause and resume ring buffer Wang Nan
                   ` (36 subsequent siblings)
  45 siblings, 1 reply; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:31 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, Brendan Gregg, David S. Miller,
	Jérémie Galarneau, Masami Hiramatsu, Namhyung Kim

After babeltrace commit 5cec03e402aa ("ir: copy variants and
sequences when setting a field path"), 'perf data convert' produces
incorrect results if there is BPF output data. For example:

 # perf data convert --to-ctf ./out.ctf
 # babeltrace ./out.ctf
 [10:44:31.186045346] (+?.?????????) evt: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF810E7DD1, perf_tid = 23819, perf_pid = 23819, perf_id = 518, raw_len = 3, raw_data = [ [0] = 0xC028E32F, [1] = 0x815D0100, [2] = 0x1000000 ] }
 [10:44:31.286101003] (+0.100055657) evt: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF8105B609, perf_tid = 23819, perf_pid = 23819, perf_id = 518, raw_len = 3, raw_data = [ [0] = 0x35D9F1EB, [1] = 0x15D81, [2] = 0x2 ] }

The expected result for the first sample is:

 raw_data = [ [0] = 0x2FE328C0, [1] = 0x15D81, [2] = 0x1 ] }

However, 'perf data convert' writes big-endian values to the resulting
CTF file.
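
The mismatch is a plain byte swap, which can be checked by reading the
same four bytes in both orders. The sketch below uses two hypothetical
helpers, load_le32() and load_be32(), and the byte values are derived
from the first sample above:

```c
#include <stdint.h>

/* Read four bytes as a little-endian 32-bit value. */
uint32_t load_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Read the same four bytes as a big-endian 32-bit value. */
uint32_t load_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

The bytes c0 28 e3 2f read as 0x2FE328C0 in little-endian order and as
0xC028E32F in big-endian order, which are the expected and the observed
values of raw_data[0].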

The reason is an internal change (or a bug?) in babeltrace.

Before this patch, at the first add_bpf_output_values() call, the byte
order of all integer types is unset (it is 0, neither 1234 (le) nor
4321 (be)). It would be fixed by:

perf_evlist__deliver_sample
 -> process_sample_event
   -> ctf_stream
      ...
      ->bt_ctf_trace_add_stream_class
        ->bt_ctf_field_type_structure_set_byte_order
          ->bt_ctf_field_type_integer_set_byte_order

while creating the stream.

However, the babeltrace commit mentioned above duplicates the types in
a sequence to prevent potential conflicts. It does so in the following
call stack and links the newly allocated type into the 'raw_data'
sequence:

perf_evlist__deliver_sample
 -> process_sample_event
   -> ctf_stream
      ...
      -> bt_ctf_trace_add_stream_class
        -> bt_ctf_stream_class_resolve_types
           ...
           -> bt_ctf_field_type_sequence_copy
             ->bt_ctf_field_type_integer_copy

This happens before the byte order is set, so only the newly allocated
type is initialized; the byte order of the original type, which perf
uses to create the first raw_data, is still unset.

Byte order in CTF output is not related to byte order in perf.data.
Setting it to anything other than BT_CTF_BYTE_ORDER_NATIVE solves this
problem (only BT_CTF_BYTE_ORDER_NATIVE needs to be fixed). To minimize
behavior changes, set the byte order according to compile-time options.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/util/data-convert-bt.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
index 1f608a6..811af89 100644
--- a/tools/perf/util/data-convert-bt.c
+++ b/tools/perf/util/data-convert-bt.c
@@ -1080,6 +1080,12 @@ static struct bt_ctf_field_type *create_int_type(int size, bool sign, bool hex)
 	    bt_ctf_field_type_integer_set_base(type, BT_CTF_INTEGER_BASE_HEXADECIMAL))
 		goto err;
 
+#if __BYTE_ORDER == __BIG_ENDIAN
+	bt_ctf_field_type_set_byte_order(type, BT_CTF_BYTE_ORDER_BIG_ENDIAN);
+#else
+	bt_ctf_field_type_set_byte_order(type, BT_CTF_BYTE_ORDER_LITTLE_ENDIAN);
+#endif
+
 	pr2("Created type: INTEGER %d-bit %ssigned %s\n",
 	    size, sign ? "un" : "", hex ? "hex" : "");
 	return type;
-- 
1.8.3.4

* [PATCH 10/46] perf core: Introduce new ioctl options to pause and resume ring buffer
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (8 preceding siblings ...)
  2016-02-26  9:31 ` [PATCH 09/46] perf data: Explicitly set byte order for integer types Wang Nan
@ 2016-02-26  9:31 ` Wang Nan
  2016-02-29 15:39   ` Arnaldo Carvalho de Melo
  2016-02-26  9:31 ` [PATCH 11/46] perf core: Set event's default overflow_handler Wang Nan
                   ` (35 subsequent siblings)
  45 siblings, 1 reply; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:31 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Brendan Gregg, Masami Hiramatsu,
	Namhyung Kim

Add a new ioctl() to pause/resume ring-buffer output.

In some situations we want to read from the ring buffer only when we
can ensure that nothing writes to it during the read. Without this
patch we have to turn off all events attached to the ring buffer to
achieve this.

This patch is for supporting the overwrite ring buffer. Following
commits will introduce new methods to support reading from an overwrite
ring buffer. Before reading, the caller must ensure the ring buffer is
frozen, or the read is unreliable.
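
The pause semantics can be modelled in user space as follows. This is
only a sketch: struct rb_model and the rb_model_*() functions are
hypothetical stand-ins for the kernel structures, and 'lost' is a plain
counter here rather than a local_t.

```c
#include <stdbool.h>

/* Minimal user-space model of the fields this patch touches. */
struct rb_model {
	int nr_pages;		/* number of data pages; 0 means no data area */
	int paused;		/* nonzero: writers must not touch the buffer */
	unsigned long lost;	/* records dropped while paused */
};

/* Mirrors rb_toggle_paused(): a buffer without pages always stays paused. */
void rb_model_toggle_paused(struct rb_model *rb, bool pause)
{
	rb->paused = (!pause && rb->nr_pages) ? 0 : 1;
}

/*
 * Mirrors the check added to perf_output_begin(): returns true when a
 * writer may proceed. A paused buffer that has pages counts a lost record.
 */
bool rb_model_output_begin(struct rb_model *rb)
{
	if (rb->paused) {
		if (rb->nr_pages)
			rb->lost++;
		return false;
	}
	return true;
}
```

From user space the state is toggled with
ioctl(fd, PERF_EVENT_IOC_PAUSE_OUTPUT, arg), where a nonzero arg pauses
output and zero resumes it.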

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 include/uapi/linux/perf_event.h |  1 +
 kernel/events/core.c            | 13 +++++++++++++
 kernel/events/internal.h        | 11 +++++++++++
 kernel/events/ring_buffer.c     |  7 ++++++-
 4 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 1afe962..a3c1903 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -401,6 +401,7 @@ struct perf_event_attr {
 #define PERF_EVENT_IOC_SET_FILTER	_IOW('$', 6, char *)
 #define PERF_EVENT_IOC_ID		_IOR('$', 7, __u64 *)
 #define PERF_EVENT_IOC_SET_BPF		_IOW('$', 8, __u32)
+#define PERF_EVENT_IOC_PAUSE_OUTPUT	_IOW('$', 9, __u32)
 
 enum perf_event_ioc_flags {
 	PERF_IOC_FLAG_GROUP		= 1U << 0,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 94c47e3..a7075ae 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4231,6 +4231,19 @@ static long _perf_ioctl(struct perf_event *event, unsigned int cmd, unsigned lon
 	case PERF_EVENT_IOC_SET_BPF:
 		return perf_event_set_bpf_prog(event, arg);
 
+	case PERF_EVENT_IOC_PAUSE_OUTPUT: {
+		struct ring_buffer *rb;
+
+		rcu_read_lock();
+		rb = rcu_dereference(event->rb);
+		if (!rb) {
+			rcu_read_unlock();
+			return -EINVAL;
+		}
+		rb_toggle_paused(rb, !!arg);
+		rcu_read_unlock();
+		return 0;
+	}
 	default:
 		return -ENOTTY;
 	}
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 2bbad9c..6a93d1b 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -18,6 +18,7 @@ struct ring_buffer {
 #endif
 	int				nr_pages;	/* nr of data pages  */
 	int				overwrite;	/* can overwrite itself */
+	int				paused;		/* writes to ring buffer are paused */
 
 	atomic_t			poll;		/* POLL_ for wakeups */
 
@@ -65,6 +66,16 @@ static inline void rb_free_rcu(struct rcu_head *rcu_head)
 	rb_free(rb);
 }
 
+static inline void
+rb_toggle_paused(struct ring_buffer *rb,
+		 bool pause)
+{
+	if (!pause && rb->nr_pages)
+		rb->paused = 0;
+	else
+		rb->paused = 1;
+}
+
 extern struct ring_buffer *
 rb_alloc(int nr_pages, long watermark, int cpu, int flags);
 extern void perf_event_wakeup(struct perf_event *event);
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 1faad2c..22e1a47 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -125,8 +125,11 @@ int perf_output_begin(struct perf_output_handle *handle,
 	if (unlikely(!rb))
 		goto out;
 
-	if (unlikely(!rb->nr_pages))
+	if (unlikely(rb->paused)) {
+		if (rb->nr_pages)
+			local_inc(&rb->lost);
 		goto out;
+	}
 
 	handle->rb    = rb;
 	handle->event = event;
@@ -244,6 +247,8 @@ ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
 	INIT_LIST_HEAD(&rb->event_list);
 	spin_lock_init(&rb->event_lock);
 	init_irq_work(&rb->irq_work, rb_irq_work);
+
+	rb->paused = rb->nr_pages ? 0 : 1;
 }
 
 static void ring_buffer_put_async(struct ring_buffer *rb)
-- 
1.8.3.4

* [PATCH 11/46] perf core: Set event's default overflow_handler
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (9 preceding siblings ...)
  2016-02-26  9:31 ` [PATCH 10/46] perf core: Introduce new ioctl options to pause and resume ring buffer Wang Nan
@ 2016-02-26  9:31 ` Wang Nan
  2016-02-26  9:32 ` [PATCH 12/46] perf core: Prepare writing into ring buffer from end Wang Nan
                   ` (34 subsequent siblings)
  45 siblings, 0 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:31 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Brendan Gregg, Masami Hiramatsu,
	Namhyung Kim

Set a default event->overflow_handler in perf_event_alloc() so that
__perf_event_overflow() no longer needs to check
event->overflow_handler. Following commits can give a different default
overflow_handler.

No extra overhead is introduced into the hot path, because the original
code already had to read this handler from memory. A conditional branch
is avoided, so we actually remove some instructions.
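
The pattern is to choose the handler once at setup time so that the hot
path is an unconditional indirect call. A user-space sketch follows;
struct event_model and every function in it are hypothetical stand-ins,
with default_output() playing the role of perf_event_output():

```c
#include <stddef.h>

struct event_model;
typedef void (*overflow_handler_t)(struct event_model *ev);

struct event_model {
	overflow_handler_t overflow_handler;	/* never NULL after init */
	int default_calls;
	int custom_calls;
};

/* Stand-in for perf_event_output(), the default handler. */
void default_output(struct event_model *ev)
{
	ev->default_calls++;
}

/* Stand-in for a handler supplied by the caller. */
void custom_output(struct event_model *ev)
{
	ev->custom_calls++;
}

/* Mirrors the perf_event_alloc() change: pick the default once, at setup. */
void event_model_init(struct event_model *ev, overflow_handler_t handler)
{
	ev->overflow_handler = handler ? handler : default_output;
	ev->default_calls = 0;
	ev->custom_calls = 0;
}

/* Mirrors __perf_event_overflow(): no NULL check on the hot path. */
void event_model_overflow(struct event_model *ev)
{
	ev->overflow_handler(ev);
}
```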

Initial idea comes from Peter at [1].

[1] http://lkml.kernel.org/r/20130708121557.GA17211@twins.programming.kicks-ass.net

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 kernel/events/core.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index a7075ae..ae34061 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6392,10 +6392,7 @@ static int __perf_event_overflow(struct perf_event *event,
 		irq_work_queue(&event->pending);
 	}
 
-	if (event->overflow_handler)
-		event->overflow_handler(event, data, regs);
-	else
-		perf_event_output(event, data, regs);
+	event->overflow_handler(event, data, regs);
 
 	if (*perf_event_fasync(event) && event->pending_kill) {
 		event->pending_wakeup = 1;
@@ -7868,8 +7865,13 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 		context = parent_event->overflow_handler_context;
 	}
 
-	event->overflow_handler	= overflow_handler;
-	event->overflow_handler_context = context;
+	if (overflow_handler) {
+		event->overflow_handler	= overflow_handler;
+		event->overflow_handler_context = context;
+	} else {
+		event->overflow_handler = perf_event_output;
+		event->overflow_handler_context = NULL;
+	}
 
 	perf_event__state_init(event);
 
-- 
1.8.3.4

* [PATCH 12/46] perf core: Prepare writing into ring buffer from end
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (10 preceding siblings ...)
  2016-02-26  9:31 ` [PATCH 11/46] perf core: Set event's default overflow_handler Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  2016-02-26  9:32 ` [PATCH 13/46] perf core: Add backward attribute to perf event Wang Nan
                   ` (33 subsequent siblings)
  45 siblings, 0 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Brendan Gregg, Masami Hiramatsu,
	Namhyung Kim

Convert perf_output_begin() to __perf_output_begin() and make the
latter able to write records from the end of the ring buffer.
Following commits will utilize the 'backward' flag.

This patch doesn't introduce any extra performance overhead since we
use always_inline.
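
The arithmetic can be tried in user space. The sketch below copies the
CIRC_CNT()/CIRC_SPACE() definitions from include/linux/circ_buf.h and
models the reservation step; rb_has_space() and rb_reserve() are
hypothetical names, and the model leaves out the cmpxchg loop and the
handle bookkeeping:

```c
#include <stdbool.h>

/* Same definitions as include/linux/circ_buf.h; size is a power of two. */
#define CIRC_CNT(head, tail, size)   (((head) - (tail)) & ((size) - 1))
#define CIRC_SPACE(head, tail, size) CIRC_CNT((tail), ((head) + 1), (size))

/* Mirrors ring_buffer_has_space(): head and tail swap roles when backward. */
bool rb_has_space(unsigned long head, unsigned long tail,
		  unsigned long data_size, unsigned int size, bool backward)
{
	if (!backward)
		return CIRC_SPACE(head, tail, data_size) >= size;
	return CIRC_SPACE(tail, head, data_size) >= size;
}

/*
 * Reserve size bytes and move *head. Returns the byte offset inside the
 * data area at which the new record starts.
 */
unsigned long rb_reserve(unsigned long *head, unsigned long data_size,
			 unsigned int size, bool backward)
{
	unsigned long offset;

	if (!backward) {
		offset = *head;		/* record starts at the old head */
		*head += size;
	} else {
		*head -= size;		/* head moves towards lower addresses */
		offset = *head;		/* record starts at the new head */
	}
	return offset & (data_size - 1);
}
```

With a 16-byte data area and head starting at 0, a 4-byte backward
reservation lands at offset 12, the last four bytes of the buffer, and
leaves 11 bytes of space because CIRC_SPACE() keeps one slot unused.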

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 kernel/events/ring_buffer.c | 42 ++++++++++++++++++++++++++++++++++++------
 1 file changed, 36 insertions(+), 6 deletions(-)

diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 22e1a47..37c11c6 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -102,8 +102,21 @@ out:
 	preempt_enable();
 }
 
-int perf_output_begin(struct perf_output_handle *handle,
-		      struct perf_event *event, unsigned int size)
+static bool __always_inline
+ring_buffer_has_space(unsigned long head, unsigned long tail,
+		      unsigned long data_size, unsigned int size,
+		      bool backward)
+{
+	if (!backward)
+		return CIRC_SPACE(head, tail, data_size) >= size;
+	else
+		return CIRC_SPACE(tail, head, data_size) >= size;
+}
+
+static int __always_inline
+__perf_output_begin(struct perf_output_handle *handle,
+		    struct perf_event *event, unsigned int size,
+		    bool backward)
 {
 	struct ring_buffer *rb;
 	unsigned long tail, offset, head;
@@ -146,9 +159,12 @@ int perf_output_begin(struct perf_output_handle *handle,
 	do {
 		tail = READ_ONCE(rb->user_page->data_tail);
 		offset = head = local_read(&rb->head);
-		if (!rb->overwrite &&
-		    unlikely(CIRC_SPACE(head, tail, perf_data_size(rb)) < size))
-			goto fail;
+		if (!rb->overwrite) {
+			if (unlikely(!ring_buffer_has_space(head, tail,
+							    perf_data_size(rb),
+							    size, backward)))
+				goto fail;
+		}
 
 		/*
 		 * The above forms a control dependency barrier separating the
@@ -162,9 +178,17 @@ int perf_output_begin(struct perf_output_handle *handle,
 		 * See perf_output_put_handle().
 		 */
 
-		head += size;
+		if (!backward)
+			head += size;
+		else
+			head -= size;
 	} while (local_cmpxchg(&rb->head, offset, head) != offset);
 
+	if (backward) {
+		offset = head;
+		head = (u64)(-head);
+	}
+
 	/*
 	 * We rely on the implied barrier() by local_cmpxchg() to ensure
 	 * none of the data stores below can be lifted up by the compiler.
@@ -206,6 +230,12 @@ out:
 	return -ENOSPC;
 }
 
+int perf_output_begin(struct perf_output_handle *handle,
+		      struct perf_event *event, unsigned int size)
+{
+	return __perf_output_begin(handle, event, size, false);
+}
+
 unsigned int perf_output_copy(struct perf_output_handle *handle,
 		      const void *buf, unsigned int len)
 {
-- 
1.8.3.4

* [PATCH 13/46] perf core: Add backward attribute to perf event
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (11 preceding siblings ...)
  2016-02-26  9:32 ` [PATCH 12/46] perf core: Prepare writing into ring buffer from end Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  2016-02-26  9:32 ` [PATCH 14/46] perf core: Reduce perf event output overhead by new overflow handler Wang Nan
                   ` (32 subsequent siblings)
  45 siblings, 0 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Brendan Gregg, Masami Hiramatsu,
	Namhyung Kim

In perf_event_attr a new bit, 'write_backward', is appended to indicate
that this event should write to the ring buffer from its end to its
beginning.

In perf_output_begin(), prepare the ring buffer according to this bit.

This patch introduces a small overhead into perf_output_begin(): an
extra memory read and a conditional branch. A further patch can remove
this overhead by using a custom output handler.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
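A minimal user-space sketch of the two write directions, for illustration
only. struct toy_rb and toy_rb_reserve() are hypothetical names, not kernel
code; the sketch assumes a power-of-two data area and a free-running unsigned
head, and it leaves out the cmpxchg loop and the space checks that the real
__perf_output_begin() performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy ring buffer; not the kernel's struct ring_buffer. */
struct toy_rb {
	uint64_t head;	/* free-running position, loosely like rb->head */
	uint64_t size;	/* data area size in bytes, must be a power of two */
};

/*
 * Reserve 'len' bytes and return the byte offset inside the data area
 * at which the new record starts.
 */
static uint64_t toy_rb_reserve(struct toy_rb *rb, uint64_t len, bool backward)
{
	uint64_t start;

	assert(rb->size && (rb->size & (rb->size - 1)) == 0);
	assert(len <= rb->size);

	if (!backward) {
		/* Forward: the record starts at the old head. */
		start = rb->head;
		rb->head += len;
	} else {
		/* Backward: move head down first, the record starts there. */
		rb->head -= len;
		start = rb->head;
	}

	/* Unsigned wrap-around plus the mask folds head into the buffer. */
	return start & (rb->size - 1);
}
```

With a 4096-byte buffer and head starting at 0, a backward reservation of
24 bytes lands at offset 4072, and the next one of 40 bytes at offset 4032,
so the most recent record is always the one at the lowest position written
so far.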
 include/linux/perf_event.h      | 5 +++++
 include/uapi/linux/perf_event.h | 3 ++-
 kernel/events/core.c            | 7 +++++++
 kernel/events/ring_buffer.c     | 2 ++
 4 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index b35a61a..0ce1015 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1029,6 +1029,11 @@ static inline bool has_aux(struct perf_event *event)
 	return event->pmu->setup_aux;
 }
 
+static inline bool is_write_backward(struct perf_event *event)
+{
+	return !!event->attr.write_backward;
+}
+
 extern int perf_output_begin(struct perf_output_handle *handle,
 			     struct perf_event *event, unsigned int size);
 extern void perf_output_end(struct perf_output_handle *handle);
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index a3c1903..43fc8d2 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -340,7 +340,8 @@ struct perf_event_attr {
 				comm_exec      :  1, /* flag comm events that are due to an exec */
 				use_clockid    :  1, /* use @clockid for time fields */
 				context_switch :  1, /* context switch data */
-				__reserved_1   : 37;
+				write_backward :  1, /* Write ring buffer from end to beginning */
+				__reserved_1   : 36;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index ae34061..9353154 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8101,6 +8101,13 @@ perf_event_set_output(struct perf_event *event, struct perf_event *output_event)
 		goto out;
 
 	/*
+	 * Either writing ring buffer from beginning or from end.
+	 * Mixing is not allowed.
+	 */
+	if (is_write_backward(output_event) != is_write_backward(event))
+		goto out;
+
+	/*
 	 * If both events generate aux data, they must be on the same PMU
 	 */
 	if (has_aux(event) && has_aux(output_event) &&
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 37c11c6..80b1fa7 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -233,6 +233,8 @@ out:
 int perf_output_begin(struct perf_output_handle *handle,
 		      struct perf_event *event, unsigned int size)
 {
+	if (unlikely(is_write_backward(event)))
+		return __perf_output_begin(handle, event, size, true);
 	return __perf_output_begin(handle, event, size, false);
 }
 
-- 
1.8.3.4

* [PATCH 14/46] perf core: Reduce perf event output overhead by new overflow handler
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (12 preceding siblings ...)
  2016-02-26  9:32 ` [PATCH 13/46] perf core: Add backward attribute to perf event Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  2016-02-26  9:32 ` [PATCH 15/46] perf tools: Only validate is_pos for tracking evsels Wang Nan
                   ` (31 subsequent siblings)
  45 siblings, 0 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Brendan Gregg, Masami Hiramatsu,
	Namhyung Kim

Create onward-specific and backward-specific overflow handlers and select
one according to the event's write_backward setting, so that normal
sampling events no longer need to check that setting on every output.

This is the last patch of the backward-writing patchset. After this patch,
no extra overhead is introduced into the fast path of sampling output.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
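The idea can be sketched in user space as follows. All toy_* names are
hypothetical stand-ins, not kernel symbols. The sketch assumes that a shared
inline body receives the direction-specific 'begin' function as a constant
argument, so that each wrapper can be specialized by the compiler, and that
the handler is chosen once when the event is set up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_event;
typedef void (*toy_handler_t)(struct toy_event *ev, long size);

/* Hypothetical stand-in for struct perf_event. */
struct toy_event {
	bool write_backward;
	long head;
	toy_handler_t overflow_handler;
};

static void toy_begin_onward(struct toy_event *ev, long size)
{
	ev->head += size;
}

static void toy_begin_backward(struct toy_event *ev, long size)
{
	ev->head -= size;
}

/*
 * Shared body. Each wrapper below passes a constant function pointer,
 * so an inlining compiler can turn the indirect call into a direct one.
 */
static inline void toy_output(struct toy_event *ev, long size,
			      toy_handler_t begin)
{
	begin(ev, size);
	/* ... the sample would be copied into the buffer here ... */
}

static void toy_output_onward(struct toy_event *ev, long size)
{
	toy_output(ev, size, toy_begin_onward);
}

static void toy_output_backward(struct toy_event *ev, long size)
{
	toy_output(ev, size, toy_begin_backward);
}

/* Pick the handler once, at setup time, instead of on every sample. */
static void toy_event_init(struct toy_event *ev, bool write_backward,
			   toy_handler_t custom)
{
	assert(ev != NULL);

	ev->write_backward = write_backward;
	ev->head = 0;
	if (custom)
		ev->overflow_handler = custom;
	else if (write_backward)
		ev->overflow_handler = toy_output_backward;
	else
		ev->overflow_handler = toy_output_onward;
}
```

After toy_event_init(), the per-sample path is a single call through
ev->overflow_handler with no test of write_backward.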
 include/linux/perf_event.h  | 17 +++++++++++++++--
 kernel/events/core.c        | 41 ++++++++++++++++++++++++++++++++++++-----
 kernel/events/ring_buffer.c | 12 ++++++++++++
 3 files changed, 63 insertions(+), 7 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 0ce1015..e466cc6 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -827,9 +827,15 @@ extern int perf_event_overflow(struct perf_event *event,
 				 struct perf_sample_data *data,
 				 struct pt_regs *regs);
 
+extern void perf_event_output_onward(struct perf_event *event,
+				     struct perf_sample_data *data,
+				     struct pt_regs *regs);
+extern void perf_event_output_backward(struct perf_event *event,
+				       struct perf_sample_data *data,
+				       struct pt_regs *regs);
 extern void perf_event_output(struct perf_event *event,
-				struct perf_sample_data *data,
-				struct pt_regs *regs);
+			      struct perf_sample_data *data,
+			      struct pt_regs *regs);
 
 extern void
 perf_event_header__init_id(struct perf_event_header *header,
@@ -1036,6 +1042,13 @@ static inline bool is_write_backward(struct perf_event *event)
 
 extern int perf_output_begin(struct perf_output_handle *handle,
 			     struct perf_event *event, unsigned int size);
+extern int perf_output_begin_onward(struct perf_output_handle *handle,
+				    struct perf_event *event,
+				    unsigned int size);
+extern int perf_output_begin_backward(struct perf_output_handle *handle,
+				      struct perf_event *event,
+				      unsigned int size);
+
 extern void perf_output_end(struct perf_output_handle *handle);
 extern unsigned int perf_output_copy(struct perf_output_handle *handle,
 			     const void *buf, unsigned int len);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 9353154..ce70f54 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5531,9 +5531,13 @@ void perf_prepare_sample(struct perf_event_header *header,
 	}
 }
 
-void perf_event_output(struct perf_event *event,
-			struct perf_sample_data *data,
-			struct pt_regs *regs)
+static void __always_inline
+__perf_event_output(struct perf_event *event,
+		    struct perf_sample_data *data,
+		    struct pt_regs *regs,
+		    int (*output_begin)(struct perf_output_handle *,
+					struct perf_event *,
+					unsigned int))
 {
 	struct perf_output_handle handle;
 	struct perf_event_header header;
@@ -5543,7 +5547,7 @@ void perf_event_output(struct perf_event *event,
 
 	perf_prepare_sample(&header, data, event, regs);
 
-	if (perf_output_begin(&handle, event, header.size))
+	if (output_begin(&handle, event, header.size))
 		goto exit;
 
 	perf_output_sample(&handle, &header, data, event);
@@ -5554,6 +5558,30 @@ exit:
 	rcu_read_unlock();
 }
 
+void
+perf_event_output_onward(struct perf_event *event,
+			 struct perf_sample_data *data,
+			 struct pt_regs *regs)
+{
+	__perf_event_output(event, data, regs, perf_output_begin_onward);
+}
+
+void
+perf_event_output_backward(struct perf_event *event,
+			   struct perf_sample_data *data,
+			   struct pt_regs *regs)
+{
+	__perf_event_output(event, data, regs, perf_output_begin_backward);
+}
+
+void
+perf_event_output(struct perf_event *event,
+		  struct perf_sample_data *data,
+		  struct pt_regs *regs)
+{
+	__perf_event_output(event, data, regs, perf_output_begin);
+}
+
 /*
  * read event_id
  */
@@ -7868,8 +7896,11 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 	if (overflow_handler) {
 		event->overflow_handler	= overflow_handler;
 		event->overflow_handler_context = context;
+	} else if (is_write_backward(event)) {
+		event->overflow_handler = perf_event_output_backward;
+		event->overflow_handler_context = NULL;
 	} else {
-		event->overflow_handler = perf_event_output;
+		event->overflow_handler = perf_event_output_onward;
 		event->overflow_handler_context = NULL;
 	}
 
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 80b1fa7..7e30e012 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -230,6 +230,18 @@ out:
 	return -ENOSPC;
 }
 
+int perf_output_begin_onward(struct perf_output_handle *handle,
+			     struct perf_event *event, unsigned int size)
+{
+	return __perf_output_begin(handle, event, size, false);
+}
+
+int perf_output_begin_backward(struct perf_output_handle *handle,
+			       struct perf_event *event, unsigned int size)
+{
+	return __perf_output_begin(handle, event, size, true);
+}
+
 int perf_output_begin(struct perf_output_handle *handle,
 		      struct perf_event *event, unsigned int size)
 {
-- 
1.8.3.4

* [PATCH 15/46] perf tools: Only validate is_pos for tracking evsels
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (13 preceding siblings ...)
  2016-02-26  9:32 ` [PATCH 14/46] perf core: Reduce perf event output overhead by new overflow handler Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  2016-02-26  9:32 ` [PATCH 16/46] perf tools: Print write_backward value in perf_event_attr__fprintf Wang Nan
                   ` (30 subsequent siblings)
  45 siblings, 0 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Masami Hiramatsu, Namhyung Kim

is_pos is only useful for tracking events (fork, mmap, exit, ...).
Perf collects those events through the evsel that has 'tracking' set.
Therefore, there's no need to validate every evsel's is_pos against
evlist->is_pos.

With this commit, the kernel can safely put something at the end of
a record, for example PERF_SAMPLE_TAILSIZE [1]. However, since
we have dropped TAILSIZE, this commit is not mandatory; it just
makes the code more robust in case someone implements something
similar in a private kernel.

[1] http://lkml.kernel.org/g/1449063499-236703-1-git-send-email-wangnan0@huawei.com

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
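A small stand-alone sketch of the relaxed check, for illustration. struct
toy_evsel and toy_valid_sample_type() are hypothetical simplifications of
the perf structures; only the three fields used by the check are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for struct perf_evsel. */
struct toy_evsel {
	int id_pos;
	int is_pos;
	bool tracking;
};

/*
 * id_pos must match for every evsel; is_pos is only compared for
 * evsels that collect tracking events.
 */
static bool toy_valid_sample_type(const struct toy_evsel *evsels, size_t nr,
				  int id_pos, int is_pos)
{
	size_t i;

	assert(evsels != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		if (evsels[i].id_pos != id_pos)
			return false;
		if (evsels[i].tracking && evsels[i].is_pos != is_pos)
			return false;
	}
	return true;
}
```

An evsel list in which a non-tracking evsel has a different is_pos is now
accepted, while a mismatch on the tracking evsel is still rejected.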
 tools/perf/util/evlist.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 86a0383..227950b 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1277,8 +1277,15 @@ bool perf_evlist__valid_sample_type(struct perf_evlist *evlist)
 		return false;
 
 	evlist__for_each(evlist, pos) {
-		if (pos->id_pos != evlist->id_pos ||
-		    pos->is_pos != evlist->is_pos)
+		if (pos->id_pos != evlist->id_pos)
+			return false;
+		/*
+		 * Only tracking events need is_pos. Those events are
+		 * collected if evsel->tracking is set.
+		 * For other evsels, is_pos is useless, so skip
+		 * validating it.
+		 */
+		if (pos->tracking && pos->is_pos != evlist->is_pos)
 			return false;
 	}
 
-- 
1.8.3.4

* [PATCH 16/46] perf tools: Print write_backward value in perf_event_attr__fprintf
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (14 preceding siblings ...)
  2016-02-26  9:32 ` [PATCH 15/46] perf tools: Only validate is_pos for tracking evsels Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  2016-02-26  9:32 ` [PATCH 17/46] perf tools: Make ordered_events reusable Wang Nan
                   ` (29 subsequent siblings)
  45 siblings, 0 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Masami Hiramatsu, Namhyung Kim

Print the write_backward setting when printing a perf evsel.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/util/evsel.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 0902fe4..510afa4 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1299,6 +1299,7 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
 	PRINT_ATTRf(comm_exec, p_unsigned);
 	PRINT_ATTRf(use_clockid, p_unsigned);
 	PRINT_ATTRf(context_switch, p_unsigned);
+	PRINT_ATTRf(write_backward, p_unsigned);
 
 	PRINT_ATTRn("{ wakeup_events, wakeup_watermark }", wakeup_events, p_unsigned);
 	PRINT_ATTRf(bp_type, p_unsigned);
-- 
1.8.3.4

* [PATCH 17/46] perf tools: Make ordered_events reusable
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (15 preceding siblings ...)
  2016-02-26  9:32 ` [PATCH 16/46] perf tools: Print write_backward value in perf_event_attr__fprintf Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  2016-02-26  9:32 ` [PATCH 18/46] perf record: Use WARN_ONCE to replace 'if' condition Wang Nan
                   ` (28 subsequent siblings)
  45 siblings, 0 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Masami Hiramatsu, Namhyung Kim

ordered_events__free() leaves the linked lists and timestamps uncleared,
so an ordered_events cannot be reused after ordered_events__free(). This
is inconvenient once 'perf record' supports generating multiple perf.data
outputs and processing build-ids for each of them.

Introduce ordered_events__reinit() for this.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
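The save, free, clear, re-init sequence can be sketched on a toy structure
as follows. struct toy_queue and its helpers are hypothetical; they only
mimic the property that matters here, namely a free routine that releases
memory but leaves stale fields behind.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef int (*toy_deliver_t)(int event);

/* Hypothetical stand-in for struct ordered_events. */
struct toy_queue {
	int *events;
	size_t nr_events;
	unsigned long long last_flush;
	toy_deliver_t deliver;
};

static void toy_queue_init(struct toy_queue *q, toy_deliver_t deliver)
{
	q->events = NULL;
	q->nr_events = 0;
	q->last_flush = 0;
	q->deliver = deliver;
}

/* Releases memory but leaves the pointer and counters stale. */
static void toy_queue_free(struct toy_queue *q)
{
	free(q->events);
}

/* Keep the callback, drop everything else, start from a clean state. */
static void toy_queue_reinit(struct toy_queue *q)
{
	toy_deliver_t old_deliver;

	assert(q != NULL);

	old_deliver = q->deliver;
	toy_queue_free(q);
	memset(q, 0, sizeof(*q));
	toy_queue_init(q, old_deliver);
}

/* Example callback, only here so that the sketch is self-contained. */
static int toy_deliver(int event)
{
	return event;
}
```

The callback has to be saved before the memset(), otherwise the re-init
would have nothing to pass back in.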
 tools/perf/util/ordered-events.c | 9 +++++++++
 tools/perf/util/ordered-events.h | 1 +
 tools/perf/util/session.c        | 6 +++++-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/ordered-events.c b/tools/perf/util/ordered-events.c
index b1b9e23..fe84df1 100644
--- a/tools/perf/util/ordered-events.c
+++ b/tools/perf/util/ordered-events.c
@@ -308,3 +308,12 @@ void ordered_events__free(struct ordered_events *oe)
 		free(event);
 	}
 }
+
+void ordered_events__reinit(struct ordered_events *oe)
+{
+	ordered_events__deliver_t old_deliver = oe->deliver;
+
+	ordered_events__free(oe);
+	memset(oe, '\0', sizeof(*oe));
+	ordered_events__init(oe, old_deliver);
+}
diff --git a/tools/perf/util/ordered-events.h b/tools/perf/util/ordered-events.h
index f403991..e11468a 100644
--- a/tools/perf/util/ordered-events.h
+++ b/tools/perf/util/ordered-events.h
@@ -49,6 +49,7 @@ void ordered_events__delete(struct ordered_events *oe, struct ordered_event *eve
 int ordered_events__flush(struct ordered_events *oe, enum oe_flush how);
 void ordered_events__init(struct ordered_events *oe, ordered_events__deliver_t deliver);
 void ordered_events__free(struct ordered_events *oe);
+void ordered_events__reinit(struct ordered_events *oe);
 
 static inline
 void ordered_events__set_alloc_size(struct ordered_events *oe, u64 size)
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 40b7a0d..ab3a296 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1857,7 +1857,11 @@ out:
 out_err:
 	ui_progress__finish();
 	perf_session__warn_about_errors(session);
-	ordered_events__free(&session->ordered_events);
+	/*
+	 * We may be switching perf.data output, so make
+	 * ordered_events reusable.
+	 */
+	ordered_events__reinit(&session->ordered_events);
 	auxtrace__free_events(session);
 	session->one_mmap = false;
 	return err;
-- 
1.8.3.4

* [PATCH 18/46] perf record: Use WARN_ONCE to replace 'if' condition
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (16 preceding siblings ...)
  2016-02-26  9:32 ` [PATCH 17/46] perf tools: Make ordered_events reusable Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  2016-03-05  8:15   ` [tip:perf/core] " tip-bot for Wang Nan
  2016-02-26  9:32 ` [PATCH 19/46] perf record: Extract synthesize code to record__synthesize() Wang Nan
                   ` (27 subsequent siblings)
  45 siblings, 1 reply; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Masami Hiramatsu, Namhyung Kim

Following commits will extract the kernel and module synthesizing code into
a separate function and call it multiple times. This patch replaces
'if (err < 0)' with WARN_ONCE to make sure the error message is shown
only once.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
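For readers unfamiliar with the pattern, a warn-once macro can be sketched
like this. TOY_WARN_ONCE, toy_report() and the toy_warnings_printed counter
are hypothetical and are not the definitions from asm/bug.h; the counter
exists only so that the behaviour can be checked.

```c
#include <assert.h>
#include <stdio.h>

/* Counts emitted warnings; only present to make the sketch checkable. */
static int toy_warnings_printed;

/*
 * Print the message the first time 'cond' is true at this call site.
 * The static flag is per expansion, so each call site warns once.
 */
#define TOY_WARN_ONCE(cond, ...)				\
	do {							\
		static int warned;				\
		if ((cond) && !warned) {			\
			warned = 1;				\
			toy_warnings_printed++;			\
			fprintf(stderr, __VA_ARGS__);		\
		}						\
	} while (0)

/* Stand-in for a synthesize step that may be called many times. */
static int toy_report(int err)
{
	TOY_WARN_ONCE(err < 0, "Couldn't record kernel module information.\n");
	return err;
}

/* Usage: three failures in a row produce a single message. */
static void toy_demo(void)
{
	int i;

	for (i = 0; i < 3; i++)
		toy_report(-1);
	assert(toy_warnings_printed == 1);
}
```

Because the flag lives at the call site, moving the warning into a function
that is called repeatedly still prints it only once per run.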
 tools/perf/builtin-record.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 7d11162..9dec7e5 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -33,6 +33,7 @@
 #include "util/parse-regs-options.h"
 #include "util/llvm-utils.h"
 #include "util/bpf-loader.h"
+#include "asm/bug.h"
 
 #include <unistd.h>
 #include <sched.h>
@@ -615,17 +616,15 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 
 	err = perf_event__synthesize_kernel_mmap(tool, process_synthesized_event,
 						 machine);
-	if (err < 0)
-		pr_err("Couldn't record kernel reference relocation symbol\n"
-		       "Symbol resolution may be skewed if relocation was used (e.g. kexec).\n"
-		       "Check /proc/kallsyms permission or run as root.\n");
+	WARN_ONCE(err < 0, "Couldn't record kernel reference relocation symbol\n"
+			   "Symbol resolution may be skewed if relocation was used (e.g. kexec).\n"
+			   "Check /proc/kallsyms permission or run as root.\n");
 
 	err = perf_event__synthesize_modules(tool, process_synthesized_event,
 					     machine);
-	if (err < 0)
-		pr_err("Couldn't record kernel module information.\n"
-		       "Symbol resolution may be skewed if relocation was used (e.g. kexec).\n"
-		       "Check /proc/modules permission or run as root.\n");
+	WARN_ONCE(err < 0, "Couldn't record kernel module information.\n"
+			   "Symbol resolution may be skewed if relocation was used (e.g. kexec).\n"
+			   "Check /proc/modules permission or run as root.\n");
 
 	if (perf_guest) {
 		machines__process_guests(&session->machines,
-- 
1.8.3.4

* [PATCH 19/46] perf record: Extract synthesize code to record__synthesize()
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (17 preceding siblings ...)
  2016-02-26  9:32 ` [PATCH 18/46] perf record: Use WARN_ONCE to replace 'if' condition Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  2016-03-05  8:16   ` [tip:perf/core] " tip-bot for Wang Nan
  2016-02-26  9:32 ` [PATCH 20/46] perf tools: Add perf_data_file__switch() helper Wang Nan
                   ` (26 subsequent siblings)
  45 siblings, 1 reply; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Masami Hiramatsu, Namhyung Kim

Create record__synthesize(). It can be used to create tracking events
for each perf.data once perf supports splitting its output into multiple
files.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/builtin-record.c | 125 +++++++++++++++++++++++++-------------------
 1 file changed, 70 insertions(+), 55 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 9dec7e5..cb583b4 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -486,6 +486,74 @@ static void workload_exec_failed_signal(int signo __maybe_unused,
 
 static void snapshot_sig_handler(int sig);
 
+static int record__synthesize(struct record *rec)
+{
+	struct perf_session *session = rec->session;
+	struct machine *machine = &session->machines.host;
+	struct perf_data_file *file = &rec->file;
+	struct record_opts *opts = &rec->opts;
+	struct perf_tool *tool = &rec->tool;
+	int fd = perf_data_file__fd(file);
+	int err = 0;
+
+	if (file->is_pipe) {
+		err = perf_event__synthesize_attrs(tool, session,
+						   process_synthesized_event);
+		if (err < 0) {
+			pr_err("Couldn't synthesize attrs.\n");
+			goto out;
+		}
+
+		if (have_tracepoints(&rec->evlist->entries)) {
+			/*
+			 * FIXME err <= 0 here actually means that
+			 * there were no tracepoints so its not really
+			 * an error, just that we don't need to
+			 * synthesize anything.  We really have to
+			 * return this more properly and also
+			 * propagate errors that now are calling die()
+			 */
+			err = perf_event__synthesize_tracing_data(tool,	fd, rec->evlist,
+								  process_synthesized_event);
+			if (err <= 0) {
+				pr_err("Couldn't record tracing data.\n");
+				goto out;
+			}
+			rec->bytes_written += err;
+		}
+	}
+
+	if (rec->opts.full_auxtrace) {
+		err = perf_event__synthesize_auxtrace_info(rec->itr, tool,
+					session, process_synthesized_event);
+		if (err)
+			goto out;
+	}
+
+	err = perf_event__synthesize_kernel_mmap(tool, process_synthesized_event,
+						 machine);
+	WARN_ONCE(err < 0, "Couldn't record kernel reference relocation symbol\n"
+			   "Symbol resolution may be skewed if relocation was used (e.g. kexec).\n"
+			   "Check /proc/kallsyms permission or run as root.\n");
+
+	err = perf_event__synthesize_modules(tool, process_synthesized_event,
+					     machine);
+	WARN_ONCE(err < 0, "Couldn't record kernel module information.\n"
+			   "Symbol resolution may be skewed if relocation was used (e.g. kexec).\n"
+			   "Check /proc/modules permission or run as root.\n");
+
+	if (perf_guest) {
+		machines__process_guests(&session->machines,
+					 perf_event__synthesize_guest_os, tool);
+	}
+
+	err = __machine__synthesize_threads(machine, tool, &opts->target, rec->evlist->threads,
+					    process_synthesized_event, opts->sample_address,
+					    opts->proc_map_timeout);
+out:
+	return err;
+}
+
 static int __cmd_record(struct record *rec, int argc, const char **argv)
 {
 	int err;
@@ -580,61 +648,8 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 
 	machine = &session->machines.host;
 
-	if (file->is_pipe) {
-		err = perf_event__synthesize_attrs(tool, session,
-						   process_synthesized_event);
-		if (err < 0) {
-			pr_err("Couldn't synthesize attrs.\n");
-			goto out_child;
-		}
-
-		if (have_tracepoints(&rec->evlist->entries)) {
-			/*
-			 * FIXME err <= 0 here actually means that
-			 * there were no tracepoints so its not really
-			 * an error, just that we don't need to
-			 * synthesize anything.  We really have to
-			 * return this more properly and also
-			 * propagate errors that now are calling die()
-			 */
-			err = perf_event__synthesize_tracing_data(tool,	fd, rec->evlist,
-								  process_synthesized_event);
-			if (err <= 0) {
-				pr_err("Couldn't record tracing data.\n");
-				goto out_child;
-			}
-			rec->bytes_written += err;
-		}
-	}
-
-	if (rec->opts.full_auxtrace) {
-		err = perf_event__synthesize_auxtrace_info(rec->itr, tool,
-					session, process_synthesized_event);
-		if (err)
-			goto out_delete_session;
-	}
-
-	err = perf_event__synthesize_kernel_mmap(tool, process_synthesized_event,
-						 machine);
-	WARN_ONCE(err < 0, "Couldn't record kernel reference relocation symbol\n"
-			   "Symbol resolution may be skewed if relocation was used (e.g. kexec).\n"
-			   "Check /proc/kallsyms permission or run as root.\n");
-
-	err = perf_event__synthesize_modules(tool, process_synthesized_event,
-					     machine);
-	WARN_ONCE(err < 0, "Couldn't record kernel module information.\n"
-			   "Symbol resolution may be skewed if relocation was used (e.g. kexec).\n"
-			   "Check /proc/modules permission or run as root.\n");
-
-	if (perf_guest) {
-		machines__process_guests(&session->machines,
-					 perf_event__synthesize_guest_os, tool);
-	}
-
-	err = __machine__synthesize_threads(machine, tool, &opts->target, rec->evlist->threads,
-					    process_synthesized_event, opts->sample_address,
-					    opts->proc_map_timeout);
-	if (err != 0)
+	err = record__synthesize(rec);
+	if (err < 0)
 		goto out_child;
 
 	if (rec->realtime_prio) {
-- 
1.8.3.4

* [PATCH 20/46] perf tools: Add perf_data_file__switch() helper
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (18 preceding siblings ...)
  2016-02-26  9:32 ` [PATCH 19/46] perf record: Extract synthesize code to record__synthesize() Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  2016-02-26  9:32 ` [PATCH 21/46] perf record: Turns auxtrace_snapshot_enable into 3 states Wang Nan
                   ` (25 subsequent siblings)
  45 siblings, 0 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Masami Hiramatsu, Namhyung Kim

perf_data_file__switch() renames the current output file and, unless
at_exit is set, closes it and opens a new one to continue recording.
It will be used by perf record to split output into multiple perf.data
files.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
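The rename, reopen and seek sequence can be tried outside perf with a small
POSIX sketch. toy_switch_output() is a hypothetical helper, not the perf
implementation: it takes a plain fd and path instead of a struct
perf_data_file, opens the new file with fixed flags, and assumes the caller
wants to resume writing at 'pos' (for instance to leave room for data that
is filled in later).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Rename 'path' to 'path.postfix'. Unless at_exit is set, close 'fd',
 * reopen 'path' and seek to 'pos'. Returns the fd to keep writing to,
 * or a negative errno value.
 */
static int toy_switch_output(int fd, const char *path, const char *postfix,
			     off_t pos, bool at_exit)
{
	size_t len;
	char *new_path;

	assert(path != NULL && postfix != NULL);

	len = strlen(path) + strlen(postfix) + 2;	/* '.' and '\0' */
	new_path = malloc(len);
	if (!new_path)
		return -ENOMEM;
	snprintf(new_path, len, "%s.%s", path, postfix);

	/* As in the patch, a failed rename only produces a warning. */
	if (rename(path, new_path))
		fprintf(stderr, "Failed to rename %s to %s\n", path, new_path);
	free(new_path);

	/* Last output: keep writing to the (now renamed) file. */
	if (at_exit)
		return fd;

	close(fd);
	fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);
	if (fd < 0)
		return -errno;

	if (lseek(fd, pos, SEEK_SET) == (off_t)-1) {
		int err = -errno;

		close(fd);
		return err;
	}
	return fd;
}
```

Renaming an open file does not invalidate its descriptor, which is why the
at_exit case can keep using the old fd after the rename.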
 tools/perf/util/data.c | 38 ++++++++++++++++++++++++++++++++++++++
 tools/perf/util/data.h | 11 ++++++++++-
 2 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
index 1921942..6840fc9 100644
--- a/tools/perf/util/data.c
+++ b/tools/perf/util/data.c
@@ -136,3 +136,41 @@ ssize_t perf_data_file__write(struct perf_data_file *file,
 {
 	return writen(file->fd, buf, size);
 }
+
+int perf_data_file__switch(struct perf_data_file *file,
+			   const char *postfix,
+			   size_t pos, bool at_exit)
+{
+	char *new_filepath;
+	int ret;
+
+	if (check_pipe(file))
+		return -EINVAL;
+	if (perf_data_file__is_read(file))
+		return -EINVAL;
+
+	if (asprintf(&new_filepath, "%s.%s", file->path, postfix) < 0)
+		return -ENOMEM;
+
+	/* Only fire a warning, don't return error. */
+	if (rename(file->path, new_filepath))
+		pr_warning("Failed to rename %s to %s\n", file->path, new_filepath);
+
+	if (!at_exit) {
+		close(file->fd);
+		ret = perf_data_file__open(file);
+		if (ret < 0)
+			goto out;
+
+		if (lseek(file->fd, pos, SEEK_SET) == (off_t)-1) {
+			ret = -errno;
+			pr_debug("Failed to lseek to %zu: %s",
+				 pos, strerror(errno));
+			goto out;
+		}
+	}
+	ret = file->fd;
+out:
+	free(new_filepath);
+	return ret;
+}
diff --git a/tools/perf/util/data.h b/tools/perf/util/data.h
index 2b15d0c..ae510ce 100644
--- a/tools/perf/util/data.h
+++ b/tools/perf/util/data.h
@@ -46,5 +46,14 @@ int perf_data_file__open(struct perf_data_file *file);
 void perf_data_file__close(struct perf_data_file *file);
 ssize_t perf_data_file__write(struct perf_data_file *file,
 			      void *buf, size_t size);
-
+/*
+ * If at_exit is set, only rename the current perf.data to
+ * perf.data.<postfix> and keep writing to the original file.
+ * Set at_exit when flushing the last output.
+ *
+ * Return value is fd of new output.
+ */
+int perf_data_file__switch(struct perf_data_file *file,
+			   const char *postfix,
+			   size_t pos, bool at_exit);
 #endif /* __PERF_DATA_H */
-- 
1.8.3.4

* [PATCH 21/46] perf record: Turns auxtrace_snapshot_enable into 3 states
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (19 preceding siblings ...)
  2016-02-26  9:32 ` [PATCH 20/46] perf tools: Add perf_data_file__switch() helper Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  2016-02-26  9:32 ` [PATCH 22/46] perf record: Introduce record__finish_output() to finish a perf.data Wang Nan
                   ` (24 subsequent siblings)
  45 siblings, 0 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Masami Hiramatsu, Namhyung Kim

auxtrace_snapshot_enabled has only two states (0/1). Turn it into a
three-state enum so that the SIGUSR2 handler can safely do other work
without triggering an auxtrace snapshot.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
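The intended transitions can be written down as a small stand-alone model.
The toy_* names are hypothetical, and toy_handle_sigusr2() is an assumption
about how a signal handler would consume the state; it is not taken from
this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_snapshot_state {
	TOY_SNAPSHOT_OFF = -1,		/* snapshot mode not requested */
	TOY_SNAPSHOT_DISABLED = 0,	/* requested, but not armed */
	TOY_SNAPSHOT_ENABLED = 1,	/* armed, next signal snapshots */
};

/* Leave OFF: called once when snapshot mode is requested. */
static enum toy_snapshot_state toy_snapshot_on(void)
{
	return TOY_SNAPSHOT_DISABLED;
}

/* Enable and disable are no-ops while the state is OFF. */
static enum toy_snapshot_state toy_snapshot_enable(enum toy_snapshot_state s)
{
	return s == TOY_SNAPSHOT_OFF ? s : TOY_SNAPSHOT_ENABLED;
}

static enum toy_snapshot_state toy_snapshot_disable(enum toy_snapshot_state s)
{
	return s == TOY_SNAPSHOT_OFF ? s : TOY_SNAPSHOT_DISABLED;
}

/*
 * Assumed handler logic: take a snapshot only when armed, then disarm
 * until the snapshot has been read. Returns true if a snapshot starts.
 */
static bool toy_handle_sigusr2(enum toy_snapshot_state *s)
{
	assert(s != NULL);

	if (*s != TOY_SNAPSHOT_ENABLED)
		return false;
	*s = toy_snapshot_disable(*s);
	return true;
}
```

The point of the extra OFF state is that unconditional enable calls in the
main loop cannot arm snapshots when snapshot mode was never requested,
which leaves SIGUSR2 free for other uses.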
 tools/perf/builtin-record.c | 59 +++++++++++++++++++++++++++++++++++++--------
 1 file changed, 49 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index cb583b4..6257fd7 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -124,7 +124,43 @@ out:
 static volatile int done;
 static volatile int signr = -1;
 static volatile int child_finished;
-static volatile int auxtrace_snapshot_enabled;
+
+static volatile enum {
+	AUXTRACE_SNAPSHOT_OFF = -1,
+	AUXTRACE_SNAPSHOT_DISABLED = 0,
+	AUXTRACE_SNAPSHOT_ENABLED = 1,
+} auxtrace_snapshot_state = AUXTRACE_SNAPSHOT_OFF;
+
+static inline void
+auxtrace_snapshot_on(void)
+{
+	auxtrace_snapshot_state = AUXTRACE_SNAPSHOT_DISABLED;
+}
+
+static inline void
+auxtrace_snapshot_enable(void)
+{
+	if (auxtrace_snapshot_state == AUXTRACE_SNAPSHOT_OFF)
+		return;
+	auxtrace_snapshot_state = AUXTRACE_SNAPSHOT_ENABLED;
+}
+
+static inline void
+auxtrace_snapshot_disable(void)
+{
+	if (auxtrace_snapshot_state == AUXTRACE_SNAPSHOT_OFF)
+		return;
+	auxtrace_snapshot_state = AUXTRACE_SNAPSHOT_DISABLED;
+}
+
+static inline bool
+auxtrace_snapshot_is_enabled(void)
+{
+	if (auxtrace_snapshot_state == AUXTRACE_SNAPSHOT_OFF)
+		return false;
+	return auxtrace_snapshot_state == AUXTRACE_SNAPSHOT_ENABLED;
+}
+
 static volatile int auxtrace_snapshot_err;
 static volatile int auxtrace_record__snapshot_started;
 
@@ -248,7 +284,7 @@ static void record__read_auxtrace_snapshot(struct record *rec)
 	} else {
 		auxtrace_snapshot_err = auxtrace_record__snapshot_finish(rec->itr);
 		if (!auxtrace_snapshot_err)
-			auxtrace_snapshot_enabled = 1;
+			auxtrace_snapshot_enable();
 	}
 }
 
@@ -574,10 +610,13 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	signal(SIGCHLD, sig_handler);
 	signal(SIGINT, sig_handler);
 	signal(SIGTERM, sig_handler);
-	if (rec->opts.auxtrace_snapshot_mode)
+
+	if (rec->opts.auxtrace_snapshot_mode) {
 		signal(SIGUSR2, snapshot_sig_handler);
-	else
+		auxtrace_snapshot_on();
+	} else {
 		signal(SIGUSR2, SIG_IGN);
+	}
 
 	session = perf_session__new(file, false, tool);
 	if (session == NULL) {
@@ -703,12 +742,12 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		perf_evlist__enable(rec->evlist);
 	}
 
-	auxtrace_snapshot_enabled = 1;
+	auxtrace_snapshot_enable();
 	for (;;) {
 		unsigned long long hits = rec->samples;
 
 		if (record__mmap_read_all(rec) < 0) {
-			auxtrace_snapshot_enabled = 0;
+			auxtrace_snapshot_disable();
 			err = -1;
 			goto out_child;
 		}
@@ -746,12 +785,12 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		 * disable events in this case.
 		 */
 		if (done && !disabled && !target__none(&opts->target)) {
-			auxtrace_snapshot_enabled = 0;
+			auxtrace_snapshot_disable();
 			perf_evlist__disable(rec->evlist);
 			disabled = true;
 		}
 	}
-	auxtrace_snapshot_enabled = 0;
+	auxtrace_snapshot_disable();
 
 	if (forks && workload_exec_errno) {
 		char msg[STRERR_BUFSIZE];
@@ -1319,9 +1358,9 @@ out_symbol_exit:
 
 static void snapshot_sig_handler(int sig __maybe_unused)
 {
-	if (!auxtrace_snapshot_enabled)
+	if (!auxtrace_snapshot_is_enabled())
 		return;
-	auxtrace_snapshot_enabled = 0;
+	auxtrace_snapshot_disable();
 	auxtrace_snapshot_err = auxtrace_record__snapshot_start(record.itr);
 	auxtrace_record__snapshot_started = 1;
 }
-- 
1.8.3.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 22/46] perf record: Introduce record__finish_output() to finish a perf.data
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (20 preceding siblings ...)
  2016-02-26  9:32 ` [PATCH 21/46] perf record: Turns auxtrace_snapshot_enable into 3 states Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  2016-03-05  8:16   ` [tip:perf/core] " tip-bot for Wang Nan
  2016-02-26  9:32 ` [PATCH 23/46] perf record: Add '--timestamp-filename' option to append timestamp to output filename Wang Nan
                   ` (23 subsequent siblings)
  45 siblings, 1 reply; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Masami Hiramatsu, Namhyung Kim

Move the code for finalizing 'perf.data' to record__finish_output(). It
will be used by the following commits to split output into multiple files.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/builtin-record.c | 37 +++++++++++++++++++++++++------------
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 6257fd7..31aa1d4 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -504,6 +504,29 @@ static void record__init_features(struct record *rec)
 	perf_header__clear_feat(&session->header, HEADER_STAT);
 }
 
+static void
+record__finish_output(struct record *rec)
+{
+	struct perf_data_file *file = &rec->file;
+	int fd = perf_data_file__fd(file);
+
+	if (file->is_pipe)
+		return;
+
+	rec->session->header.data_size += rec->bytes_written;
+	file->size = lseek(perf_data_file__fd(file), 0, SEEK_CUR);
+
+	if (!rec->no_buildid) {
+		process_buildids(rec);
+
+		if (rec->buildid_all)
+			dsos__hit_all(rec->session);
+	}
+	perf_session__write_header(rec->session, rec->evlist, fd, true);
+
+	return;
+}
+
 static volatile int workload_exec_errno;
 
 /*
@@ -824,18 +847,8 @@ out_child:
 	/* this will be recalculated during process_buildids() */
 	rec->samples = 0;
 
-	if (!err && !file->is_pipe) {
-		rec->session->header.data_size += rec->bytes_written;
-		file->size = lseek(perf_data_file__fd(file), 0, SEEK_CUR);
-
-		if (!rec->no_buildid) {
-			process_buildids(rec);
-
-			if (rec->buildid_all)
-				dsos__hit_all(rec->session);
-		}
-		perf_session__write_header(rec->session, rec->evlist, fd, true);
-	}
+	if (!err)
+		record__finish_output(rec);
 
 	if (!err && !quiet) {
 		char samples[128];
-- 
1.8.3.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 23/46] perf record: Add '--timestamp-filename' option to append timestamp to output filename
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (21 preceding siblings ...)
  2016-02-26  9:32 ` [PATCH 22/46] perf record: Introduce record__finish_output() to finish a perf.data Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  2016-02-26  9:32 ` [PATCH 24/46] perf record: Split output into multiple files via '--switch-output' Wang Nan
                   ` (22 subsequent siblings)
  45 siblings, 0 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Masami Hiramatsu, Namhyung Kim

This option appends the current timestamp to the output filename. For example:

 # perf record -a --timestamp-filename
 ^C[ perf record: Woken up 1 times to write data ]
 [ perf record: Dump perf.data.2015122622265847 ]
 [ perf record: Captured and wrote 0.742 MB perf.data (90 samples) ]
 # ls
 perf.data.2015122622265847

Once 'perf record' supports generating multiple output files, the
timestamp will be useful for identifying each of them.
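
For readers who want to see how such a suffix could be produced, here is
a minimal sketch. It assumes the 16-character layout is
"YYYYMMDDHHMMSS" followed by two digits of hundredths of a second, which
is only inferred from the sample names above; format_timestamp() is a
hypothetical helper, not the function used by this patch.

```c
#include <stdio.h>
#include <sys/time.h>
#include <time.h>

/*
 * Hypothetical helper: writes "YYYYMMDDHHMMSScc" (16 characters, where
 * "cc" is hundredths of a second) into buf. Returns 0 on success and -1
 * on failure, including when buf is too small.
 */
static int format_timestamp(char *buf, size_t sz)
{
	struct timeval tv;
	struct tm *tm;
	char date[15];	/* "YYYYMMDDHHMMSS" plus the terminating NUL */
	time_t sec;
	int n;

	if (gettimeofday(&tv, NULL) != 0)
		return -1;

	sec = tv.tv_sec;
	tm = localtime(&sec);	/* not thread-safe; fine for a sketch */
	if (!tm)
		return -1;

	if (strftime(date, sizeof(date), "%Y%m%d%H%M%S", tm) == 0)
		return -1;

	/* tv_usec / 10000 is in the range 0..99 */
	n = snprintf(buf, sz, "%s%02u", date,
		     (unsigned int)(tv.tv_usec / 10000));
	return (n < 0 || (size_t)n >= sz) ? -1 : 0;
}
```

A caller would pass a buffer of at least 17 bytes and append the result
to the output path after a '.'.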

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/builtin-record.c | 47 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 45 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 31aa1d4..b982eec 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -55,6 +55,7 @@ struct record {
 	bool			no_buildid_cache;
 	bool			no_buildid_cache_set;
 	bool			buildid_all;
+	bool			timestamp_filename;
 	unsigned long long	samples;
 };
 
@@ -527,6 +528,37 @@ record__finish_output(struct record *rec)
 	return;
 }
 
+static int
+record__switch_output(struct record *rec, bool at_exit)
+{
+	struct perf_data_file *file = &rec->file;
+	int fd, err;
+
+	/* Same Size:      "2015122520103046"*/
+	char timestamp[] = "InvalidTimestamp";
+
+	rec->samples = 0;
+	record__finish_output(rec);
+	err = fetch_current_timestamp(timestamp, sizeof(timestamp));
+	if (err) {
+		pr_err("Failed to get current timestamp\n");
+		return -EINVAL;
+	}
+
+	fd = perf_data_file__switch(file, timestamp,
+				    rec->session->header.data_offset,
+				    at_exit);
+	if (fd >= 0 && !at_exit) {
+		rec->bytes_written = 0;
+		rec->session->header.data_size = 0;
+	}
+
+	if (!quiet)
+		fprintf(stderr, "[ perf record: Dump %s.%s ]\n",
+			file->path, timestamp);
+	return fd;
+}
+
 static volatile int workload_exec_errno;
 
 /*
@@ -847,8 +879,17 @@ out_child:
 	/* this will be recalculated during process_buildids() */
 	rec->samples = 0;
 
-	if (!err)
-		record__finish_output(rec);
+	if (!err) {
+		if (!rec->timestamp_filename) {
+			record__finish_output(rec);
+		} else {
+			fd = record__switch_output(rec, true);
+			if (fd < 0) {
+				status = fd;
+				goto out_delete_session;
+			}
+		}
+	}
 
 	if (!err && !quiet) {
 		char samples[128];
@@ -1231,6 +1272,8 @@ struct option __record_options[] = {
 		   "file", "vmlinux pathname"),
 	OPT_BOOLEAN(0, "buildid-all", &record.buildid_all,
 		    "Record build-id of all DSOs regardless of hits"),
+	OPT_BOOLEAN(0, "timestamp-filename", &record.timestamp_filename,
+		    "append timestamp to output filename"),
 	OPT_END()
 };
 
-- 
1.8.3.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 24/46] perf record: Split output into multiple files via '--switch-output'
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (22 preceding siblings ...)
  2016-02-26  9:32 ` [PATCH 23/46] perf record: Add '--timestamp-filename' option to append timestamp to output filename Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  2016-02-26  9:32 ` [PATCH 25/46] perf record: Force enable --timestamp-filename when --switch-output is provided Wang Nan
                   ` (21 subsequent siblings)
  45 siblings, 0 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Masami Hiramatsu, Namhyung Kim

Allow 'perf record' to split its output into multiple files.

For example:

 # ~/perf record -a --timestamp-filename --switch-output &
 [1] 10763
 # kill -s SIGUSR2 10763
 [ perf record: dump data: Woken up 1 times ]
 # [ perf record: Dump perf.data.2015122622314468 ]

 # kill -s SIGUSR2 10763
 [ perf record: dump data: Woken up 1 times ]
 # [ perf record: Dump perf.data.2015122622314762 ]

 # kill -s SIGUSR2 10763
 [ perf record: dump data: Woken up 1 times ]
 #[ perf record: Dump perf.data.2015122622315171 ]

 # fg
 perf record -a --timestamp-filename --switch-output
 ^C[ perf record: Woken up 1 times to write data ]
 [ perf record: Dump perf.data.2015122622315513 ]
 [ perf record: Captured and wrote 0.014 MB perf.data (296 samples) ]

 # ls -l
 total 920
 -rw------- 1 root root 797692 Dec 26 22:31 perf.data.2015122622314468
 -rw------- 1 root root  59960 Dec 26 22:31 perf.data.2015122622314762
 -rw------- 1 root root  59912 Dec 26 22:31 perf.data.2015122622315171
 -rw------- 1 root root  19220 Dec 26 22:31 perf.data.2015122622315513

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/builtin-record.c | 34 ++++++++++++++++++++++++++++------
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index b982eec..897d720 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -56,6 +56,7 @@ struct record {
 	bool			no_buildid_cache_set;
 	bool			buildid_all;
 	bool			timestamp_filename;
+	bool			switch_output;
 	unsigned long long	samples;
 };
 
@@ -164,6 +165,7 @@ auxtrace_snapshot_is_enabled(void)
 
 static volatile int auxtrace_snapshot_err;
 static volatile int auxtrace_record__snapshot_started;
+static volatile int switch_output_started;
 
 static void sig_handler(int sig)
 {
@@ -666,7 +668,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	signal(SIGINT, sig_handler);
 	signal(SIGTERM, sig_handler);
 
-	if (rec->opts.auxtrace_snapshot_mode) {
+	if (rec->opts.auxtrace_snapshot_mode || rec->switch_output) {
 		signal(SIGUSR2, snapshot_sig_handler);
 		auxtrace_snapshot_on();
 	} else {
@@ -818,9 +820,25 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 			}
 		}
 
+		if (switch_output_started) {
+			switch_output_started = 0;
+
+			if (!quiet)
+				fprintf(stderr, "[ perf record: dump data: Woken up %ld times ]\n",
+					waking);
+			waking = 0;
+			fd = record__switch_output(rec, false);
+			if (fd < 0) {
+				pr_err("Failed to switch to new file\n");
+				err = fd;
+				goto out_child;
+			}
+		}
+
 		if (hits == rec->samples) {
 			if (done || draining)
 				break;
+
 			err = perf_evlist__poll(rec->evlist, -1);
 			/*
 			 * Propagate error, only if there's any. Ignore positive
@@ -1274,6 +1292,8 @@ struct option __record_options[] = {
 		    "Record build-id of all DSOs regardless of hits"),
 	OPT_BOOLEAN(0, "timestamp-filename", &record.timestamp_filename,
 		    "append timestamp to output filename"),
+	OPT_BOOLEAN(0, "switch-output", &record.switch_output,
+		    "Switch output when receive SIGUSR2"),
 	OPT_END()
 };
 
@@ -1414,9 +1434,11 @@ out_symbol_exit:
 
 static void snapshot_sig_handler(int sig __maybe_unused)
 {
-	if (!auxtrace_snapshot_is_enabled())
-		return;
-	auxtrace_snapshot_disable();
-	auxtrace_snapshot_err = auxtrace_record__snapshot_start(record.itr);
-	auxtrace_record__snapshot_started = 1;
+	if (auxtrace_snapshot_is_enabled()) {
+		auxtrace_snapshot_disable();
+		auxtrace_snapshot_err = auxtrace_record__snapshot_start(record.itr);
+		auxtrace_record__snapshot_started = 1;
+	}
+
+	switch_output_started = 1;
 }
-- 
1.8.3.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 25/46] perf record: Force enable --timestamp-filename when --switch-output is provided
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (23 preceding siblings ...)
  2016-02-26  9:32 ` [PATCH 24/46] perf record: Split output into multiple files via '--switch-output' Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  2016-02-26  9:32 ` [PATCH 26/46] perf record: Disable buildid cache options by default in switch output mode Wang Nan
                   ` (20 subsequent siblings)
  45 siblings, 0 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Masami Hiramatsu, Namhyung Kim

Without this patch, the last output file doesn't have a timestamp
appended if --timestamp-filename is not explicitly provided. For example:

 # perf record -a --switch-output &
 [1] 11224
 # kill -s SIGUSR2 11224
 [ perf record: dump data: Woken up 1 times ]
 # [ perf record: Dump perf.data.2015122622372823 ]

 # fg
 perf record -a --switch-output
 ^C[ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.027 MB perf.data (540 samples) ]

 # ls -l
 total 836
 -rw------- 1 root root  33256 Dec 26 22:37 perf.data   <---- *Odd*
 -rw------- 1 root root 817156 Dec 26 22:37 perf.data.2015122622372823

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/builtin-record.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 897d720..0092e54 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1349,6 +1349,9 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
 		return -EINVAL;
 	}
 
+	if (rec->switch_output)
+		rec->timestamp_filename = true;
+
 	if (!rec->itr) {
 		rec->itr = auxtrace_record__init(rec->evlist, &err);
 		if (err)
-- 
1.8.3.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 26/46] perf record: Disable buildid cache options by default in switch output mode
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (24 preceding siblings ...)
  2016-02-26  9:32 ` [PATCH 25/46] perf record: Force enable --timestamp-filename when --switch-output is provided Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  2016-02-26  9:32 ` [PATCH 27/46] perf record: Re-synthesize tracking events after output switching Wang Nan
                   ` (19 subsequent siblings)
  45 siblings, 0 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Masami Hiramatsu, Namhyung Kim

The cost of buildid cache processing is high: perf reads all events in
the output perf.data, opens ELF files to read their buildids, then
copies them into the ~/.debug directory. In switch output mode, this
stops perf from receiving perf events for too long.

Enable no-buildid and no-buildid-cache by default if --switch-output
is provided. Still allow the user to pass --no-no-buildid to explicitly
enable buildid in this case.
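
The decision can be written either step by step or as one boolean
expression. The sketch below checks that the two forms agree for all 16
combinations of the four flags. struct buildid_opts and the function
names are hypothetical stand-ins for the fields of struct record; only
the flag names are taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the four option fields involved. */
struct buildid_opts {
	bool no_buildid, no_buildid_set;
	bool no_buildid_cache, no_buildid_cache_set;
};

/* Step-by-step form: default to "disable", explicit requests veto it. */
static bool disable_stepwise(const struct buildid_opts *o)
{
	bool disable = true;

	if (o->no_buildid_set && !o->no_buildid)
		disable = false;	/* user passed --no-no-buildid */
	if (o->no_buildid_cache_set && !o->no_buildid_cache)
		disable = false;	/* user passed --no-no-buildid-cache */
	return disable;
}

/* Single-expression form of the same decision. */
static bool disable_expr(const struct buildid_opts *o)
{
	return (o->no_buildid || !o->no_buildid_set) &&
	       (o->no_buildid_cache || !o->no_buildid_cache_set);
}

/* Returns how many of the 16 flag combinations make the forms differ. */
static int count_mismatches(void)
{
	int bits, mismatches = 0;

	for (bits = 0; bits < 16; bits++) {
		struct buildid_opts o = {
			.no_buildid = bits & 1,
			.no_buildid_set = bits & 2,
			.no_buildid_cache = bits & 4,
			.no_buildid_cache_set = bits & 8,
		};

		if (disable_stepwise(&o) != disable_expr(&o))
			mismatches++;
	}
	return mismatches;
}
```

The "_set" flags are what make it possible to tell "the user asked for
buildids" apart from "the user said nothing".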

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/builtin-record.c | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 0092e54..8a1523f 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1377,8 +1377,36 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
 "If some relocation was applied (e.g. kexec) symbols may be misresolved\n"
 "even with a suitable vmlinux or kallsyms file.\n\n");
 
-	if (rec->no_buildid_cache || rec->no_buildid)
+	if (rec->no_buildid_cache || rec->no_buildid) {
 		disable_buildid_cache();
+	} else if (rec->switch_output) {
+		/*
+		 * In 'perf record --switch-output', disable buildid
+		 * generation by default to reduce data file switching
+		 * overhead. Still generate buildid if they are required
+		 * explicitly using
+		 *
+		 *  perf record --switch-output --no-no-buildid \
+		 *              --no-no-buildid-cache
+		 *
+		 * Following code equals to:
+		 *
+		 * if ((rec->no_buildid || !rec->no_buildid_set) &&
+		 *     (rec->no_buildid_cache || !rec->no_buildid_cache_set))
+		 *         disable_buildid_cache();
+		 */
+		bool disable = true;
+
+		if (rec->no_buildid_set && !rec->no_buildid)
+			disable = false;
+		if (rec->no_buildid_cache_set && !rec->no_buildid_cache)
+			disable = false;
+		if (disable) {
+			rec->no_buildid = true;
+			rec->no_buildid_cache = true;
+			disable_buildid_cache();
+		}
+	}
 
 	if (rec->evlist->nr_entries == 0 &&
 	    perf_evlist__add_default(rec->evlist) < 0) {
-- 
1.8.3.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 27/46] perf record: Re-synthesize tracking events after output switching
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (25 preceding siblings ...)
  2016-02-26  9:32 ` [PATCH 26/46] perf record: Disable buildid cache options by default in switch output mode Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  2016-02-26  9:32 ` [PATCH 28/46] perf record: Generate tracking events for process forked by perf Wang Nan
                   ` (18 subsequent siblings)
  45 siblings, 0 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Masami Hiramatsu, Namhyung Kim

Tracking events describe the kernel and threads. They are generated by
reading /proc/kallsyms, /proc/*/maps and /proc/*/task/* during
initialization of 'perf record', serialized into event sequences and put
at the head of 'perf.data'. When the output is switched, each output
file should contain those events.

This patch calls record__synthesize() during output switching, so the
event sequences described above can be collected again.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/builtin-record.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 8a1523f..b6feea2 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -530,6 +530,8 @@ record__finish_output(struct record *rec)
 	return;
 }
 
+static int record__synthesize(struct record *rec);
+
 static int
 record__switch_output(struct record *rec, bool at_exit)
 {
@@ -558,6 +560,11 @@ record__switch_output(struct record *rec, bool at_exit)
 	if (!quiet)
 		fprintf(stderr, "[ perf record: Dump %s.%s ]\n",
 			file->path, timestamp);
+
+	/* Output tracking events */
+	if (!at_exit)
+		record__synthesize(rec);
+
 	return fd;
 }
 
-- 
1.8.3.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 28/46] perf record: Generate tracking events for process forked by perf
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (26 preceding siblings ...)
  2016-02-26  9:32 ` [PATCH 27/46] perf record: Re-synthesize tracking events after output switching Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  2016-02-26  9:32 ` [PATCH 29/46] perf record: Ensure return non-zero rc when mmap fail Wang Nan
                   ` (17 subsequent siblings)
  45 siblings, 0 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Masami Hiramatsu, Namhyung Kim

With 'perf record --switch-output' without -a, record__synthesize() in
record__switch_output() won't generate tracking events because there's
no thread_map in evlist. As a result, the newly created perf.data
doesn't contain map and comm information.

This patch creates a fake thread_map and directly calls
perf_event__synthesize_thread_map() for those events.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/builtin-record.c | 31 ++++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index b6feea2..a2de15b 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -530,6 +530,23 @@ record__finish_output(struct record *rec)
 	return;
 }
 
+static int record__synthesize_workload(struct record *rec)
+{
+	struct {
+		struct thread_map map;
+		struct thread_map_data map_data;
+	} thread_map;
+
+	thread_map.map.nr = 1;
+	thread_map.map.map[0].pid = rec->evlist->workload.pid;
+	thread_map.map.map[0].comm = NULL;
+	return perf_event__synthesize_thread_map(&rec->tool, &thread_map.map,
+						 process_synthesized_event,
+						 &rec->session->machines.host,
+						 rec->opts.sample_address,
+						 rec->opts.proc_map_timeout);
+}
+
 static int record__synthesize(struct record *rec);
 
 static int
@@ -562,9 +579,21 @@ record__switch_output(struct record *rec, bool at_exit)
 			file->path, timestamp);
 
 	/* Output tracking events */
-	if (!at_exit)
+	if (!at_exit) {
 		record__synthesize(rec);
 
+		/*
+		 * In 'perf record --switch-output' without -a,
+		 * record__synthesize() in record__switch_output() won't
+		 * generate tracking events because there's no thread_map
+		 * in evlist. As a result, the newly created perf.data
+		 * doesn't contain map and comm information.
+		 * Create a fake thread_map and directly call
+		 * perf_event__synthesize_thread_map() for those events.
+		 */
+		if (target__none(&rec->opts.target))
+			record__synthesize_workload(rec);
+	}
 	return fd;
 }
 
-- 
1.8.3.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 29/46] perf record: Ensure return non-zero rc when mmap fail
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (27 preceding siblings ...)
  2016-02-26  9:32 ` [PATCH 28/46] perf record: Generate tracking events for process forked by perf Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  2016-03-05  8:17   ` [tip:perf/core] " tip-bot for Wang Nan
  2016-02-26  9:32 ` [PATCH 30/46] perf record: Prevent reading invalid data in record__mmap_read Wang Nan
                   ` (16 subsequent siblings)
  45 siblings, 1 reply; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Masami Hiramatsu, Namhyung Kim

perf_evlist__mmap_ex() can fail without setting errno (for example,
when it fails a condition check; in this case all syscalls succeed).
If this happens, record__open() incorrectly returns 0. Forcing rc to a
non-zero value is a quick way to avoid this problem; otherwise we would
have to follow every possible code path in perf_evlist__mmap_ex() to
make sure there's at least one failing system call before it returns an
error.
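
The pattern is small enough to show in isolation. failure_code() below
is a hypothetical helper, not part of perf; it only illustrates falling
back to -EINVAL when a failure was reported but errno was left at 0.

```c
#include <errno.h>

/*
 * Hypothetical helper: convert "the call failed" into a negative error
 * code. If the failing path never made a system call, errno may still
 * be 0, and returning -0 would look like success to the caller.
 */
static int failure_code(int saved_errno)
{
	return saved_errno ? -saved_errno : -EINVAL;
}
```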

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/builtin-record.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index a2de15b..fa16099 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -363,7 +363,10 @@ try_again:
 		} else {
 			pr_err("failed to mmap with %d (%s)\n", errno,
 				strerror_r(errno, msg, sizeof(msg)));
-			rc = -errno;
+			if (errno)
+				rc = -errno;
+			else
+				rc = -EINVAL;
 		}
 		goto out;
 	}
-- 
1.8.3.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 30/46] perf record: Prevent reading invalid data in record__mmap_read
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (28 preceding siblings ...)
  2016-02-26  9:32 ` [PATCH 29/46] perf record: Ensure return non-zero rc when mmap fail Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  2016-02-26  9:32 ` [PATCH 31/46] perf tools: Add evlist channel helpers Wang Nan
                   ` (15 subsequent siblings)
  45 siblings, 0 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Masami Hiramatsu, Namhyung Kim

When record__mmap_read() is asked to read more data than the size of
the ring buffer, drop that data to avoid accessing invalid memory.

This can happen when reading from an overwritable ring buffer, which
should be avoided. However, check for it anyway for robustness.
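
A simplified model of the check, assuming the usual layout where the
buffer size is a power of two, 'mask' is size - 1, and the head and
previous positions are free-running byte counters. struct ring_reader
and readable_bytes() are hypothetical and stand in for the perf_mmap
bookkeeping.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reader state for a power-of-two sized ring buffer. */
struct ring_reader {
	uint64_t mask;	/* buffer size - 1 */
	uint64_t prev;	/* position up to which data was consumed */
};

/*
 * Returns the number of bytes that are safe to copy out. If the writer
 * has advanced by more than one full buffer since 'prev', the old data
 * is already overwritten: skip ahead to 'head' and report 0 bytes.
 * On the normal path the caller copies the data and then sets
 * r->prev = head itself.
 */
static uint64_t readable_bytes(struct ring_reader *r, uint64_t head)
{
	uint64_t size = head - r->prev;

	/* mask + 1 must be a power of two */
	assert(((r->mask + 1) & r->mask) == 0);

	if (size > r->mask + 1) {
		r->prev = head;	/* drop everything that was missed */
		return 0;
	}
	return size;
}
```

Exactly one buffer's worth of data is still accepted; only a larger gap
is treated as lost.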

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/builtin-record.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index fa16099..9ffdef9 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -38,6 +38,7 @@
 #include <unistd.h>
 #include <sched.h>
 #include <sys/mman.h>
+#include <asm/bug.h>
 
 
 struct record {
@@ -96,6 +97,13 @@ static int record__mmap_read(struct record *rec, int idx)
 	rec->samples++;
 
 	size = head - old;
+	if (size > (unsigned long)(md->mask) + 1) {
+		WARN_ONCE(1, "failed to keep up with mmap data. (warn only once)\n");
+
+		md->prev = head;
+		perf_evlist__mmap_consume(rec->evlist, idx);
+		return 0;
+	}
 
 	if ((old & md->mask) + size != (head & md->mask)) {
 		buf = &data[old & md->mask];
-- 
1.8.3.4

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 31/46] perf tools: Add evlist channel helpers
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (29 preceding siblings ...)
  2016-02-26  9:32 ` [PATCH 30/46] perf record: Prevent reading invalid data in record__mmap_read Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  2016-02-26  9:32 ` [PATCH 32/46] perf tools: Automatically add new channel according to evlist Wang Nan
                   ` (14 subsequent siblings)
  45 siblings, 0 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Masami Hiramatsu, Namhyung Kim

This commit introduces several helpers to support the concept of
channels. Channels hold different groups of evsels which are configured
differently. They will be used for overwritable evsels, which allow
perf to record some events continuously while capturing snapshots of
other events when something happens. Tracking events (mmap, mmap2,
fork, exit ...) are another kind of event worth putting into a separate
channel.

Channels are represented by an array of channel flags. Each channel
contains evlist->nr_mmaps mmaps. Channels are configured before
perf_evlist__mmap_ex() is called. In that function, the nr_mmaps mmaps
of every channel are allocated together as one big array.
perf_evlist__channel_idx() converts between an index in the big array
and a (channel, index) pair. For API functions which accept idx, _ex()
versions are introduced which select an mmap from a given channel.
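
The index arithmetic can be sketched on its own. The functions below
are hypothetical and simplified; NR_CHANNELS is an assumed limit, and
the reverse mapping by integer division is an assumption about how a
real index maps back to its channel.

```c
#include <errno.h>

#define NR_CHANNELS 2	/* assumed limit, for illustration only */

/*
 * Map (channel, idx) to a position in one big array that stores the
 * nr_mmaps entries of channel 0 first, then those of channel 1, etc.
 * Returns the real index, or a negative errno-style value.
 */
static int channel_to_real_idx(int nr_mmaps, int channel, int idx)
{
	if (nr_mmaps <= 0 || channel < 0 || idx < 0)
		return -EINVAL;
	if (channel >= NR_CHANNELS || idx >= nr_mmaps)
		return -E2BIG;
	return nr_mmaps * channel + idx;
}

/* Assumed reverse mapping: which channel a real index belongs to. */
static int real_idx_to_channel(int nr_mmaps, int real_idx)
{
	if (nr_mmaps <= 0 || real_idx < 0)
		return -EINVAL;
	return real_idx / nr_mmaps;
}
```

With nr_mmaps = 4, for example, entry 2 of channel 1 lives at real index
6, and 6 % 4 recovers the per-channel index 2.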

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/builtin-record.c |   6 ++
 tools/perf/util/evlist.c    | 132 ++++++++++++++++++++++++++++++++++++++++++--
 tools/perf/util/evlist.h    |  58 +++++++++++++++++++
 3 files changed, 190 insertions(+), 6 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 9ffdef9..30389b4 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -357,6 +357,12 @@ try_again:
 		goto out;
 	}
 
+	perf_evlist__channel_reset(evlist);
+	rc = perf_evlist__channel_add(evlist, 0, true);
+	if (rc < 0)
+		goto out;
+	rc = 0;
+
 	if (perf_evlist__mmap_ex(evlist, opts->mmap_pages, false,
 				 opts->auxtrace_mmap_pages,
 				 opts->auxtrace_snapshot_mode) < 0) {
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 227950b..2a888fe0 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -679,14 +679,51 @@ static struct perf_evsel *perf_evlist__event2evsel(struct perf_evlist *evlist,
 	return NULL;
 }
 
-union perf_event *perf_evlist__mmap_read(struct perf_evlist *evlist, int idx)
+int perf_evlist__channel_idx(struct perf_evlist *evlist,
+			     int *p_channel, int *p_idx)
+{
+	int channel = *p_channel;
+	int _idx = *p_idx;
+
+	if (_idx < 0)
+		return -EINVAL;
+	/*
+	 * A negative channel means the caller explicitly uses the real index.
+	 */
+	if (channel < 0) {
+		channel = perf_evlist__idx_channel(evlist, _idx);
+		_idx = _idx % evlist->nr_mmaps;
+	}
+	if (channel < 0)
+		return channel;
+	if (channel >= PERF_EVLIST__NR_CHANNELS)
+		return -E2BIG;
+	if (_idx >= evlist->nr_mmaps)
+		return -E2BIG;
+
+	*p_channel = channel;
+	*p_idx = evlist->nr_mmaps * channel + _idx;
+	return 0;
+}
+
+union perf_event *perf_evlist__mmap_read_ex(struct perf_evlist *evlist,
+					    int channel, int idx)
 {
+	int err = perf_evlist__channel_idx(evlist, &channel, &idx);
 	struct perf_mmap *md = &evlist->mmap[idx];
 	u64 head;
-	u64 old = md->prev;
-	unsigned char *data = md->base + page_size;
+	u64 old;
+	unsigned char *data;
 	union perf_event *event = NULL;
 
+	if (err || !perf_evlist__channel_is_enabled(evlist, channel)) {
+		pr_err("ERROR: invalid mmap index: channel %d, idx: %d\n",
+		       channel, idx);
+		return NULL;
+	}
+	old = md->prev;
+	data = md->base + page_size;
+
 	/*
 	 * Check if event was unmapped due to a POLLHUP/POLLERR.
 	 */
@@ -748,6 +785,11 @@ union perf_event *perf_evlist__mmap_read(struct perf_evlist *evlist, int idx)
 	return event;
 }
 
+union perf_event *perf_evlist__mmap_read(struct perf_evlist *evlist, int idx)
+{
+	return perf_evlist__mmap_read_ex(evlist, -1, idx);
+}
+
 static bool perf_mmap__empty(struct perf_mmap *md)
 {
 	return perf_mmap__read_head(md) == md->prev && !md->auxtrace_mmap.base;
@@ -766,10 +808,18 @@ static void perf_evlist__mmap_put(struct perf_evlist *evlist, int idx)
 		__perf_evlist__munmap(evlist, idx);
 }
 
-void perf_evlist__mmap_consume(struct perf_evlist *evlist, int idx)
+void perf_evlist__mmap_consume_ex(struct perf_evlist *evlist,
+				  int channel, int idx)
 {
+	int err = perf_evlist__channel_idx(evlist, &channel, &idx);
 	struct perf_mmap *md = &evlist->mmap[idx];
 
+	if (err || !perf_evlist__channel_is_enabled(evlist, channel)) {
+		pr_err("ERROR: invalid mmap index: channel %d, idx: %d\n",
+		       channel, idx);
+		return;
+	}
+
 	if (!evlist->overwrite) {
 		u64 old = md->prev;
 
@@ -780,6 +830,11 @@ void perf_evlist__mmap_consume(struct perf_evlist *evlist, int idx)
 		perf_evlist__mmap_put(evlist, idx);
 }
 
+void perf_evlist__mmap_consume(struct perf_evlist *evlist, int idx)
+{
+	perf_evlist__mmap_consume_ex(evlist, -1, idx);
+}
+
 int __weak auxtrace_mmap__mmap(struct auxtrace_mmap *mm __maybe_unused,
 			       struct auxtrace_mmap_params *mp __maybe_unused,
 			       void *userpg __maybe_unused,
@@ -825,7 +880,7 @@ void perf_evlist__munmap(struct perf_evlist *evlist)
 	if (evlist->mmap == NULL)
 		return;
 
-	for (i = 0; i < evlist->nr_mmaps; i++)
+	for (i = 0; i < perf_evlist__mmap_nr(evlist); i++)
 		__perf_evlist__munmap(evlist, i);
 
 	zfree(&evlist->mmap);
@@ -833,10 +888,17 @@ void perf_evlist__munmap(struct perf_evlist *evlist)
 
 static int perf_evlist__alloc_mmap(struct perf_evlist *evlist)
 {
+	int total_mmaps;
+
 	evlist->nr_mmaps = cpu_map__nr(evlist->cpus);
 	if (cpu_map__empty(evlist->cpus))
 		evlist->nr_mmaps = thread_map__nr(evlist->threads);
-	evlist->mmap = zalloc(evlist->nr_mmaps * sizeof(struct perf_mmap));
+
+	total_mmaps = perf_evlist__mmap_nr(evlist);
+	if (!total_mmaps)
+		return -EINVAL;
+
+	evlist->mmap = zalloc(total_mmaps * sizeof(struct perf_mmap));
 	return evlist->mmap != NULL ? 0 : -ENOMEM;
 }
 
@@ -1137,6 +1199,12 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages,
 		      bool overwrite)
 {
+	int err;
+
+	perf_evlist__channel_reset(evlist);
+	err = perf_evlist__channel_add(evlist, 0, true);
+	if (err < 0)
+		return err;
 	return perf_evlist__mmap_ex(evlist, pages, overwrite, 0, false);
 }
 
@@ -1767,3 +1835,55 @@ perf_evlist__find_evsel_by_str(struct perf_evlist *evlist,
 
 	return NULL;
 }
+
+int perf_evlist__channel_nr(struct perf_evlist *evlist)
+{
+	int i;
+
+	for (i = PERF_EVLIST__NR_CHANNELS - 1; i >= 0; i--) {
+		unsigned long flags = evlist->channel_flags[i];
+
+		if (flags & PERF_EVLIST__CHANNEL_ENABLED)
+			return i + 1;
+	}
+	return 0;
+}
+
+int perf_evlist__mmap_nr(struct perf_evlist *evlist)
+{
+	return evlist->nr_mmaps * perf_evlist__channel_nr(evlist);
+}
+
+void perf_evlist__channel_reset(struct perf_evlist *evlist)
+{
+	int i;
+
+	BUG_ON(evlist->mmap);
+
+	for (i = 0; i < PERF_EVLIST__NR_CHANNELS; i++)
+		evlist->channel_flags[i] = 0;
+}
+
+int perf_evlist__channel_add(struct perf_evlist *evlist,
+			     unsigned long flag,
+			     bool is_default)
+{
+	int n = perf_evlist__channel_nr(evlist);
+	unsigned long *flags = evlist->channel_flags;
+
+	BUG_ON(evlist->mmap);
+
+	if (n >= PERF_EVLIST__NR_CHANNELS) {
+		pr_debug("ERROR: too many channels. Increase PERF_EVLIST__NR_CHANNELS\n");
+		return -ENOSPC;
+	}
+
+	if (is_default) {
+		memmove(&flags[1], &flags[0],
+			sizeof(evlist->channel_flags) -
+			sizeof(evlist->channel_flags[0]));
+		n = 0;
+	}
+	flags[n] = flag | PERF_EVLIST__CHANNEL_ENABLED;
+	return n;
+}
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index a0d1522..1812652 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -20,6 +20,11 @@ struct record_opts;
 #define PERF_EVLIST__HLIST_BITS 8
 #define PERF_EVLIST__HLIST_SIZE (1 << PERF_EVLIST__HLIST_BITS)
 
+#define PERF_EVLIST__NR_CHANNELS	1
+enum perf_evlist_mmap_flag {
+	PERF_EVLIST__CHANNEL_ENABLED	= 1,
+};
+
 /**
  * struct perf_mmap - perf's ring buffer mmap details
  *
@@ -52,6 +57,7 @@ struct perf_evlist {
 		pid_t	pid;
 	} workload;
 	struct fdarray	 pollfd;
+	unsigned long channel_flags[PERF_EVLIST__NR_CHANNELS];
 	struct perf_mmap *mmap;
 	struct thread_map *threads;
 	struct cpu_map	  *cpus;
@@ -116,9 +122,61 @@ struct perf_evsel *perf_evlist__id2evsel_strict(struct perf_evlist *evlist,
 
 struct perf_sample_id *perf_evlist__id2sid(struct perf_evlist *evlist, u64 id);
 
+union perf_event *perf_evlist__mmap_read_ex(struct perf_evlist *evlist,
+					    int channel, int idx);
 union perf_event *perf_evlist__mmap_read(struct perf_evlist *evlist, int idx);
 
+void perf_evlist__mmap_consume_ex(struct perf_evlist *evlist,
+				  int channel, int idx);
 void perf_evlist__mmap_consume(struct perf_evlist *evlist, int idx);
+int perf_evlist__mmap_nr(struct perf_evlist *evlist);
+
+int perf_evlist__channel_nr(struct perf_evlist *evlist);
+void perf_evlist__channel_reset(struct perf_evlist *evlist);
+int perf_evlist__channel_add(struct perf_evlist *evlist,
+			     unsigned long flag,
+			     bool is_default);
+
+static inline bool
+__perf_evlist__channel_check(struct perf_evlist *evlist, int channel,
+			     enum perf_evlist_mmap_flag bits)
+{
+	if (channel >= PERF_EVLIST__NR_CHANNELS)
+		return false;
+
+	return (evlist->channel_flags[channel] & bits) ? true : false;
+}
+#define perf_evlist__channel_check(e, c, b) \
+		__perf_evlist__channel_check(e, c, PERF_EVLIST__CHANNEL_##b)
+
+static inline bool
+perf_evlist__channel_is_enabled(struct perf_evlist *evlist, int channel)
+{
+	return perf_evlist__channel_check(evlist, channel, ENABLED);
+}
+
+static inline int
+perf_evlist__idx_channel(struct perf_evlist *evlist, int idx)
+{
+	int channel = idx / evlist->nr_mmaps;
+
+	if (channel >= PERF_EVLIST__NR_CHANNELS)
+		return -E2BIG;
+	return channel;
+}
+
+int perf_evlist__channel_idx(struct perf_evlist *evlist,
+			     int *p_channel, int *p_idx);
+
+static inline struct perf_mmap *
+perf_evlist__get_mmap(struct perf_evlist *evlist,
+		      int channel, int idx)
+{
+	if (perf_evlist__channel_idx(evlist, &channel, &idx))
+		return NULL;
+
+	return &evlist->mmap[idx];
+}
 
 int perf_evlist__open(struct perf_evlist *evlist);
 void perf_evlist__close(struct perf_evlist *evlist);
-- 
1.8.3.4


* [PATCH 32/46] perf tools: Automatically add new channel according to evlist
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (30 preceding siblings ...)
  2016-02-26  9:32 ` [PATCH 31/46] perf tools: Add evlist channel helpers Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  2016-02-26  9:32 ` [PATCH 33/46] perf tools: Operate multiple channels Wang Nan
                   ` (13 subsequent siblings)
  45 siblings, 0 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Masami Hiramatsu, Namhyung Kim

perf_evlist__channel_find() can be used to find a proper channel based
on the properties of an evsel. If the channel doesn't exist, it can
create a new one for it. After this patch there's no need to create the
default channel explicitly.
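
The find-or-create behaviour can be sketched in isolation as follows.
This is only an illustration under assumptions: toy_evlist and the
toy_* functions are hypothetical names, the flag is passed in directly
instead of being derived from an evsel, and the capacity of 2 is chosen
arbitrarily.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NR_CHANNELS	2	/* hypothetical capacity */
#define CHANNEL_ENABLED	1UL	/* mirrors PERF_EVLIST__CHANNEL_ENABLED */

struct toy_evlist {
	unsigned long channel_flags[NR_CHANNELS];
};

/* Number of channels in use: highest enabled slot plus one. */
static int toy_channel_nr(const struct toy_evlist *evlist)
{
	int i;

	for (i = NR_CHANNELS - 1; i >= 0; i--)
		if (evlist->channel_flags[i] & CHANNEL_ENABLED)
			return i + 1;
	return 0;
}

/*
 * Return the channel whose flags match exactly. If there is none and
 * add_new is set, append a new channel with those flags.
 */
static int toy_channel_find(struct toy_evlist *evlist, unsigned long flag,
			    int add_new)
{
	int i, n;

	assert(evlist != NULL);
	n = toy_channel_nr(evlist);
	flag |= CHANNEL_ENABLED;
	for (i = 0; i < n; i++)
		if (evlist->channel_flags[i] == flag)
			return i;
	if (!add_new)
		return -ENOENT;
	if (n >= NR_CHANNELS)
		return -ENOSPC;
	evlist->channel_flags[n] = flag;
	return n;
}
```

Calling toy_channel_find() once per event with add_new set leaves one
channel per distinct flag combination, which is why no default channel
has to be added by hand.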

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/builtin-record.c |  5 -----
 tools/perf/util/evlist.c    | 47 ++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 42 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 30389b4..b815bea 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -358,11 +358,6 @@ try_again:
 	}
 
 	perf_evlist__channel_reset(evlist);
-	rc = perf_evlist__channel_add(evlist, 0, true);
-	if (rc < 0)
-		goto out;
-	rc = 0;
-
 	if (perf_evlist__mmap_ex(evlist, opts->mmap_pages, false,
 				 opts->auxtrace_mmap_pages,
 				 opts->auxtrace_snapshot_mode) < 0) {
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 2a888fe0..056f870 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -943,6 +943,43 @@ static int __perf_evlist__mmap(struct perf_evlist *evlist, int idx,
 	return 0;
 }
 
+static unsigned long
+perf_evlist__channel_for_evsel(struct perf_evsel *evsel __maybe_unused)
+{
+	return 0;
+}
+
+static int
+perf_evlist__channel_find(struct perf_evlist *evlist,
+			  struct perf_evsel *evsel,
+			  bool add_new)
+{
+	unsigned long flag = perf_evlist__channel_for_evsel(evsel);
+	int i;
+
+	flag |= PERF_EVLIST__CHANNEL_ENABLED;
+	for (i = 0; i < perf_evlist__channel_nr(evlist); i++)
+		if (evlist->channel_flags[i] == flag)
+			return i;
+	if (add_new)
+		return perf_evlist__channel_add(evlist, flag, false);
+	return -ENOENT;
+}
+
+static int
+perf_evlist__channel_complete(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel;
+	int err;
+
+	evlist__for_each(evlist, evsel) {
+		err = perf_evlist__channel_find(evlist, evsel, true);
+		if (err < 0)
+			return err;
+	}
+	return 0;
+}
+
 static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
 				       struct mmap_params *mp, int cpu,
 				       int thread, int *output)
@@ -1162,6 +1199,7 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 			 bool overwrite, unsigned int auxtrace_pages,
 			 bool auxtrace_overwrite)
 {
+	int err;
 	struct perf_evsel *evsel;
 	const struct cpu_map *cpus = evlist->cpus;
 	const struct thread_map *threads = evlist->threads;
@@ -1169,6 +1207,10 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 		.prot = PROT_READ | (overwrite ? 0 : PROT_WRITE),
 	};
 
+	err = perf_evlist__channel_complete(evlist);
+	if (err)
+		return err;
+
 	if (evlist->mmap == NULL && perf_evlist__alloc_mmap(evlist) < 0)
 		return -ENOMEM;
 
@@ -1199,12 +1241,7 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages,
 		      bool overwrite)
 {
-	int err;
-
 	perf_evlist__channel_reset(evlist);
-	err = perf_evlist__channel_add(evlist, 0, true);
-	if (err < 0)
-		return err;
 	return perf_evlist__mmap_ex(evlist, pages, overwrite, 0, false);
 }
 
-- 
1.8.3.4


* [PATCH 33/46] perf tools: Operate multiple channels
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (31 preceding siblings ...)
  2016-02-26  9:32 ` [PATCH 32/46] perf tools: Automatically add new channel according to evlist Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  2016-02-26  9:32 ` [PATCH 34/46] perf tools: Squash overwrite setting into channel Wang Nan
                   ` (12 subsequent siblings)
  45 siblings, 0 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Masami Hiramatsu, Namhyung Kim

Before this patch perf operates on only the first channel. Make perf
mmap and read from multiple channels.
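
For each ring-buffer slot (one CPU or one thread) the first event of a
channel provides the fd that gets mmapped, and later events of the same
channel are redirected to it. A minimal sketch of that bookkeeping,
assuming one output fd per channel initialised to -1 (toy_* names and
NR_CHANNELS are hypothetical, and no real mmap() or ioctl() is done):

```c
#include <assert.h>
#include <string.h>

#define NR_CHANNELS	2	/* hypothetical; one output fd per channel */

/* Mark every channel as not having an output fd yet. */
static void toy_outputs_reset(int outputs[NR_CHANNELS])
{
	/* All bytes 0xff gives -1 in every int on two's complement. */
	memset(outputs, -1, NR_CHANNELS * sizeof(outputs[0]));
}

/*
 * Returns 1 if fd became the output of its channel (the caller would
 * mmap it), 0 if the channel already has an output (the caller would
 * redirect fd to outputs[channel] instead).
 */
static int toy_claim_output(int outputs[NR_CHANNELS], int channel, int fd)
{
	assert(channel >= 0 && channel < NR_CHANNELS);
	if (outputs[channel] == -1) {
		outputs[channel] = fd;
		return 1;
	}
	return 0;
}
```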

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/builtin-record.c |  3 ++-
 tools/perf/util/evlist.c    | 55 ++++++++++++++++++++++++++++++++++-----------
 tools/perf/util/evlist.h    |  2 +-
 3 files changed, 45 insertions(+), 15 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index b815bea..d48065f 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -467,8 +467,9 @@ static int record__mmap_read_all(struct record *rec)
 	u64 bytes_written = rec->bytes_written;
 	int i;
 	int rc = 0;
+	int total_mmaps = perf_evlist__mmap_nr(rec->evlist);
 
-	for (i = 0; i < rec->evlist->nr_mmaps; i++) {
+	for (i = 0; i < total_mmaps; i++) {
 		struct auxtrace_mmap *mm = &rec->evlist->mmap[i].auxtrace_mmap;
 
 		if (rec->evlist->mmap[i].base) {
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 056f870..c4e3185 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -873,6 +873,21 @@ static void __perf_evlist__munmap(struct perf_evlist *evlist, int idx)
 	auxtrace_mmap__munmap(&evlist->mmap[idx].auxtrace_mmap);
 }
 
+static void
+__perf_evlist__munmap_channels(struct perf_evlist *evlist, int _idx)
+{
+	int _ch;
+
+	for (_ch = 0; _ch < perf_evlist__channel_nr(evlist); _ch++) {
+		int err, idx = _idx, ch = _ch;
+
+		err = perf_evlist__channel_idx(evlist, &ch, &idx);
+		if (err < 0)
+			continue;
+		__perf_evlist__munmap(evlist, idx);
+	}
+}
+
 void perf_evlist__munmap(struct perf_evlist *evlist)
 {
 	int i;
@@ -980,26 +995,38 @@ perf_evlist__channel_complete(struct perf_evlist *evlist)
 	return 0;
 }
 
-static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
+static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int _idx,
 				       struct mmap_params *mp, int cpu,
-				       int thread, int *output)
+				       int thread, int *outputs)
 {
 	struct perf_evsel *evsel;
 
 	evlist__for_each(evlist, evsel) {
-		int fd;
+		int fd, channel, idx, err;
+
+		channel = perf_evlist__channel_find(evlist, evsel, false);
+		if (channel < 0) {
+			pr_err("ERROR: unable to find suitable channel for %s\n",
+			       evsel->name);
+			return -1;
+		}
+
+		idx = _idx;
+		err = perf_evlist__channel_idx(evlist, &channel, &idx);
+		if (err < 0)
+			return err;
 
 		if (evsel->system_wide && thread)
 			continue;
 
 		fd = FD(evsel, cpu, thread);
 
-		if (*output == -1) {
-			*output = fd;
-			if (__perf_evlist__mmap(evlist, idx, mp, *output) < 0)
+		if (outputs[channel] == -1) {
+			outputs[channel] = fd;
+			if (__perf_evlist__mmap(evlist, idx, mp, outputs[channel]) < 0)
 				return -1;
 		} else {
-			if (ioctl(fd, PERF_EVENT_IOC_SET_OUTPUT, *output) != 0)
+			if (ioctl(fd, PERF_EVENT_IOC_SET_OUTPUT, outputs[channel]) != 0)
 				return -1;
 
 			perf_evlist__mmap_get(evlist, idx);
@@ -1039,14 +1066,15 @@ static int perf_evlist__mmap_per_cpu(struct perf_evlist *evlist,
 
 	pr_debug2("perf event ring buffer mmapped per cpu\n");
 	for (cpu = 0; cpu < nr_cpus; cpu++) {
-		int output = -1;
+		int outputs[PERF_EVLIST__NR_CHANNELS];
 
+		memset(outputs, -1, sizeof(outputs));
 		auxtrace_mmap_params__set_idx(&mp->auxtrace_mp, evlist, cpu,
 					      true);
 
 		for (thread = 0; thread < nr_threads; thread++) {
 			if (perf_evlist__mmap_per_evsel(evlist, cpu, mp, cpu,
-							thread, &output))
+							thread, outputs))
 				goto out_unmap;
 		}
 	}
@@ -1055,7 +1083,7 @@ static int perf_evlist__mmap_per_cpu(struct perf_evlist *evlist,
 
 out_unmap:
 	for (cpu = 0; cpu < nr_cpus; cpu++)
-		__perf_evlist__munmap(evlist, cpu);
+		__perf_evlist__munmap_channels(evlist, cpu);
 	return -1;
 }
 
@@ -1067,13 +1095,14 @@ static int perf_evlist__mmap_per_thread(struct perf_evlist *evlist,
 
 	pr_debug2("perf event ring buffer mmapped per thread\n");
 	for (thread = 0; thread < nr_threads; thread++) {
-		int output = -1;
+		int outputs[PERF_EVLIST__NR_CHANNELS];
 
+		memset(outputs, -1, sizeof(outputs));
 		auxtrace_mmap_params__set_idx(&mp->auxtrace_mp, evlist, thread,
 					      false);
 
 		if (perf_evlist__mmap_per_evsel(evlist, thread, mp, 0, thread,
-						&output))
+						outputs))
 			goto out_unmap;
 	}
 
@@ -1081,7 +1110,7 @@ static int perf_evlist__mmap_per_thread(struct perf_evlist *evlist,
 
 out_unmap:
 	for (thread = 0; thread < nr_threads; thread++)
-		__perf_evlist__munmap(evlist, thread);
+		__perf_evlist__munmap_channels(evlist, thread);
 	return -1;
 }
 
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 1812652..b652587 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -20,7 +20,7 @@ struct record_opts;
 #define PERF_EVLIST__HLIST_BITS 8
 #define PERF_EVLIST__HLIST_SIZE (1 << PERF_EVLIST__HLIST_BITS)
 
-#define PERF_EVLIST__NR_CHANNELS	1
+#define PERF_EVLIST__NR_CHANNELS	2
 enum perf_evlist_mmap_flag {
 	PERF_EVLIST__CHANNEL_ENABLED	= 1,
 };
-- 
1.8.3.4


* [PATCH 34/46] perf tools: Squash overwrite setting into channel
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (32 preceding siblings ...)
  2016-02-26  9:32 ` [PATCH 33/46] perf tools: Operate multiple channels Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  2016-02-26  9:32 ` [PATCH 35/46] perf record: Don't read from and poll overwrite channel Wang Nan
                   ` (11 subsequent siblings)
  45 siblings, 0 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Masami Hiramatsu, Namhyung Kim

Make 'overwrite' a channel configuration rather than an evlist global
option. With this setting an evlist can have two channels: a normal
channel and an overwritable channel.
perf_evlist__channel_for_evsel() ensures that events with the
'overwrite' configuration are inserted into the overwritable channel.
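
As a rough illustration of how the per-channel setting replaces the
global one, the sketch below derives channel flags from an event's
overwrite setting and the mmap() protection from those flags. The
toy_* names are hypothetical; the flag values mirror the ones this
patch adds to evlist.h.

```c
#include <assert.h>
#include <sys/mman.h>

#define CHANNEL_ENABLED	1UL	/* mirrors PERF_EVLIST__CHANNEL_ENABLED */
#define CHANNEL_RDONLY	2UL	/* mirrors PERF_EVLIST__CHANNEL_RDONLY */

/* Flags of the channel an event belongs to, given its overwrite setting. */
static unsigned long toy_channel_flags(int overwrite)
{
	unsigned long flag = CHANNEL_ENABLED;

	if (overwrite)
		flag |= CHANNEL_RDONLY;
	return flag;
}

/*
 * mmap() protection for a channel. A writable mapping is only needed
 * when the reader has to update the tail pointer of the ring buffer.
 */
static int toy_channel_prot(unsigned long flags)
{
	int prot = PROT_READ;

	assert(flags & CHANNEL_ENABLED);
	if (!(flags & CHANNEL_RDONLY))
		prot |= PROT_WRITE;
	return prot;
}
```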

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/builtin-record.c |  2 +-
 tools/perf/util/evlist.c    | 42 +++++++++++++++++++++++++++---------------
 tools/perf/util/evlist.h    |  5 ++---
 tools/perf/util/evsel.h     |  1 +
 4 files changed, 31 insertions(+), 19 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index d48065f..92eccf1 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -358,7 +358,7 @@ try_again:
 	}
 
 	perf_evlist__channel_reset(evlist);
-	if (perf_evlist__mmap_ex(evlist, opts->mmap_pages, false,
+	if (perf_evlist__mmap_ex(evlist, opts->mmap_pages,
 				 opts->auxtrace_mmap_pages,
 				 opts->auxtrace_snapshot_mode) < 0) {
 		if (errno == EPERM) {
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index c4e3185..067c89d 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -731,7 +731,7 @@ union perf_event *perf_evlist__mmap_read_ex(struct perf_evlist *evlist,
 		return NULL;
 
 	head = perf_mmap__read_head(md);
-	if (evlist->overwrite) {
+	if (perf_evlist__channel_check(evlist, channel, RDONLY)) {
 		/*
 		 * If we're further behind than half the buffer, there's a chance
 		 * the writer will bite our tail and mess up the samples under us.
@@ -820,7 +820,7 @@ void perf_evlist__mmap_consume_ex(struct perf_evlist *evlist,
 		return;
 	}
 
-	if (!evlist->overwrite) {
+	if (!perf_evlist__channel_check(evlist, channel, RDONLY)) {
 		u64 old = md->prev;
 
 		perf_mmap__write_tail(md, old);
@@ -918,7 +918,6 @@ static int perf_evlist__alloc_mmap(struct perf_evlist *evlist)
 }
 
 struct mmap_params {
-	int prot;
 	int mask;
 	struct auxtrace_mmap_params auxtrace_mp;
 };
@@ -926,6 +925,15 @@ struct mmap_params {
 static int __perf_evlist__mmap(struct perf_evlist *evlist, int idx,
 			       struct mmap_params *mp, int fd)
 {
+	int channel = perf_evlist__idx_channel(evlist, idx);
+	int prot = PROT_READ;
+
+	if (channel < 0)
+		return -1;
+
+	if (!perf_evlist__channel_check(evlist, channel, RDONLY))
+		prot |= PROT_WRITE;
+
 	/*
 	 * The last one will be done at perf_evlist__mmap_consume(), so that we
 	 * make sure we don't prevent tools from consuming every last event in
@@ -942,7 +950,7 @@ static int __perf_evlist__mmap(struct perf_evlist *evlist, int idx,
 	atomic_set(&evlist->mmap[idx].refcnt, 2);
 	evlist->mmap[idx].prev = 0;
 	evlist->mmap[idx].mask = mp->mask;
-	evlist->mmap[idx].base = mmap(NULL, evlist->mmap_len, mp->prot,
+	evlist->mmap[idx].base = mmap(NULL, evlist->mmap_len, prot,
 				      MAP_SHARED, fd, 0);
 	if (evlist->mmap[idx].base == MAP_FAILED) {
 		pr_debug2("failed to mmap perf event ring buffer, error %d\n",
@@ -959,9 +967,13 @@ static int __perf_evlist__mmap(struct perf_evlist *evlist, int idx,
 }
 
 static unsigned long
-perf_evlist__channel_for_evsel(struct perf_evsel *evsel __maybe_unused)
+perf_evlist__channel_for_evsel(struct perf_evsel *evsel)
 {
-	return 0;
+	unsigned long flag = 0;
+
+	if (evsel->overwrite)
+		flag |= PERF_EVLIST__CHANNEL_RDONLY;
+	return flag;
 }
 
 static int
@@ -1211,11 +1223,10 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
  * perf_evlist__mmap_ex - Create mmaps to receive events.
  * @evlist: list of events
  * @pages: map length in pages
- * @overwrite: overwrite older events?
  * @auxtrace_pages - auxtrace map length in pages
  * @auxtrace_overwrite - overwrite older auxtrace data?
  *
- * If @overwrite is %false the user needs to signal event consumption using
+ * For writable channel, the user needs to signal event consumption using
  * perf_mmap__write_tail().  Using perf_evlist__mmap_read() does this
  * automatically.
  *
@@ -1225,16 +1236,13 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
  * Return: %0 on success, negative error code otherwise.
  */
 int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
-			 bool overwrite, unsigned int auxtrace_pages,
-			 bool auxtrace_overwrite)
+			 unsigned int auxtrace_pages, bool auxtrace_overwrite)
 {
 	int err;
 	struct perf_evsel *evsel;
 	const struct cpu_map *cpus = evlist->cpus;
 	const struct thread_map *threads = evlist->threads;
-	struct mmap_params mp = {
-		.prot = PROT_READ | (overwrite ? 0 : PROT_WRITE),
-	};
+	struct mmap_params mp;
 
 	err = perf_evlist__channel_complete(evlist);
 	if (err)
@@ -1246,7 +1254,6 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 	if (evlist->pollfd.entries == NULL && perf_evlist__alloc_pollfd(evlist) < 0)
 		return -ENOMEM;
 
-	evlist->overwrite = overwrite;
 	evlist->mmap_len = perf_evlist__mmap_size(pages);
 	pr_debug("mmap size %zuB\n", evlist->mmap_len);
 	mp.mask = evlist->mmap_len - page_size - 1;
@@ -1270,8 +1277,13 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages,
 		      bool overwrite)
 {
+	struct perf_evsel *evsel;
+
 	perf_evlist__channel_reset(evlist);
-	return perf_evlist__mmap_ex(evlist, pages, overwrite, 0, false);
+	evlist__for_each(evlist, evsel)
+		evsel->overwrite = overwrite;
+
+	return perf_evlist__mmap_ex(evlist, pages, 0, false);
 }
 
 int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target)
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index b652587..21a8b85 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -23,6 +23,7 @@ struct record_opts;
 #define PERF_EVLIST__NR_CHANNELS	2
 enum perf_evlist_mmap_flag {
 	PERF_EVLIST__CHANNEL_ENABLED	= 1,
+	PERF_EVLIST__CHANNEL_RDONLY	= 2,
 };
 
 /**
@@ -45,7 +46,6 @@ struct perf_evlist {
 	int		 nr_entries;
 	int		 nr_groups;
 	int		 nr_mmaps;
-	bool		 overwrite;
 	bool		 enabled;
 	bool		 has_user_cpus;
 	size_t		 mmap_len;
@@ -203,8 +203,7 @@ int perf_evlist__parse_mmap_pages(const struct option *opt,
 				  int unset);
 
 int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
-			 bool overwrite, unsigned int auxtrace_pages,
-			 bool auxtrace_overwrite);
+			 unsigned int auxtrace_pages, bool auxtrace_overwrite);
 int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages,
 		      bool overwrite);
 void perf_evlist__munmap(struct perf_evlist *evlist);
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index efad78f..03c70e5 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -114,6 +114,7 @@ struct perf_evsel {
 	bool			tracking;
 	bool			per_pkg;
 	bool			precise_max;
+	bool			overwrite;
 	/* parse modifier helper */
 	int			exclude_GH;
 	int			nr_members;
-- 
1.8.3.4


* [PATCH 35/46] perf record: Don't read from and poll overwrite channel
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (33 preceding siblings ...)
  2016-02-26  9:32 ` [PATCH 34/46] perf tools: Squash overwrite setting into channel Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  2016-02-26  9:32 ` [PATCH 36/46] perf record: Don't poll on " Wang Nan
                   ` (10 subsequent siblings)
  45 siblings, 0 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Masami Hiramatsu, Namhyung Kim

Reading from an overwritable ring buffer is unreliable. Introduce
record__mmap_should_read() and prevent reading from overwrite ring
buffers in 'perf record'. The rule in record__mmap_should_read() will
be changed when perf supports reading from backward writing ring
buffers.
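
The rule can be stated compactly as in the sketch below: skip slots that
are not mapped and skip read-only (overwrite) channels. toy_mmap and
toy_should_read() are hypothetical, and the channel flags are passed in
directly instead of being looked up from the evlist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CHANNEL_RDONLY	2UL	/* mirrors PERF_EVLIST__CHANNEL_RDONLY */

/* Toy stand-in for struct perf_mmap; only the base pointer matters here. */
struct toy_mmap {
	void *base;
};

/* Read only from mapped ring buffers that belong to a writable channel. */
static bool toy_should_read(const struct toy_mmap *md,
			    unsigned long channel_flags)
{
	assert(md != NULL);
	if (!md->base)
		return false;
	if (channel_flags & CHANNEL_RDONLY)
		return false;
	return true;
}
```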

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/builtin-record.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 92eccf1..120b3bb 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -462,6 +462,19 @@ static struct perf_event_header finished_round_event = {
 	.type = PERF_RECORD_FINISHED_ROUND,
 };
 
+static bool record__mmap_should_read(struct record *rec, int idx)
+{
+	int channel = -1;
+
+	if (!rec->evlist->mmap[idx].base)
+		return false;
+	if (perf_evlist__channel_idx(rec->evlist, &channel, &idx))
+		return false;
+	if (perf_evlist__channel_check(rec->evlist, channel, RDONLY))
+		return false;
+	return true;
+}
+
 static int record__mmap_read_all(struct record *rec)
 {
 	u64 bytes_written = rec->bytes_written;
@@ -472,7 +485,7 @@ static int record__mmap_read_all(struct record *rec)
 	for (i = 0; i < total_mmaps; i++) {
 		struct auxtrace_mmap *mm = &rec->evlist->mmap[i].auxtrace_mmap;
 
-		if (rec->evlist->mmap[i].base) {
+		if (record__mmap_should_read(rec, i)) {
 			if (record__mmap_read(rec, i) != 0) {
 				rc = -1;
 				goto out;
-- 
1.8.3.4


* [PATCH 36/46] perf record: Don't poll on overwrite channel
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (34 preceding siblings ...)
  2016-02-26  9:32 ` [PATCH 35/46] perf record: Don't read from and poll overwrite channel Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  2016-02-26  9:32 ` [PATCH 37/46] perf tools: Detect avalibility of write_backward Wang Nan
                   ` (9 subsequent siblings)
  45 siblings, 0 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Masami Hiramatsu, Namhyung Kim

There's no need to receive events from an overwrite ring buffer.
Instead, perf should let them run in the background until something
happens. This patch makes perf ignore events from overwrite ring
buffers, except for POLLERR and POLLHUP.
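
The effect on the requested poll events can be sketched as below.
toy_poll_events() is a hypothetical helper that only computes the mask;
the flag values mirror those in evlist.h, and the system_wide case
handled by the patch is left out.

```c
#include <assert.h>
#include <poll.h>

#define CHANNEL_ENABLED	1UL	/* mirrors PERF_EVLIST__CHANNEL_ENABLED */
#define CHANNEL_RDONLY	2UL	/* mirrors PERF_EVLIST__CHANNEL_RDONLY */

/*
 * Events to request for an fd in the given channel. Read-only
 * (overwrite) channels never ask for POLLIN, so new data in them does
 * not wake up the poller; errors and hangups are still reported.
 */
static short toy_poll_events(unsigned long channel_flags)
{
	short events = POLLERR | POLLHUP;

	assert(channel_flags & CHANNEL_ENABLED);
	if (!(channel_flags & CHANNEL_RDONLY))
		events |= POLLIN;
	return events;
}
```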

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/util/evlist.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 067c89d..c8112805 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -461,9 +461,9 @@ int perf_evlist__alloc_pollfd(struct perf_evlist *evlist)
 	return 0;
 }
 
-static int __perf_evlist__add_pollfd(struct perf_evlist *evlist, int fd, int idx)
+static int __perf_evlist__add_pollfd(struct perf_evlist *evlist, int fd, int idx, short revent)
 {
-	int pos = fdarray__add(&evlist->pollfd, fd, POLLIN | POLLERR | POLLHUP);
+	int pos = fdarray__add(&evlist->pollfd, fd, revent | POLLERR | POLLHUP);
 	/*
 	 * Save the idx so that when we filter out fds POLLHUP'ed we can
 	 * close the associated evlist->mmap[] entry.
@@ -479,7 +479,7 @@ static int __perf_evlist__add_pollfd(struct perf_evlist *evlist, int fd, int idx
 
 int perf_evlist__add_pollfd(struct perf_evlist *evlist, int fd)
 {
-	return __perf_evlist__add_pollfd(evlist, fd, -1);
+	return __perf_evlist__add_pollfd(evlist, fd, -1, POLLIN);
 }
 
 static void perf_evlist__munmap_filtered(struct fdarray *fda, int fd)
@@ -1007,6 +1007,18 @@ perf_evlist__channel_complete(struct perf_evlist *evlist)
 	return 0;
 }
 
+static bool
+perf_evlist__should_poll(struct perf_evlist *evlist,
+			 struct perf_evsel *evsel,
+			 int channel)
+{
+	if (evsel->system_wide)
+		return false;
+	if (perf_evlist__channel_check(evlist, channel, RDONLY))
+		return false;
+	return true;
+}
+
 static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int _idx,
 				       struct mmap_params *mp, int cpu,
 				       int thread, int *outputs)
@@ -1015,6 +1027,7 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int _idx,
 
 	evlist__for_each(evlist, evsel) {
 		int fd, channel, idx, err;
+		short revent = POLLIN;
 
 		channel = perf_evlist__channel_find(evlist, evsel, false);
 		if (channel < 0) {
@@ -1044,6 +1057,8 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int _idx,
 			perf_evlist__mmap_get(evlist, idx);
 		}
 
+		if (!perf_evlist__should_poll(evlist, evsel, channel))
+			revent = 0;
 		/*
 		 * The system_wide flag causes a selected event to be opened
 		 * always without a pid.  Consequently it will never get a
@@ -1052,7 +1067,7 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int _idx,
 		 * Therefore don't add it for polling.
 		 */
 		if (!evsel->system_wide &&
-		    __perf_evlist__add_pollfd(evlist, fd, idx) < 0) {
+		    __perf_evlist__add_pollfd(evlist, fd, idx, revent) < 0) {
 			perf_evlist__mmap_put(evlist, idx);
 			return -1;
 		}
-- 
1.8.3.4

* [PATCH 37/46] perf tools: Detect avalibility of write_backward
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (35 preceding siblings ...)
  2016-02-26  9:32 ` [PATCH 36/46] perf record: Don't poll on " Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  2016-02-26  9:32 ` [PATCH 38/46] perf tools: Enable overwrite settings Wang Nan
                   ` (8 subsequent siblings)
  45 siblings, 0 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Masami Hiramatsu, Namhyung Kim

Detect availability of write_backward and save the result in
record_opts. With write_backward, the start pointer of a ring
buffer that is mapped read-only can be found reliably.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
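Note for readers (not part of the commit): the body of
perf_probe_api() is not shown in this patch. The sketch below is a
simplified, hypothetical model of the probing idea: open an event
without the new attribute bit, then again with it, and treat a failure
of the second attempt as "feature not available". struct fake_attr,
open_fn and the two fake "kernels" are assumptions made up for the
illustration; the real code goes through perf_event_open().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct perf_event_attr: only the probed bit. */
struct fake_attr {
	unsigned int write_backward : 1;
};

/* Hypothetical stand-in for the syscall: 0 on success, -1 on rejection. */
typedef int (*open_fn)(const struct fake_attr *attr);
typedef void (*setup_probe_fn)(struct fake_attr *attr);

static void probe_write_backward(struct fake_attr *attr)
{
	attr->write_backward = 1;
}

/* Models a kernel that knows the bit and accepts it. */
static int new_kernel_open(const struct fake_attr *attr)
{
	(void)attr;
	return 0;
}

/* Models an older kernel that rejects attribute bits it does not know. */
static int old_kernel_open(const struct fake_attr *attr)
{
	return attr->write_backward ? -1 : 0;
}

static bool probe_api(open_fn do_open, setup_probe_fn setup)
{
	struct fake_attr attr = { 0 };

	/* Baseline: if a plain event cannot be opened, report unsupported. */
	if (do_open(&attr) < 0)
		return false;

	/* Now request the feature; success means it is available. */
	setup(&attr);
	return do_open(&attr) == 0;
}
```

In this model, probe_api(new_kernel_open, probe_write_backward) is
true and probe_api(old_kernel_open, probe_write_backward) is false,
which is the value that would end up in opts->has_write_backward.
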
 tools/perf/perf.h        |  1 +
 tools/perf/util/record.c | 11 +++++++++++
 2 files changed, 12 insertions(+)

diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 5381a01..198345e 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -73,6 +73,7 @@ struct record_opts {
 	bool	     sample_transaction;
 	unsigned     initial_delay;
 	bool         use_clockid;
+	bool	     has_write_backward;
 	clockid_t    clockid;
 	unsigned int proc_map_timeout;
 };
diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c
index 0467367..d01f155 100644
--- a/tools/perf/util/record.c
+++ b/tools/perf/util/record.c
@@ -85,6 +85,11 @@ static void perf_probe_comm_exec(struct perf_evsel *evsel)
 	evsel->attr.comm_exec = 1;
 }
 
+static void perf_probe_write_backward(struct perf_evsel *evsel)
+{
+	evsel->attr.write_backward = 1;
+}
+
 static void perf_probe_context_switch(struct perf_evsel *evsel)
 {
 	evsel->attr.context_switch = 1;
@@ -105,6 +110,11 @@ bool perf_can_record_switch_events(void)
 	return perf_probe_api(perf_probe_context_switch);
 }
 
+static bool perf_can_write_backward(void)
+{
+	return perf_probe_api(perf_probe_write_backward);
+}
+
 bool perf_can_record_cpu_wide(void)
 {
 	struct perf_event_attr attr = {
@@ -235,6 +245,7 @@ static int record_opts__config_freq(struct record_opts *opts)
 
 int record_opts__config(struct record_opts *opts)
 {
+	opts->has_write_backward = perf_can_write_backward();
 	return record_opts__config_freq(opts);
 }
 
-- 
1.8.3.4

* [PATCH 38/46] perf tools: Enable overwrite settings
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (36 preceding siblings ...)
  2016-02-26  9:32 ` [PATCH 37/46] perf tools: Detect avalibility of write_backward Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  2016-02-26  9:32 ` [PATCH 39/46] perf tools: Set write_backward attribut bit for overwrite events Wang Nan
                   ` (7 subsequent siblings)
  45 siblings, 0 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Masami Hiramatsu, Namhyung Kim

This patch adds the following option and config terms:

 # perf record --overwrite ...

   Set all events to overwrite mode globally.

 # perf record --event cycles/overwrite/ ...
 # perf record --event cycles/no-overwrite/ ...

   Set specific events to overwrite or no-overwrite mode.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
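Note for readers (not part of the commit): the sketch below models how
the global option and the per-event terms could combine. It assumes
that apply_config_terms() runs after perf_evsel__config() has copied
opts->overwrite, so a term wins over the global option, and that a
term written without a value carries num == 1. struct overwrite_term
and effective_overwrite() are hypothetical names used only here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of a parsed overwrite/no-overwrite term. */
struct overwrite_term {
	bool negated;		/* true for "no-overwrite" */
	unsigned long num;	/* term value; assumed 1 when none is given */
};

/*
 * Start from the global --overwrite setting and let a per-event term,
 * if present, override it. Mirrors the "num ? 1 : 0" and "num ? 0 : 1"
 * mappings used for the two term types in the diff.
 */
static bool effective_overwrite(bool global_overwrite,
				const struct overwrite_term *term)
{
	if (term == NULL)
		return global_overwrite;
	return term->negated ? term->num == 0 : term->num != 0;
}
```

Under these assumptions, "--overwrite -e cycles/no-overwrite/" leaves
cycles in normal mode, while "-e cycles/overwrite/" alone puts only
that event in overwrite mode.
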
 tools/perf/builtin-record.c    |  1 +
 tools/perf/perf.h              |  1 +
 tools/perf/util/evsel.c        |  4 ++++
 tools/perf/util/evsel.h        |  2 ++
 tools/perf/util/parse-events.c | 14 ++++++++++++++
 tools/perf/util/parse-events.h |  2 ++
 tools/perf/util/parse-events.l |  2 ++
 7 files changed, 26 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 120b3bb..56e796b 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1273,6 +1273,7 @@ struct option __record_options[] = {
 	OPT_BOOLEAN_SET('i', "no-inherit", &record.opts.no_inherit,
 			&record.opts.no_inherit_set,
 			"child tasks do not inherit counters"),
+	OPT_BOOLEAN(0, "overwrite", &record.opts.overwrite, "use overwrite mode"),
 	OPT_UINTEGER('F', "freq", &record.opts.user_freq, "profile at this frequency"),
 	OPT_CALLBACK('m', "mmap-pages", &record.opts, "pages[,pages]",
 		     "number of mmap data pages and AUX area tracing mmap pages",
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 198345e..7a65a92 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -60,6 +60,7 @@ struct record_opts {
 	bool	     record_switch_events;
 	bool	     all_kernel;
 	bool	     all_user;
+	bool	     overwrite;
 	unsigned int freq;
 	unsigned int mmap_pages;
 	unsigned int auxtrace_mmap_pages;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 510afa4..10dfdd1 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -670,6 +670,9 @@ static void apply_config_terms(struct perf_evsel *evsel,
 			 */
 			attr->inherit = term->val.inherit ? 1 : 0;
 			break;
+		case PERF_EVSEL__CONFIG_TERM_OVERWRITE:
+			evsel->overwrite = term->val.overwrite ? 1 : 0;
+			break;
 		default:
 			break;
 		}
@@ -745,6 +748,7 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts)
 
 	attr->sample_id_all = perf_missing_features.sample_id_all ? 0 : 1;
 	attr->inherit	    = !opts->no_inherit;
+	evsel->overwrite    = opts->overwrite;
 
 	perf_evsel__set_sample_bit(evsel, IP);
 	perf_evsel__set_sample_bit(evsel, TID);
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 03c70e5..aa976f9 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -44,6 +44,7 @@ enum {
 	PERF_EVSEL__CONFIG_TERM_CALLGRAPH,
 	PERF_EVSEL__CONFIG_TERM_STACK_USER,
 	PERF_EVSEL__CONFIG_TERM_INHERIT,
+	PERF_EVSEL__CONFIG_TERM_OVERWRITE,
 	PERF_EVSEL__CONFIG_TERM_MAX,
 };
 
@@ -57,6 +58,7 @@ struct perf_evsel_config_term {
 		char	*callgraph;
 		u64	stack_user;
 		bool	inherit;
+		bool	overwrite;
 	} val;
 };
 
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 4c19d5e..707e514 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -992,6 +992,12 @@ do {									   \
 	case PARSE_EVENTS__TERM_TYPE_NOINHERIT:
 		CHECK_TYPE_VAL(NUM);
 		break;
+	case PARSE_EVENTS__TERM_TYPE_OVERWRITE:
+		CHECK_TYPE_VAL(NUM);
+		break;
+	case PARSE_EVENTS__TERM_TYPE_NOOVERWRITE:
+		CHECK_TYPE_VAL(NUM);
+		break;
 	case PARSE_EVENTS__TERM_TYPE_NAME:
 		CHECK_TYPE_VAL(STR);
 		break;
@@ -1040,6 +1046,8 @@ static int config_term_tracepoint(struct perf_event_attr *attr,
 	case PARSE_EVENTS__TERM_TYPE_STACKSIZE:
 	case PARSE_EVENTS__TERM_TYPE_INHERIT:
 	case PARSE_EVENTS__TERM_TYPE_NOINHERIT:
+	case PARSE_EVENTS__TERM_TYPE_OVERWRITE:
+	case PARSE_EVENTS__TERM_TYPE_NOOVERWRITE:
 		return config_term_common(attr, term, err);
 	default:
 		if (err) {
@@ -1109,6 +1117,12 @@ do {								\
 		case PARSE_EVENTS__TERM_TYPE_NOINHERIT:
 			ADD_CONFIG_TERM(INHERIT, inherit, term->val.num ? 0 : 1);
 			break;
+		case PARSE_EVENTS__TERM_TYPE_OVERWRITE:
+			ADD_CONFIG_TERM(OVERWRITE, overwrite, term->val.num ? 1 : 0);
+			break;
+		case PARSE_EVENTS__TERM_TYPE_NOOVERWRITE:
+			ADD_CONFIG_TERM(OVERWRITE, overwrite, term->val.num ? 0 : 1);
+			break;
 		default:
 			break;
 		}
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index 67e4930..c7e6e51 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -69,6 +69,8 @@ enum {
 	PARSE_EVENTS__TERM_TYPE_STACKSIZE,
 	PARSE_EVENTS__TERM_TYPE_NOINHERIT,
 	PARSE_EVENTS__TERM_TYPE_INHERIT,
+	PARSE_EVENTS__TERM_TYPE_NOOVERWRITE,
+	PARSE_EVENTS__TERM_TYPE_OVERWRITE,
 	__PARSE_EVENTS__TERM_TYPE_NR,
 };
 
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 1477fbc..cc4c426 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -201,6 +201,8 @@ call-graph		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_CALLGRAPH); }
 stack-size		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_STACKSIZE); }
 inherit			{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_INHERIT); }
 no-inherit		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_NOINHERIT); }
+overwrite		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_OVERWRITE); }
+no-overwrite		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_NOOVERWRITE); }
 ,			{ return ','; }
 "/"			{ BEGIN(INITIAL); return '/'; }
 {name_minus}		{ return str(yyscanner, PE_NAME); }
-- 
1.8.3.4

* [PATCH 39/46] perf tools: Set write_backward attribut bit for overwrite events
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (37 preceding siblings ...)
  2016-02-26  9:32 ` [PATCH 38/46] perf tools: Enable overwrite settings Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  2016-02-26  9:32 ` [PATCH 40/46] perf tools: Record fd into perf_mmap Wang Nan
                   ` (6 subsequent siblings)
  45 siblings, 0 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Masami Hiramatsu, Namhyung Kim

The write_backward attribute makes the kernel fill the ring buffer
from its end, which makes reading from an overwrite ring buffer
possible.

This patch sets this attribute if evsel->overwrite is selected
explicitly by the user.

Overwrite and write_backward are still controlled separately for legacy
read-only mmap users (most of them are in perf/tests).

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/builtin-record.c |  7 +++++++
 tools/perf/util/evlist.c    |  2 ++
 tools/perf/util/evlist.h    |  1 +
 tools/perf/util/evsel.c     | 13 +++++++++++++
 4 files changed, 23 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 56e796b..a069f75 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -333,6 +333,13 @@ static int record__open(struct record *rec)
 	perf_evlist__config(evlist, opts);
 
 	evlist__for_each(evlist, pos) {
+		if (pos->overwrite) {
+			if (!pos->attr.write_backward) {
+				ui__warning("Unable to read from overwrite ring buffer\n\n");
+				rc = -ENOSYS;
+				goto out;
+			}
+		}
 try_again:
 		if (perf_evsel__open(pos, pos->cpus, pos->threads) < 0) {
 			if (perf_evsel__fallback(pos, errno, msg, sizeof(msg))) {
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index c8112805..7877061 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -973,6 +973,8 @@ perf_evlist__channel_for_evsel(struct perf_evsel *evsel)
 
 	if (evsel->overwrite)
 		flag |= PERF_EVLIST__CHANNEL_RDONLY;
+	if (evsel->attr.write_backward)
+		flag |= PERF_EVLIST__CHANNEL_BACKWARD;
 	return flag;
 }
 
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 21a8b85..321224c 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -24,6 +24,7 @@ struct record_opts;
 enum perf_evlist_mmap_flag {
 	PERF_EVLIST__CHANNEL_ENABLED	= 1,
 	PERF_EVLIST__CHANNEL_RDONLY	= 2,
+	PERF_EVLIST__CHANNEL_BACKWARD	= 4,
 };
 
 /**
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 10dfdd1..0bbd5ef 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -678,6 +678,19 @@ static void apply_config_terms(struct perf_evsel *evsel,
 		}
 	}
 
+	/*
+	 * Set backward after config term processing because it is
+	 * possible to set overwrite globally, without config
+	 * terms.
+	 */
+	if (evsel->overwrite) {
+		if (opts->has_write_backward)
+			attr->write_backward = 1;
+		else
+			pr_err("Reading from overwrite event %s is not supported\n",
+			       evsel->name);
+	}
+
 	/* User explicitly set per-event callgraph, clear the old setting and reset. */
 	if ((callgraph_buf != NULL) || (dump_size > 0)) {
 
-- 
1.8.3.4

* [PATCH 40/46] perf tools: Record fd into perf_mmap
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (38 preceding siblings ...)
  2016-02-26  9:32 ` [PATCH 39/46] perf tools: Set write_backward attribut bit for overwrite events Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  2016-02-26  9:32 ` [PATCH 41/46] perf tools: Add API to pause a channel Wang Nan
                   ` (5 subsequent siblings)
  45 siblings, 0 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Masami Hiramatsu, Namhyung Kim

Add an fd field to perf_mmap so perf can look up the fd that backs a
given mmap. This will be used to toggle overwrite ring buffers.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/util/evlist.c | 15 +++++++++++++--
 tools/perf/util/evlist.h |  1 +
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 7877061..c72905d 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -868,6 +868,7 @@ static void __perf_evlist__munmap(struct perf_evlist *evlist, int idx)
 	if (evlist->mmap[idx].base != NULL) {
 		munmap(evlist->mmap[idx].base, evlist->mmap_len);
 		evlist->mmap[idx].base = NULL;
+		evlist->mmap[idx].fd = -1;
 		atomic_set(&evlist->mmap[idx].refcnt, 0);
 	}
 	auxtrace_mmap__munmap(&evlist->mmap[idx].auxtrace_mmap);
@@ -903,7 +904,7 @@ void perf_evlist__munmap(struct perf_evlist *evlist)
 
 static int perf_evlist__alloc_mmap(struct perf_evlist *evlist)
 {
-	int total_mmaps;
+	int total_mmaps, i;
 
 	evlist->nr_mmaps = cpu_map__nr(evlist->cpus);
 	if (cpu_map__empty(evlist->cpus))
@@ -914,7 +915,12 @@ static int perf_evlist__alloc_mmap(struct perf_evlist *evlist)
 		return -EINVAL;
 
 	evlist->mmap = zalloc(total_mmaps * sizeof(struct perf_mmap));
-	return evlist->mmap != NULL ? 0 : -ENOMEM;
+	if (!evlist->mmap)
+		return -ENOMEM;
+
+	for (i = 0; i < total_mmaps; i++)
+		evlist->mmap[i].fd = -1;
+	return 0;
 }
 
 struct mmap_params {
@@ -934,6 +940,10 @@ static int __perf_evlist__mmap(struct perf_evlist *evlist, int idx,
 	if (!perf_evlist__channel_check(evlist, channel, RDONLY))
 		prot |= PROT_WRITE;
 
+	if (evlist->mmap[idx].fd >= 0) {
+		pr_err("idx %d already mapped\n", idx);
+		return -1;
+	}
 	/*
 	 * The last one will be done at perf_evlist__mmap_consume(), so that we
 	 * make sure we don't prevent tools from consuming every last event in
@@ -958,6 +968,7 @@ static int __perf_evlist__mmap(struct perf_evlist *evlist, int idx,
 		evlist->mmap[idx].base = NULL;
 		return -1;
 	}
+	evlist->mmap[idx].fd = fd;
 
 	if (auxtrace_mmap__mmap(&evlist->mmap[idx].auxtrace_mmap,
 				&mp->auxtrace_mp, evlist->mmap[idx].base, fd))
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 321224c..bc6d787 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -35,6 +35,7 @@ enum perf_evlist_mmap_flag {
 struct perf_mmap {
 	void		 *base;
 	int		 mask;
+	int		 fd;
 	atomic_t	 refcnt;
 	u64		 prev;
 	struct auxtrace_mmap auxtrace_mmap;
-- 
1.8.3.4

* [PATCH 41/46] perf tools: Add API to pause a channel
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (39 preceding siblings ...)
  2016-02-26  9:32 ` [PATCH 40/46] perf tools: Record fd into perf_mmap Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  2016-02-26  9:32 ` [PATCH 42/46] perf record: Toggle overwrite ring buffer for reading Wang Nan
                   ` (4 subsequent siblings)
  45 siblings, 0 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Masami Hiramatsu, Namhyung Kim

Introduce perf_evlist__channel_toggle_paused() to pause or resume a
channel in an evlist, using the PERF_EVENT_IOC_PAUSE_OUTPUT ioctl.
Following commits use perf_evlist__channel_toggle_paused() to ensure
that an overwrite ring buffer is paused before it is read.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/util/evlist.c | 28 ++++++++++++++++++++++++++++
 tools/perf/util/evlist.h |  2 ++
 2 files changed, 30 insertions(+)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index c72905d..b5da393 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -706,6 +706,34 @@ int perf_evlist__channel_idx(struct perf_evlist *evlist,
 	return 0;
 }
 
+int perf_evlist__channel_toggle_paused(struct perf_evlist *evlist,
+				       int channel, bool pause)
+{
+	int i;
+
+	if (channel >= perf_evlist__channel_nr(evlist))
+		return -E2BIG;
+	if (!evlist->mmap)
+		return -EFAULT;
+	for (i = 0; i < evlist->nr_mmaps; i++) {
+		int n = channel * evlist->nr_mmaps + i;
+		int fd = evlist->mmap[n].fd;
+		int err;
+
+		if (fd < 0)
+			continue;
+		err = ioctl(fd, PERF_EVENT_IOC_PAUSE_OUTPUT,
+			    pause ? 1 : 0);
+		if (err) {
+			err = (errno == 0 ? -EINVAL : -errno);
+			pr_err("Unable to pause output on %d: %s\n",
+			       fd, strerror(-err));
+			return err;
+		}
+	}
+	return 0;
+}
+
 union perf_event *perf_evlist__mmap_read_ex(struct perf_evlist *evlist,
 					    int channel, int idx)
 {
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index bc6d787..c1831a9 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -180,6 +180,8 @@ perf_evlist__get_mmap(struct perf_evlist *evlist,
 	return &evlist->mmap[idx];
 }
 
+int perf_evlist__channel_toggle_paused(struct perf_evlist *evlist,
+				       int channel, bool pause);
 int perf_evlist__open(struct perf_evlist *evlist);
 void perf_evlist__close(struct perf_evlist *evlist);
 
-- 
1.8.3.4

* [PATCH 42/46] perf record: Toggle overwrite ring buffer for reading
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (40 preceding siblings ...)
  2016-02-26  9:32 ` [PATCH 41/46] perf tools: Add API to pause a channel Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  2016-02-26  9:32 ` [PATCH 43/46] perf record: Rename variable to make code clear Wang Nan
                   ` (3 subsequent siblings)
  45 siblings, 0 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Masami Hiramatsu, Namhyung Kim

Reading from an overwrite ring buffer that is still being written is
unreliable. perf_evlist__channel_toggle_paused() should be called
before reading from it.

Toggle the overwrite event state (rec->overwrite_evt_state) after
'done' is set or a switch output is requested.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
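Note for readers (not part of the commit): the state handling in
record__toggle_overwrite_evsels() can be traced more easily when it
is restated as a pure transition function. The sketch below follows
the switch statement in the diff; the shortened names (EVT_*, ACT_*,
struct evt_step, evt_transition()) are made up for this illustration.

```c
#include <assert.h>

enum evt_state { EVT_RUNNING, EVT_DATA_PENDING, EVT_EMPTY };
enum evt_action { ACT_NONE, ACT_PAUSE, ACT_RESUME };

struct evt_step {
	enum evt_state next;	/* state stored after the request */
	enum evt_action action;	/* what to do with the RDONLY channels */
};

static struct evt_step evt_transition(enum evt_state cur, enum evt_state want)
{
	struct evt_step step = { want, ACT_NONE };

	switch (cur) {
	case EVT_RUNNING:
		/* Leaving RUNNING always pauses the ring buffer first. */
		if (want != EVT_RUNNING)
			step.action = ACT_PAUSE;
		break;
	case EVT_DATA_PENDING:
		if (want == EVT_RUNNING)
			step.action = ACT_RESUME;
		break;
	case EVT_EMPTY:
		if (want == EVT_RUNNING)
			step.action = ACT_RESUME;
		/* Already drained: never go back to DATA_PENDING. */
		if (want == EVT_DATA_PENDING)
			step.next = EVT_EMPTY;
		break;
	}
	return step;
}
```

A typical switch-output cycle is RUNNING -> DATA_PENDING (pause),
DATA_PENDING -> EMPTY (no ioctl, the data has been read), and
EMPTY -> RUNNING (resume).
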
 tools/perf/builtin-record.c | 79 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 79 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index a069f75..8d247cf 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -40,6 +40,11 @@
 #include <sys/mman.h>
 #include <asm/bug.h>
 
+enum overwrite_evt_state {
+	OVERWRITE_EVT_RUNNING,
+	OVERWRITE_EVT_DATA_PENDING,
+	OVERWRITE_EVT_EMPTY,
+};
 
 struct record {
 	struct perf_tool	tool;
@@ -58,6 +63,7 @@ struct record {
 	bool			buildid_all;
 	bool			timestamp_filename;
 	bool			switch_output;
+	enum overwrite_evt_state overwrite_evt_state;
 	unsigned long long	samples;
 };
 
@@ -389,6 +395,7 @@ try_again:
 
 	session->evlist = evlist;
 	perf_session__set_id_hdr_size(session);
+	rec->overwrite_evt_state = OVERWRITE_EVT_RUNNING;
 out:
 	return rc;
 }
@@ -469,6 +476,52 @@ static struct perf_event_header finished_round_event = {
 	.type = PERF_RECORD_FINISHED_ROUND,
 };
 
+static void
+record__toggle_overwrite_evsels(struct record *rec,
+				enum overwrite_evt_state state)
+{
+	struct perf_evlist *evlist = rec->evlist;
+	enum overwrite_evt_state old_state = rec->overwrite_evt_state;
+	enum action {
+		NONE,
+		PAUSE,
+		RESUME,
+	} action = NONE;
+	int ch, nr_channels;
+
+	switch (old_state) {
+	case OVERWRITE_EVT_RUNNING:
+		if (state != OVERWRITE_EVT_RUNNING)
+			action = PAUSE;
+		break;
+	case OVERWRITE_EVT_DATA_PENDING:
+		if (state == OVERWRITE_EVT_RUNNING)
+			action = RESUME;
+		break;
+	case OVERWRITE_EVT_EMPTY:
+		if (state == OVERWRITE_EVT_RUNNING)
+			action = RESUME;
+		if (state == OVERWRITE_EVT_DATA_PENDING)
+			state = OVERWRITE_EVT_EMPTY;
+		break;
+	default:
+		WARN_ONCE(1, "Shouldn't get there\n");
+	}
+
+	rec->overwrite_evt_state = state;
+
+	if (action == NONE)
+		return;
+
+	nr_channels = perf_evlist__channel_nr(evlist);
+	for (ch = 0; ch < nr_channels; ch++) {
+		if (!perf_evlist__channel_check(evlist, ch, RDONLY))
+			continue;
+		perf_evlist__channel_toggle_paused(evlist, ch,
+						   action == PAUSE);
+	}
+}
+
 static bool record__mmap_should_read(struct record *rec, int idx)
 {
 	int channel = -1;
@@ -513,6 +566,8 @@ static int record__mmap_read_all(struct record *rec)
 	if (bytes_written != rec->bytes_written)
 		rc = record__write(rec, &finished_round_event, sizeof(finished_round_event));
 
+	if (rec->overwrite_evt_state == OVERWRITE_EVT_DATA_PENDING)
+		record__toggle_overwrite_evsels(rec, OVERWRITE_EVT_EMPTY);
 out:
 	return rc;
 }
@@ -872,6 +927,17 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	for (;;) {
 		unsigned long long hits = rec->samples;
 
+		/*
+		 * rec->overwrite_evt_state is possible to be
+		 * OVERWRITE_EVT_EMPTY here: when done == true and
+		 * hits != rec->samples after previous reading.
+		 *
+		 * record__toggle_overwrite_evsels ensure we never
+		 * convert OVERWRITE_EVT_EMPTY to OVERWRITE_EVT_DATA_PENDING.
+		 */
+		if (switch_output_started || done || draining)
+			record__toggle_overwrite_evsels(rec, OVERWRITE_EVT_DATA_PENDING);
+
 		if (record__mmap_read_all(rec) < 0) {
 			auxtrace_snapshot_disable();
 			err = -1;
@@ -890,7 +956,20 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		}
 
 		if (switch_output_started) {
+			/*
+			 * SIGUSR2 raise after or during record__mmap_read_all().
+			 * continue to read again.
+			 */
+			if (rec->overwrite_evt_state == OVERWRITE_EVT_RUNNING)
+				continue;
+
 			switch_output_started = 0;
+			/*
+			 * Reenable events in overwrite ring buffer after
+			 * record__mmap_read_all(): we should have collected
+			 * data from it.
+			 */
+			record__toggle_overwrite_evsels(rec, OVERWRITE_EVT_RUNNING);
 
 			if (!quiet)
 				fprintf(stderr, "[ perf record: dump data: Woken up %ld times ]\n",
-- 
1.8.3.4

* [PATCH 43/46] perf record: Rename variable to make code clear
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (41 preceding siblings ...)
  2016-02-26  9:32 ` [PATCH 42/46] perf record: Toggle overwrite ring buffer for reading Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  2016-02-26  9:32 ` [PATCH 44/46] perf record: Read from backward ring buffer Wang Nan
                   ` (2 subsequent siblings)
  45 siblings, 0 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Masami Hiramatsu, Namhyung Kim

record__mmap_read() writes data from the ring buffer into perf.data.
'head' is maintained by the kernel and points to the last written
record. 'old' is maintained by perf and points to the record read in
the previous round. record__mmap_read() saves the data between 'old'
and 'head' to perf.data. These variable names are hard to follow. In
addition, when dealing with a backward-writing ring buffer, the
md->prev pointer should point to 'head' instead of the last byte read.

Add 'start' and 'end' pointers to make the code clearer, and set
md->prev to 'head' instead of the moved 'old' pointer. This patch
doesn't change behavior, since at the end of the function:

    buf = &data[old & md->mask];
    size = head - old;
    old += size;     <--- Here, old == head

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
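Note for readers (not part of the commit): the start/end logic is the
usual two-step copy out of a power-of-two ring buffer. The sketch
below is a standalone model of it that copies into a caller-supplied
buffer instead of calling record__write(); ring_copy() and
ring_copy_wraps_ok() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy the bytes in [start, end) out of a ring of (mask + 1) bytes.
 * start and end are free-running positions; only their low bits index
 * the ring. Returns the number of bytes copied.
 */
static size_t ring_copy(const unsigned char *data, uint64_t mask,
			uint64_t start, uint64_t end, unsigned char *out)
{
	uint64_t size = end - start;
	size_t copied = 0;

	assert(size <= mask + 1);	/* otherwise data was lost */

	/* The range crosses the end of the ring: copy the tail part first. */
	if ((start & mask) + size != (end & mask)) {
		uint64_t first = mask + 1 - (start & mask);

		memcpy(out, &data[start & mask], first);
		copied += first;
		start += first;
	}

	/* Copy the remainder; afterwards start would equal end. */
	memcpy(out + copied, &data[start & mask], end - start);
	copied += end - start;
	return copied;
}

/* Usage: positions 6..10 of an 8-byte ring wrap around to "GHAB". */
static int ring_copy_wraps_ok(void)
{
	const unsigned char ring[8] = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H' };
	unsigned char out[8];
	size_t n = ring_copy(ring, 7, 6, 10, out);

	return n == 4 && memcmp(out, "GHAB", 4) == 0;
}
```

After the second copy the moved start position equals end, which is
why storing 'head' in md->prev is equivalent to storing the moved
'old'.
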
 tools/perf/builtin-record.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 8d247cf..eee3436 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -92,17 +92,18 @@ static int record__mmap_read(struct record *rec, int idx)
 	struct perf_mmap *md = &rec->evlist->mmap[idx];
 	u64 head = perf_mmap__read_head(md);
 	u64 old = md->prev;
+	u64 end = head, start = old;
 	unsigned char *data = md->base + page_size;
 	unsigned long size;
 	void *buf;
 	int rc = 0;
 
-	if (old == head)
+	if (start == end)
 		return 0;
 
 	rec->samples++;
 
-	size = head - old;
+	size = end - start;
 	if (size > (unsigned long)(md->mask) + 1) {
 		WARN_ONCE(1, "failed to keep up with mmap data. (warn only once)\n");
 
@@ -111,10 +112,10 @@ static int record__mmap_read(struct record *rec, int idx)
 		return 0;
 	}
 
-	if ((old & md->mask) + size != (head & md->mask)) {
-		buf = &data[old & md->mask];
-		size = md->mask + 1 - (old & md->mask);
-		old += size;
+	if ((start & md->mask) + size != (end & md->mask)) {
+		buf = &data[start & md->mask];
+		size = md->mask + 1 - (start & md->mask);
+		start += size;
 
 		if (record__write(rec, buf, size) < 0) {
 			rc = -1;
@@ -122,16 +123,16 @@ static int record__mmap_read(struct record *rec, int idx)
 		}
 	}
 
-	buf = &data[old & md->mask];
-	size = head - old;
-	old += size;
+	buf = &data[start & md->mask];
+	size = end - start;
+	start += size;
 
 	if (record__write(rec, buf, size) < 0) {
 		rc = -1;
 		goto out;
 	}
 
-	md->prev = old;
+	md->prev = head;
 	perf_evlist__mmap_consume(rec->evlist, idx);
 out:
 	return rc;
-- 
1.8.3.4

* [PATCH 44/46] perf record: Read from backward ring buffer
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (42 preceding siblings ...)
  2016-02-26  9:32 ` [PATCH 43/46] perf record: Rename variable to make code clear Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  2016-02-26  9:32 ` [PATCH 45/46] perf record: Allow generate tracking events at the end of output Wang Nan
  2016-02-26  9:32 ` [PATCH 46/46] perf tools: Don't warn about out of order event if write_backward is used Wang Nan
  45 siblings, 0 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Masami Hiramatsu, Namhyung Kim

Introduce rb_find_range() to find the start and end positions of the
data in a backward ring buffer.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
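Note for readers (not part of the commit): the idea is to start at
'head' (the newest record) and step from header to header towards
older records until an unwritten area or a full lap of the buffer is
reached. The sketch below is a simplified standalone model, not the
function from the diff: it stops before a record that no longer fits
instead of overshooting and rewinding, and it assumes that 'head'
starts at 0 and is moved down by each record's size. struct
rec_header mirrors the layout of struct perf_event_header; the
function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same field layout as struct perf_event_header, kept local here. */
struct rec_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

/* Find [start, end) of the valid records in a backward-written ring. */
static void find_backward_range(const unsigned char *buf, uint64_t mask,
				uint64_t head, uint64_t *start, uint64_t *end)
{
	uint64_t buf_size = mask + 1;
	uint64_t pos = head;

	*start = head;
	while (pos - head < buf_size) {
		struct rec_header hdr;

		memcpy(&hdr, buf + (pos & mask), sizeof(hdr));
		if (hdr.size == 0)
			break;	/* unwritten area: no older records */
		if (pos - head + hdr.size > buf_size)
			break;	/* oldest record was partly overwritten */
		pos += hdr.size;
	}
	*end = pos;
}

/* Usage: two records (16 and 24 bytes) in an otherwise empty 64-byte ring. */
static uint64_t demo_two_records(void)
{
	unsigned char buf[64] = { 0 };
	struct rec_header newest = { .type = 9, .size = 16 };
	struct rec_header oldest = { .type = 9, .size = 24 };
	uint64_t head = (uint64_t)0 - 40;	/* moved down by 24, then 16 */
	uint64_t start, end;

	memcpy(buf + (head & 63), &newest, sizeof(newest));
	memcpy(buf + ((head + 16) & 63), &oldest, sizeof(oldest));
	find_backward_range(buf, 63, head, &start, &end);
	return end - start;	/* bytes of valid data found */
}
```

In this example the walk visits both records, then reads a zero size
at the wrapped position and stops, so 40 bytes of data are found.
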
 tools/perf/builtin-record.c | 69 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 67 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index eee3436..0ce7f52 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -87,6 +87,61 @@ static int process_synthesized_event(struct perf_tool *tool,
 	return record__write(rec, event, event->header.size);
 }
 
+static int
+backward_rb_find_range(void *buf, int mask, u64 head, u64 *start, u64 *end)
+{
+	struct perf_event_header *pheader;
+	u64 evt_head = head;
+	int size = mask + 1;
+
+	pr_debug2("backward_rb_find_range: buf=%p, head=%"PRIx64"\n", buf, head);
+	pheader = (struct perf_event_header *)(buf + (head & mask));
+	*start = head;
+	while (true) {
+		if (evt_head - head >= (unsigned int)size) {
+			pr_debug("Finished reading backward ring buffer: rewind\n");
+			if (evt_head - head > (unsigned int)size)
+				evt_head -= pheader->size;
+			*end = evt_head;
+			return 0;
+		}
+
+		pheader = (struct perf_event_header *)(buf + (evt_head & mask));
+
+		if (pheader->size == 0) {
+			pr_debug("Finished reading backward ring buffer: get start\n");
+			*end = evt_head;
+			return 0;
+		}
+
+		evt_head += pheader->size;
+		pr_debug3("move evt_head: %"PRIx64"\n", evt_head);
+	}
+	WARN_ONCE(1, "Shouldn't get here\n");
+	return -1;
+}
+
+static int
+rb_find_range(struct perf_evlist *evlist, int idx,
+	      void *data, int mask, u64 head, u64 old,
+	      u64 *start, u64 *end)
+{
+	int channel;
+
+	channel = perf_evlist__idx_channel(evlist, idx);
+	if (!perf_evlist__channel_check(evlist, channel, RDONLY)) {
+		*start = old;
+		*end = head;
+		return 0;
+	}
+
+	if (perf_evlist__channel_check(evlist, channel, BACKWARD))
+		return backward_rb_find_range(data, mask, head, start, end);
+
+	WARN_ONCE(1, "Unable to find start position from a read-only ring buffer\n");
+	return -1;
+}
+
 static int record__mmap_read(struct record *rec, int idx)
 {
 	struct perf_mmap *md = &rec->evlist->mmap[idx];
@@ -98,6 +153,10 @@ static int record__mmap_read(struct record *rec, int idx)
 	void *buf;
 	int rc = 0;
 
+	if (rb_find_range(rec->evlist, idx, data, md->mask, head,
+			  old, &start, &end))
+		return -1;
+
 	if (start == end)
 		return 0;
 
@@ -531,8 +590,14 @@ static bool record__mmap_should_read(struct record *rec, int idx)
 		return false;
 	if (perf_evlist__channel_idx(rec->evlist, &channel, &idx))
 		return false;
-	if (perf_evlist__channel_check(rec->evlist, channel, RDONLY))
-		return false;
+	if (perf_evlist__channel_check(rec->evlist, channel, RDONLY)) {
+		if (rec->overwrite_evt_state != OVERWRITE_EVT_DATA_PENDING)
+			return false;
+		if (perf_evlist__channel_check(rec->evlist, channel, BACKWARD))
+			return true;
+		else
+			return false;
+	}
 	return true;
 }
 
-- 
1.8.3.4

* [PATCH 45/46] perf record: Allow generate tracking events at the end of output
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (43 preceding siblings ...)
  2016-02-26  9:32 ` [PATCH 44/46] perf record: Read from backward ring buffer Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  2016-02-26  9:32 ` [PATCH 46/46] perf tools: Don't warn about out of order event if write_backward is used Wang Nan
  45 siblings, 0 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Masami Hiramatsu, Namhyung Kim

Before this patch, tracking events are generated from information in
/proc before any samples are recorded. With the introduction of
overwrite evsels in 'perf record' this becomes inconvenient:
'perf record' can now run as a daemon for several hours and capture
only the last snapshot when it receives SIGUSR2. The tracking events
generated at the head of the output 'perf.data' are by then too old,
and most of the tracking events generated while 'perf record' was
running have been dropped.

This patch adds a --tail-tracking option which generates tracking
events at the end of the output, so that the output event series
better reflects the state of the system when SIGUSR2 is received.
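
The difference in output ordering can be pictured with a small model.
This is only an illustration under stated assumptions: output_layout()
is a hypothetical helper, and "tracking" and "samples" stand for the
synthesized tracking events and the captured sample data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of one output file: returns the order in which
 * synthesized tracking events and captured samples are written.
 */
static const char *output_layout(bool tail_tracking, char *buf, size_t len)
{
	assert(buf != NULL && len >= 32);

	buf[0] = '\0';
	if (!tail_tracking)
		strcat(buf, "tracking,");	/* default: /proc state first */
	strcat(buf, "samples");
	if (tail_tracking)
		strcat(buf, ",tracking");	/* option set: state at the end */
	return buf;
}
```

With the flag set, the tracking events describe the system as it was
when the snapshot was taken rather than when 'perf record' started.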

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/builtin-record.c | 45 +++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 41 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 0ce7f52..4cb651a 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -64,6 +64,7 @@ struct record {
 	bool			timestamp_filename;
 	bool			switch_output;
 	enum overwrite_evt_state overwrite_evt_state;
+	bool			tail_tracking;
 	unsigned long long	samples;
 };
 
@@ -703,6 +704,26 @@ static int record__synthesize_workload(struct record *rec)
 
 static int record__synthesize(struct record *rec);
 
+static void record__synthesize_target(struct record *rec)
+{
+	if (target__none(&rec->opts.target)) {
+		struct {
+			struct thread_map map;
+			struct thread_map_data map_data;
+		} thread_map;
+
+		thread_map.map.nr = 1;
+		thread_map.map.map[0].pid = rec->evlist->workload.pid;
+		thread_map.map.map[0].comm = NULL;
+		perf_event__synthesize_thread_map(&rec->tool,
+				&thread_map.map,
+				process_synthesized_event,
+				&rec->session->machines.host,
+				rec->opts.sample_address,
+				rec->opts.proc_map_timeout);
+	}
+}
+
 static int
 record__switch_output(struct record *rec, bool at_exit)
 {
@@ -712,6 +733,11 @@ record__switch_output(struct record *rec, bool at_exit)
 	/* Same Size:      "2015122520103046"*/
 	char timestamp[] = "InvalidTimestamp";
 
+	if (rec->tail_tracking) {
+		record__synthesize(rec);
+		record__synthesize_target(rec);
+	}
+
 	rec->samples = 0;
 	record__finish_output(rec);
 	err = fetch_current_timestamp(timestamp, sizeof(timestamp));
@@ -733,7 +759,7 @@ record__switch_output(struct record *rec, bool at_exit)
 			file->path, timestamp);
 
 	/* Output tracking events */
-	if (!at_exit) {
+	if (!at_exit && !rec->tail_tracking) {
 		record__synthesize(rec);
 
 		/*
@@ -934,9 +960,11 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 
 	machine = &session->machines.host;
 
-	err = record__synthesize(rec);
-	if (err < 0)
-		goto out_child;
+	if (!rec->tail_tracking) {
+		err = record__synthesize(rec);
+		if (err < 0)
+			goto out_child;
+	}
 
 	if (rec->realtime_prio) {
 		struct sched_param param;
@@ -1077,6 +1105,13 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 			disabled = true;
 		}
 	}
+
+	if (rec->tail_tracking) {
+		err = record__synthesize(rec);
+		if (err < 0)
+			goto out_child;
+	}
+
 	auxtrace_snapshot_disable();
 
 	if (forks && workload_exec_errno) {
@@ -1509,6 +1544,8 @@ struct option __record_options[] = {
 		    "append timestamp to output filename"),
 	OPT_BOOLEAN(0, "switch-output", &record.switch_output,
 		    "Switch output when receive SIGUSR2"),
+	OPT_BOOLEAN(0, "tail-tracking", &record.tail_tracking,
+		    "Generate tracking events at the end of output"),
 	OPT_END()
 };
 
-- 
1.8.3.4

* [PATCH 46/46] perf tools: Don't warn about out of order event if write_backward is used
  2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
                   ` (44 preceding siblings ...)
  2016-02-26  9:32 ` [PATCH 45/46] perf record: Allow generate tracking events at the end of output Wang Nan
@ 2016-02-26  9:32 ` Wang Nan
  45 siblings, 0 replies; 60+ messages in thread
From: Wang Nan @ 2016-02-26  9:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Li Zefan, Peter Zijlstra, pi3orama, Wang Nan,
	linux-kernel, He Kuang, Masami Hiramatsu, Namhyung Kim

If the write_backward attribute is set, records are written into the
kernel ring buffer from end to beginning, but read from beginning to
end. The timestamps of the records are therefore in reverse order,
which triggers the 'XX out of order events recorded' warning. Suppress
the warning if write_backward is selected by at least one event.
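
The decision described above can be sketched as follows. This is a
stand-alone illustration, not the code in this patch: struct evt_attr
and should_warn_order() are hypothetical, reduced stand-ins for the
perf event attribute and the session-level check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced event attribute: only the flag of interest. */
struct evt_attr {
	bool write_backward;
};

/*
 * Warn about unordered events only when no event uses write_backward,
 * because backward-written records are expected to be out of order.
 */
static bool should_warn_order(const struct evt_attr *attrs, size_t nr,
			      unsigned int nr_unordered)
{
	size_t i;

	assert(attrs != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		if (attrs[i].write_backward)
			return false;	/* one backward event is enough */
	}
	return nr_unordered != 0;
}
```

Note that a single write_backward event silences the warning for the
whole session, including events written in the normal direction.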

Result:

Before this patch:
 # perf record -m 1 -e raw_syscalls:sys_exit/overwrite/ \
                    -e raw_syscalls:sys_enter \
                    dd if=/dev/zero of=/dev/null count=300
 300+0 records in
 300+0 records out
 153600 bytes (154 kB) copied, 0.000601617 s, 255 MB/s
 [ perf record: Woken up 5 times to write data ]
 Warning:
 40 out of order events recorded.
 [ perf record: Captured and wrote 0.096 MB perf.data (696 samples) ]

After this patch:
 # perf record -m 1 -e raw_syscalls:sys_exit/overwrite/ \
                    -e raw_syscalls:sys_enter \
                    dd if=/dev/zero of=/dev/null count=300
 300+0 records in
 300+0 records out
 153600 bytes (154 kB) copied, 0.000644873 s, 238 MB/s
 [ perf record: Woken up 5 times to write data ]
 [ perf record: Captured and wrote 0.096 MB perf.data (696 samples) ]

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/util/session.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index ab3a296..e41ad39 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1516,10 +1516,27 @@ int perf_session__register_idle_thread(struct perf_session *session)
 	return err;
 }
 
+static void
+perf_session__warn_order(const struct perf_session *session)
+{
+	const struct ordered_events *oe = &session->ordered_events;
+	struct perf_evsel *evsel;
+	bool should_warn = true;
+
+	evlist__for_each(session->evlist, evsel) {
+		if (evsel->attr.write_backward)
+			should_warn = false;
+	}
+
+	if (!should_warn)
+		return;
+	if (oe->nr_unordered_events != 0)
+		ui__warning("%u out of order events recorded.\n", oe->nr_unordered_events);
+}
+
 static void perf_session__warn_about_errors(const struct perf_session *session)
 {
 	const struct events_stats *stats = &session->evlist->stats;
-	const struct ordered_events *oe = &session->ordered_events;
 
 	if (session->tool->lost == perf_event__process_lost &&
 	    stats->nr_events[PERF_RECORD_LOST] != 0) {
@@ -1576,8 +1593,7 @@ static void perf_session__warn_about_errors(const struct perf_session *session)
 			    stats->nr_unprocessable_samples);
 	}
 
-	if (oe->nr_unordered_events != 0)
-		ui__warning("%u out of order events recorded.\n", oe->nr_unordered_events);
+	perf_session__warn_order(session);
 
 	events_stats__auxtrace_error_warn(stats);
 
-- 
1.8.3.4

* [tip:perf/core] perf config: Bring perf_default_config to the very beginning at main()
  2016-02-26  9:31 ` [PATCH 03/46] perf config: Bring perf_default_config to the very beginning at main() Wang Nan
@ 2016-02-27  9:44   ` tip-bot for Wang Nan
  0 siblings, 0 replies; 60+ messages in thread
From: tip-bot for Wang Nan @ 2016-02-27  9:44 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, tglx, ast, acme, wangnan0, jolsa, lizefan,
	masami.hiramatsu.pt, linux-kernel, hpa, namhyung

Commit-ID:  b8cbb349061edda648463b086cfa869a7ab583af
Gitweb:     http://git.kernel.org/tip/b8cbb349061edda648463b086cfa869a7ab583af
Author:     Wang Nan <wangnan0@huawei.com>
AuthorDate: Fri, 26 Feb 2016 09:31:51 +0000
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Fri, 26 Feb 2016 19:49:16 -0300

perf config: Bring perf_default_config to the very beginning at main()

Before this patch each subcommand calls perf_config() by itself,
reading the default configuration together with subcommand-specific
options. If a subcommand doesn't have its own options, it needs to call
'perf_config(perf_default_config, NULL)' to ensure .perfconfig is
loaded.

This patch brings perf_config(perf_default_config, NULL) to the very
start of main(), so subcommands don't need to do it.

After this patch, 'llvm.clang-path' works for 'perf trace'.
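
The resulting callback layout can be sketched as below. This is a
reduced model under stated assumptions: config_one() is a hypothetical
stand-in for perf_config() that feeds one key/value pair to a callback,
and the two settings are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*config_fn_t)(const char *var, const char *value, void *cb);

/* Hypothetical settings touched by the callbacks below. */
static const char *clang_path = "clang";
static int kmem_default_slab;

/* Feed one key/value pair to a callback (stand-in for perf_config()). */
static int config_one(config_fn_t fn, const char *var, const char *value)
{
	assert(fn && var && value);
	return fn(var, value, NULL);
}

/* Common keys, handled once at the very start of main(). */
static int default_config(const char *var, const char *value, void *cb)
{
	(void)cb;
	if (!strcmp(var, "llvm.clang-path"))
		clang_path = value;
	return 0;
}

/* Subcommand callback: handles its own keys, no chaining to the default. */
static int kmem_config(const char *var, const char *value, void *cb)
{
	(void)cb;
	if (!strcmp(var, "kmem.default"))
		kmem_default_slab = !strcmp(value, "slab");
	return 0;	/* unknown keys are ignored */
}
```

Since the default callback has already run by the time a subcommand
starts, a subcommand callback can return 0 for keys it does not know.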

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Suggested-and-Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1456479154-136027-4-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-diff.c          | 2 --
 tools/perf/builtin-help.c          | 2 +-
 tools/perf/builtin-kmem.c          | 4 ++--
 tools/perf/builtin-report.c        | 2 +-
 tools/perf/builtin-top.c           | 4 ++--
 tools/perf/perf.c                  | 2 ++
 tools/perf/tests/llvm.c            | 8 --------
 tools/perf/util/color.c            | 5 +++--
 tools/perf/util/data-convert-bt.c  | 2 +-
 tools/perf/util/help-unknown-cmd.c | 5 +++--
 10 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 36ccc2b..4d72359 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -1264,8 +1264,6 @@ int cmd_diff(int argc, const char **argv, const char *prefix __maybe_unused)
 	if (ret < 0)
 		return ret;
 
-	perf_config(perf_default_config, NULL);
-
 	argc = parse_options(argc, argv, options, diff_usage, 0);
 
 	if (symbol__init(NULL) < 0)
diff --git a/tools/perf/builtin-help.c b/tools/perf/builtin-help.c
index f4dd2b4..49d55e2 100644
--- a/tools/perf/builtin-help.c
+++ b/tools/perf/builtin-help.c
@@ -272,7 +272,7 @@ static int perf_help_config(const char *var, const char *value, void *cb)
 	if (!prefixcmp(var, "man."))
 		return add_man_viewer_info(var, value);
 
-	return perf_default_config(var, value, cb);
+	return 0;
 }
 
 static struct cmdnames main_cmds, other_cmds;
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index 1180105..4d3340c 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -1834,7 +1834,7 @@ static int __cmd_record(int argc, const char **argv)
 	return cmd_record(i, rec_argv, NULL);
 }
 
-static int kmem_config(const char *var, const char *value, void *cb)
+static int kmem_config(const char *var, const char *value, void *cb __maybe_unused)
 {
 	if (!strcmp(var, "kmem.default")) {
 		if (!strcmp(value, "slab"))
@@ -1847,7 +1847,7 @@ static int kmem_config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
-	return perf_default_config(var, value, cb);
+	return 0;
 }
 
 int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index f4d8244..7eea49f 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -90,7 +90,7 @@ static int report__config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
-	return perf_default_config(var, value, cb);
+	return 0;
 }
 
 static int hist_iter__report_callback(struct hist_entry_iter *iter,
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index b86b623..94af190 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1065,7 +1065,7 @@ parse_callchain_opt(const struct option *opt, const char *arg, int unset)
 	return parse_callchain_top_opt(arg);
 }
 
-static int perf_top_config(const char *var, const char *value, void *cb)
+static int perf_top_config(const char *var, const char *value, void *cb __maybe_unused)
 {
 	if (!strcmp(var, "top.call-graph"))
 		var = "call-graph.record-mode"; /* fall-through */
@@ -1074,7 +1074,7 @@ static int perf_top_config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
-	return perf_default_config(var, value, cb);
+	return 0;
 }
 
 static int
diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index f632119..aaee0a7 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -548,6 +548,8 @@ int main(int argc, const char **argv)
 
 	srandom(time(NULL));
 
+	perf_config(perf_default_config, NULL);
+
 	/* get debugfs/tracefs mount point from /proc/mounts */
 	tracing_path_mount();
 
diff --git a/tools/perf/tests/llvm.c b/tools/perf/tests/llvm.c
index 70edcdf..cff564f 100644
--- a/tools/perf/tests/llvm.c
+++ b/tools/perf/tests/llvm.c
@@ -6,12 +6,6 @@
 #include "tests.h"
 #include "debug.h"
 
-static int perf_config_cb(const char *var, const char *val,
-			  void *arg __maybe_unused)
-{
-	return perf_default_config(var, val, arg);
-}
-
 #ifdef HAVE_LIBBPF_SUPPORT
 static int test__bpf_parsing(void *obj_buf, size_t obj_buf_sz)
 {
@@ -77,8 +71,6 @@ test_llvm__fetch_bpf_obj(void **p_obj_buf,
 	if (should_load_fail)
 		*should_load_fail = bpf_source_table[idx].should_load_fail;
 
-	perf_config(perf_config_cb, NULL);
-
 	/*
 	 * Skip this test if user's .perfconfig doesn't set [llvm] section
 	 * and clang is not found in $PATH, and this is not perf test -v
diff --git a/tools/perf/util/color.c b/tools/perf/util/color.c
index e5fb88b..43e84aa 100644
--- a/tools/perf/util/color.c
+++ b/tools/perf/util/color.c
@@ -32,14 +32,15 @@ int perf_config_colorbool(const char *var, const char *value, int stdout_is_tty)
 	return 0;
 }
 
-int perf_color_default_config(const char *var, const char *value, void *cb)
+int perf_color_default_config(const char *var, const char *value,
+			      void *cb __maybe_unused)
 {
 	if (!strcmp(var, "color.ui")) {
 		perf_use_color_default = perf_config_colorbool(var, value, -1);
 		return 0;
 	}
 
-	return perf_default_config(var, value, cb);
+	return 0;
 }
 
 static int __color_vsnprintf(char *bf, size_t size, const char *color,
diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
index b722e57..6729f4d 100644
--- a/tools/perf/util/data-convert-bt.c
+++ b/tools/perf/util/data-convert-bt.c
@@ -1117,7 +1117,7 @@ static int convert__config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
-	return perf_default_config(var, value, cb);
+	return 0;
 }
 
 int bt_convert__perf2ctf(const char *input, const char *path, bool force)
diff --git a/tools/perf/util/help-unknown-cmd.c b/tools/perf/util/help-unknown-cmd.c
index dc1e41c..43a98a4 100644
--- a/tools/perf/util/help-unknown-cmd.c
+++ b/tools/perf/util/help-unknown-cmd.c
@@ -6,7 +6,8 @@
 static int autocorrect;
 static struct cmdnames aliases;
 
-static int perf_unknown_cmd_config(const char *var, const char *value, void *cb)
+static int perf_unknown_cmd_config(const char *var, const char *value,
+				   void *cb __maybe_unused)
 {
 	if (!strcmp(var, "help.autocorrect"))
 		autocorrect = perf_config_int(var,value);
@@ -14,7 +15,7 @@ static int perf_unknown_cmd_config(const char *var, const char *value, void *cb)
 	if (!prefixcmp(var, "alias."))
 		add_cmdname(&aliases, var + 6, strlen(var + 6));
 
-	return perf_default_config(var, value, cb);
+	return 0;
 }
 
 static int levenshtein_compare(const void *p1, const void *p2)

* [tip:perf/core] perf tools: Only set filter for tracepoints events
  2016-02-26  9:31 ` [PATCH 05/46] perf tools: Only set filter for tracepoints events Wang Nan
@ 2016-02-27  9:45   ` tip-bot for Wang Nan
  0 siblings, 0 replies; 60+ messages in thread
From: tip-bot for Wang Nan @ 2016-02-27  9:45 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, hpa, peterz, ast, lizefan, mingo, wangnan0, jolsa, acme,
	linux-kernel

Commit-ID:  fdf14720fbd02d406ac2c1c50444774b4c7eed9a
Gitweb:     http://git.kernel.org/tip/fdf14720fbd02d406ac2c1c50444774b4c7eed9a
Author:     Wang Nan <wangnan0@huawei.com>
AuthorDate: Fri, 26 Feb 2016 09:31:53 +0000
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Fri, 26 Feb 2016 19:50:01 -0300

perf tools: Only set filter for tracepoints events

perf_evlist__set_filter() tries to set the filter on every evsel linked
in the evlist. However, since filters can only be applied to
tracepoints, it is better to check the type of each evsel before
calling perf_evsel__set_filter().
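
The type check can be illustrated with a reduced model. The names here
(enum evt_type, struct evt, set_filter_on_tracepoints()) are
hypothetical and only mirror the shape of the loop changed by this
patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced event types and event descriptor. */
enum evt_type { EVT_HARDWARE, EVT_SOFTWARE, EVT_TRACEPOINT };

struct evt {
	enum evt_type type;
	const char *filter;
};

/*
 * Apply the filter to tracepoint events only and skip everything else.
 * Returns the number of events the filter was applied to.
 */
static int set_filter_on_tracepoints(struct evt *evts, size_t nr,
				     const char *filter)
{
	int applied = 0;
	size_t i;

	assert(evts != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		if (evts[i].type != EVT_TRACEPOINT)
			continue;	/* filters only make sense here */
		evts[i].filter = filter;
		applied++;
	}
	return applied;
}
```

Skipping non-tracepoint events lets a mixed event list, for example a
bpf-output event next to tracepoints, be filtered without an error.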

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1456479154-136027-6-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/evlist.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index c42e196..86a0383 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1223,6 +1223,9 @@ int perf_evlist__set_filter(struct perf_evlist *evlist, const char *filter)
 	int err = 0;
 
 	evlist__for_each(evlist, evsel) {
+		if (evsel->attr.type != PERF_TYPE_TRACEPOINT)
+			continue;
+
 		err = perf_evsel__set_filter(evsel, filter);
 		if (err)
 			break;

* [tip:perf/core] perf trace: Call bpf__apply_obj_config in 'perf trace'
  2016-02-26  9:31 ` [PATCH 06/46] perf trace: Call bpf__apply_obj_config in 'perf trace' Wang Nan
@ 2016-02-27  9:45   ` tip-bot for Wang Nan
  0 siblings, 0 replies; 60+ messages in thread
From: tip-bot for Wang Nan @ 2016-02-27  9:45 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, wangnan0, lizefan, peterz, acme, mingo, ast, tglx,
	hpa, jolsa

Commit-ID:  ba50423530200659d4deb703a8f72d3b69bc13ce
Gitweb:     http://git.kernel.org/tip/ba50423530200659d4deb703a8f72d3b69bc13ce
Author:     Wang Nan <wangnan0@huawei.com>
AuthorDate: Fri, 26 Feb 2016 09:31:54 +0000
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Fri, 26 Feb 2016 19:50:40 -0300

perf trace: Call bpf__apply_obj_config in 'perf trace'

Without this patch BPF map configuration is not applied.

A command like this:
 # ./perf trace --ev bpf-output/no-inherit,name=evt/ \
                --ev ./test_bpf_trace.c/map:channel.event=evt/ \
                usleep 100000

loads the BPF files without error, but since map:channel.event=evt is
not applied, the bpf-output event does not work.

This patch allows 'perf trace' to load and run BPF scripts.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1456479154-136027-7-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-trace.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 20916dd..254149c 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -33,6 +33,7 @@
 #include "util/stat.h"
 #include "trace-event.h"
 #include "util/parse-events.h"
+#include "util/bpf-loader.h"
 
 #include <libaudit.h>
 #include <stdlib.h>
@@ -2586,6 +2587,16 @@ static int trace__run(struct trace *trace, int argc, const char **argv)
 	if (err < 0)
 		goto out_error_open;
 
+	err = bpf__apply_obj_config();
+	if (err) {
+		char errbuf[BUFSIZ];
+
+		bpf__strerror_apply_obj_config(err, errbuf, sizeof(errbuf));
+		pr_err("ERROR: Apply config to BPF failed: %s\n",
+			 errbuf);
+		goto out_error_open;
+	}
+
 	/*
 	 * Better not use !target__has_task() here because we need to cover the
 	 * case where no threads were specified in the command line, but a

* [tip:perf/core] perf trace: Print content of bpf-output event
  2016-02-26  9:31 ` [PATCH 07/46] perf trace: Print content of bpf-output event Wang Nan
@ 2016-02-27  9:45   ` tip-bot for Wang Nan
  0 siblings, 0 replies; 60+ messages in thread
From: tip-bot for Wang Nan @ 2016-02-27  9:45 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jolsa, linux-kernel, acme, ast, peterz, lizefan, tglx, wangnan0,
	hpa, mingo

Commit-ID:  1d6c9407d45dd622b277ca9f725da3cc9e95b5de
Gitweb:     http://git.kernel.org/tip/1d6c9407d45dd622b277ca9f725da3cc9e95b5de
Author:     Wang Nan <wangnan0@huawei.com>
AuthorDate: Fri, 26 Feb 2016 09:31:55 +0000
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Fri, 26 Feb 2016 19:57:07 -0300

perf trace: Print content of bpf-output event

With this patch the content of the BPF output event is printed by
'perf trace'. For example:

 # ./perf trace -a --ev bpf-output/no-inherit,name=evt/ \
                   --ev ./test_bpf_trace.c/map:channel.event=evt/ \
                   usleep 100000
  ...
    1.787 ( 0.004 ms): usleep/3832 nanosleep(rqtp: 0x7ffc78b18980                                        ) ...
    1.787 (         ): evt:Raise a BPF event!..)
    1.788 (         ): perf_bpf_probe:func_begin:(ffffffff810e97d0))
  ...
  101.866 (87.038 ms): gmain/1654 poll(ufds: 0x7f57a80008c0, nfds: 2, timeout_msecs: 1000               ) ...
  101.866 (         ): evt:Raise a BPF event!..)
  101.867 (         ): perf_bpf_probe:func_end:(ffffffff810e97d0 <- ffffffff81796173))
  101.869 (100.087 ms): usleep/3832  ... [continued]: nanosleep()) = 0
  ...

 (There is an extra ')' at the end of several lines. However, it is
  another problem, unrelated to this commit.)

Where test_bpf_trace.c is:

  /************************ BEGIN **************************/
  #include <uapi/linux/bpf.h>
  struct bpf_map_def {
        unsigned int type;
        unsigned int key_size;
        unsigned int value_size;
        unsigned int max_entries;
  };
  #define SEC(NAME) __attribute__((section(NAME), used))
  static u64 (*ktime_get_ns)(void) =
        (void *)BPF_FUNC_ktime_get_ns;
  static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
        (void *)BPF_FUNC_trace_printk;
  static int (*get_smp_processor_id)(void) =
        (void *)BPF_FUNC_get_smp_processor_id;
  static int (*perf_event_output)(void *, struct bpf_map_def *, int, void *, unsigned long) =
        (void *)BPF_FUNC_perf_event_output;

  struct bpf_map_def SEC("maps") channel = {
        .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
        .key_size = sizeof(int),
        .value_size = sizeof(u32),
        .max_entries = __NR_CPUS__,
  };

  static inline int __attribute__((always_inline))
  func(void *ctx, int type)
  {
	char output_str[] = "Raise a BPF event!";
	char err_str[] = "BAD %d\n";
	int err;

        err = perf_event_output(ctx, &channel, get_smp_processor_id(),
			        &output_str, sizeof(output_str));
	if (err)
		trace_printk(err_str, sizeof(err_str), err);
        return 1;
  }
  SEC("func_begin=sys_nanosleep")
  int func_begin(void *ctx) {return func(ctx, 1);}
  SEC("func_end=sys_nanosleep%return")
  int func_end(void *ctx) { return func(ctx, 2);}
  char _license[] SEC("license") = "GPL";
  int _version SEC("version") = LINUX_VERSION_CODE;
  /************************* END ***************************/
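
The trailing dots in "Raise a BPF event!.." above come from
non-printable bytes (such as the terminating NUL) being shown as '.'.
A stand-alone sketch of that rendering, with the hypothetical helpers
render_printable() and demo_render(), could look like this:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy raw bytes into out as text, replacing non-printable bytes
 * with '.'. The result is always NUL terminated and truncated to fit.
 * Returns the number of characters written, excluding the NUL.
 */
static size_t render_printable(const unsigned char *data, size_t len,
			       char *out, size_t out_len)
{
	size_t i, n;

	assert(data != NULL && out != NULL && out_len > 0);

	n = len < out_len - 1 ? len : out_len - 1;
	for (i = 0; i < n; i++)
		out[i] = isprint(data[i]) ? (char)data[i] : '.';
	out[n] = '\0';
	return n;
}

/* Usage: a short string followed by a NUL and one control byte. */
static const char *demo_render(void)
{
	static char out[16];
	static const unsigned char raw[] = { 'H', 'i', '!', 0, 0x01 };

	render_printable(raw, sizeof(raw), out, sizeof(out));
	return out;
}
```

The patch itself relies on print_binary() with a callback; the helper
above only mirrors the per-byte rule used for character data.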

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1456479154-136027-8-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-trace.c | 35 ++++++++++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 254149c..26a337f 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -2178,6 +2178,37 @@ out_dump:
 	return 0;
 }
 
+static void bpf_output__printer(enum binary_printer_ops op,
+				unsigned int val, void *extra)
+{
+	FILE *output = extra;
+	unsigned char ch = (unsigned char)val;
+
+	switch (op) {
+	case BINARY_PRINT_CHAR_DATA:
+		fprintf(output, "%c", isprint(ch) ? ch : '.');
+		break;
+	case BINARY_PRINT_DATA_BEGIN:
+	case BINARY_PRINT_LINE_BEGIN:
+	case BINARY_PRINT_ADDR:
+	case BINARY_PRINT_NUM_DATA:
+	case BINARY_PRINT_NUM_PAD:
+	case BINARY_PRINT_SEP:
+	case BINARY_PRINT_CHAR_PAD:
+	case BINARY_PRINT_LINE_END:
+	case BINARY_PRINT_DATA_END:
+	default:
+		break;
+	}
+}
+
+static void bpf_output__fprintf(struct trace *trace,
+				struct perf_sample *sample)
+{
+	print_binary(sample->raw_data, sample->raw_size, 8,
+		     bpf_output__printer, trace->output);
+}
+
 static int trace__event_handler(struct trace *trace, struct perf_evsel *evsel,
 				union perf_event *event __maybe_unused,
 				struct perf_sample *sample)
@@ -2190,7 +2221,9 @@ static int trace__event_handler(struct trace *trace, struct perf_evsel *evsel,
 
 	fprintf(trace->output, "%s:", evsel->name);
 
-	if (evsel->tp_format) {
+	if (perf_evsel__is_bpf_output(evsel)) {
+		bpf_output__fprintf(trace, sample);
+	} else if (evsel->tp_format) {
 		event_format__fprintf(evsel->tp_format, sample->cpu,
 				      sample->raw_data, sample->raw_size,
 				      trace->output);

* Re: [PATCH 10/46] perf core: Introduce new ioctl options to pause and resume ring buffer
  2016-02-26  9:31 ` [PATCH 10/46] perf core: Introduce new ioctl options to pause and resume ring buffer Wang Nan
@ 2016-02-29 15:39   ` Arnaldo Carvalho de Melo
  2016-03-03  2:03     ` Wangnan (F)
  0 siblings, 1 reply; 60+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-02-29 15:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Wang Nan, Alexei Starovoitov, Arnaldo Carvalho de Melo,
	Jiri Olsa, Li Zefan, pi3orama, linux-kernel, He Kuang,
	Brendan Gregg, Masami Hiramatsu, Namhyung Kim

Em Fri, Feb 26, 2016 at 09:31:58AM +0000, Wang Nan escreveu:
> Add new ioctl() to pause/resume ring-buffer output.
> 
> In some situations we want to read from ring buffer only when we
> ensure nothing can write to the ring buffer during reading. Without
> this patch we have to turn off all events attached to this ring buffer
> to achieve this.
> 
> This patch is for supporting the overwrite ring buffer. Following
> commits will introduce new methods to support reading from the
> overwrite ring buffer. Before reading, the caller must ensure the ring
> buffer is frozen, or the reading is unreliable.

Peter, have you had the chance to look at this and the other kernel
bits in this kit?

- Arnaldo
 
> Signed-off-by: Wang Nan <wangnan0@huawei.com>
> Cc: He Kuang <hekuang@huawei.com>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
> Cc: Jiri Olsa <jolsa@kernel.org>
> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Zefan Li <lizefan@huawei.com>
> Cc: pi3orama@163.com
> ---
>  include/uapi/linux/perf_event.h |  1 +
>  kernel/events/core.c            | 13 +++++++++++++
>  kernel/events/internal.h        | 11 +++++++++++
>  kernel/events/ring_buffer.c     |  7 ++++++-
>  4 files changed, 31 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 1afe962..a3c1903 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -401,6 +401,7 @@ struct perf_event_attr {
>  #define PERF_EVENT_IOC_SET_FILTER	_IOW('$', 6, char *)
>  #define PERF_EVENT_IOC_ID		_IOR('$', 7, __u64 *)
>  #define PERF_EVENT_IOC_SET_BPF		_IOW('$', 8, __u32)
> +#define PERF_EVENT_IOC_PAUSE_OUTPUT	_IOW('$', 9, __u32)
>  
>  enum perf_event_ioc_flags {
>  	PERF_IOC_FLAG_GROUP		= 1U << 0,
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 94c47e3..a7075ae 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -4231,6 +4231,19 @@ static long _perf_ioctl(struct perf_event *event, unsigned int cmd, unsigned lon
>  	case PERF_EVENT_IOC_SET_BPF:
>  		return perf_event_set_bpf_prog(event, arg);
>  
> +	case PERF_EVENT_IOC_PAUSE_OUTPUT: {
> +		struct ring_buffer *rb;
> +
> +		rcu_read_lock();
> +		rb = rcu_dereference(event->rb);
> +		if (!rb) {
> +			rcu_read_unlock();
> +			return -EINVAL;
> +		}
> +		rb_toggle_paused(rb, !!arg);
> +		rcu_read_unlock();
> +		return 0;
> +	}
>  	default:
>  		return -ENOTTY;
>  	}
> diff --git a/kernel/events/internal.h b/kernel/events/internal.h
> index 2bbad9c..6a93d1b 100644
> --- a/kernel/events/internal.h
> +++ b/kernel/events/internal.h
> @@ -18,6 +18,7 @@ struct ring_buffer {
>  #endif
>  	int				nr_pages;	/* nr of data pages  */
>  	int				overwrite;	/* can overwrite itself */
> +	int				paused;		/* can write into ring buffer */
>  
>  	atomic_t			poll;		/* POLL_ for wakeups */
>  
> @@ -65,6 +66,16 @@ static inline void rb_free_rcu(struct rcu_head *rcu_head)
>  	rb_free(rb);
>  }
>  
> +static inline void
> +rb_toggle_paused(struct ring_buffer *rb,
> +		 bool pause)
> +{
> +	if (!pause && rb->nr_pages)
> +		rb->paused = 0;
> +	else
> +		rb->paused = 1;
> +}
> +
>  extern struct ring_buffer *
>  rb_alloc(int nr_pages, long watermark, int cpu, int flags);
>  extern void perf_event_wakeup(struct perf_event *event);
> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> index 1faad2c..22e1a47 100644
> --- a/kernel/events/ring_buffer.c
> +++ b/kernel/events/ring_buffer.c
> @@ -125,8 +125,11 @@ int perf_output_begin(struct perf_output_handle *handle,
>  	if (unlikely(!rb))
>  		goto out;
>  
> -	if (unlikely(!rb->nr_pages))
> +	if (unlikely(rb->paused)) {
> +		if (rb->nr_pages)
> +			local_inc(&rb->lost);
>  		goto out;
> +	}
>  
>  	handle->rb    = rb;
>  	handle->event = event;
> @@ -244,6 +247,8 @@ ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
>  	INIT_LIST_HEAD(&rb->event_list);
>  	spin_lock_init(&rb->event_lock);
>  	init_irq_work(&rb->irq_work, rb_irq_work);
> +
> +	rb->paused = rb->nr_pages ? 0 : 1;
>  }
>  
>  static void ring_buffer_put_async(struct ring_buffer *rb)
> -- 
> 1.8.3.4
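
For readers following the patch quoted above, the pause logic can be
modelled in plain userspace C. This is only a sketch under stated
assumptions: struct rb_model and the rb_model_*() helpers are
hypothetical names that mirror ring_buffer_init(), rb_toggle_paused()
and the new check in perf_output_begin(); none of this is kernel code,
and the 'lost' counter is a plain integer rather than a local_t.

```c
#include <stdbool.h>

/* Hypothetical userspace model of the fields the patch touches. */
struct rb_model {
	int nr_pages;		/* number of data pages */
	int paused;		/* non-zero: writers must not touch the buffer */
	unsigned long lost;	/* records dropped while paused */
};

/* Mirrors ring_buffer_init(): a buffer without data pages starts paused. */
static void rb_model_init(struct rb_model *rb, int nr_pages)
{
	rb->nr_pages = nr_pages;
	rb->paused = nr_pages ? 0 : 1;
	rb->lost = 0;
}

/* Mirrors rb_toggle_paused(): resuming only works when data pages exist. */
static void rb_model_toggle_paused(struct rb_model *rb, bool pause)
{
	if (!pause && rb->nr_pages)
		rb->paused = 0;
	else
		rb->paused = 1;
}

/*
 * Mirrors the check added to perf_output_begin(): returns true if a
 * record may be written. A paused buffer that has data pages counts
 * the record as lost; a buffer without pages just drops it silently.
 */
static bool rb_model_output_begin(struct rb_model *rb)
{
	if (rb->paused) {
		if (rb->nr_pages)
			rb->lost++;
		return false;
	}
	return true;
}
```

In this model a reader would call rb_model_toggle_paused(rb, true),
read the buffer, and then call rb_model_toggle_paused(rb, false);
anything that tried to write in between shows up in 'lost'.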


* Re: [PATCH 10/46] perf core: Introduce new ioctl options to pause and resume ring buffer
  2016-02-29 15:39   ` Arnaldo Carvalho de Melo
@ 2016-03-03  2:03     ` Wangnan (F)
  0 siblings, 0 replies; 60+ messages in thread
From: Wangnan (F) @ 2016-03-03  2:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Alexei Starovoitov,
	Arnaldo Carvalho de Melo, Jiri Olsa, Li Zefan, pi3orama,
	linux-kernel, He Kuang, Brendan Gregg, Masami Hiramatsu,
	Namhyung Kim

Hi Peter,

  Patches 10/46 to 14/46 were sent to you separately and modified
following your suggestions. Do you have any further comments on them?

Thank you.

On 2016/2/29 23:39, Arnaldo Carvalho de Melo wrote:
> Em Fri, Feb 26, 2016 at 09:31:58AM +0000, Wang Nan escreveu:
>> Add new ioctl() to pause/resume ring-buffer output.
>>
>> In some situations we want to read from the ring buffer only when we
>> can ensure that nothing writes to it during the read. Without this
>> patch we have to turn off all events attached to the ring buffer to
>> achieve this.
>>
>> This patch is for supporting the overwrite ring buffer. Following
>> commits will introduce new methods for reading from an overwrite ring
>> buffer. Before reading, the caller must ensure the ring buffer is
>> frozen, or the read is unreliable.
> Peter, have you had the chance to look at this and the other kernel
> bits in this kit?
>
> - Arnaldo
>   
>> Signed-off-by: Wang Nan <wangnan0@huawei.com>
>> Cc: He Kuang <hekuang@huawei.com>
>> Cc: Alexei Starovoitov <ast@kernel.org>
>> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
>> Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
>> Cc: Jiri Olsa <jolsa@kernel.org>
>> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
>> Cc: Namhyung Kim <namhyung@kernel.org>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Zefan Li <lizefan@huawei.com>
>> Cc: pi3orama@163.com
>> ---
>>   include/uapi/linux/perf_event.h |  1 +
>>   kernel/events/core.c            | 13 +++++++++++++
>>   kernel/events/internal.h        | 11 +++++++++++
>>   kernel/events/ring_buffer.c     |  7 ++++++-
>>   4 files changed, 31 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
>> index 1afe962..a3c1903 100644
>> --- a/include/uapi/linux/perf_event.h
>> +++ b/include/uapi/linux/perf_event.h
>> @@ -401,6 +401,7 @@ struct perf_event_attr {
>>   #define PERF_EVENT_IOC_SET_FILTER	_IOW('$', 6, char *)
>>   #define PERF_EVENT_IOC_ID		_IOR('$', 7, __u64 *)
>>   #define PERF_EVENT_IOC_SET_BPF		_IOW('$', 8, __u32)
>> +#define PERF_EVENT_IOC_PAUSE_OUTPUT	_IOW('$', 9, __u32)
>>   
>>   enum perf_event_ioc_flags {
>>   	PERF_IOC_FLAG_GROUP		= 1U << 0,
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 94c47e3..a7075ae 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -4231,6 +4231,19 @@ static long _perf_ioctl(struct perf_event *event, unsigned int cmd, unsigned lon
>>   	case PERF_EVENT_IOC_SET_BPF:
>>   		return perf_event_set_bpf_prog(event, arg);
>>   
>> +	case PERF_EVENT_IOC_PAUSE_OUTPUT: {
>> +		struct ring_buffer *rb;
>> +
>> +		rcu_read_lock();
>> +		rb = rcu_dereference(event->rb);
>> +		if (!rb) {
>> +			rcu_read_unlock();
>> +			return -EINVAL;
>> +		}
>> +		rb_toggle_paused(rb, !!arg);
>> +		rcu_read_unlock();
>> +		return 0;
>> +	}
>>   	default:
>>   		return -ENOTTY;
>>   	}
>> diff --git a/kernel/events/internal.h b/kernel/events/internal.h
>> index 2bbad9c..6a93d1b 100644
>> --- a/kernel/events/internal.h
>> +++ b/kernel/events/internal.h
>> @@ -18,6 +18,7 @@ struct ring_buffer {
>>   #endif
>>   	int				nr_pages;	/* nr of data pages  */
>>   	int				overwrite;	/* can overwrite itself */
>> +	int				paused;		/* can write into ring buffer */
>>   
>>   	atomic_t			poll;		/* POLL_ for wakeups */
>>   
>> @@ -65,6 +66,16 @@ static inline void rb_free_rcu(struct rcu_head *rcu_head)
>>   	rb_free(rb);
>>   }
>>   
>> +static inline void
>> +rb_toggle_paused(struct ring_buffer *rb,
>> +		 bool pause)
>> +{
>> +	if (!pause && rb->nr_pages)
>> +		rb->paused = 0;
>> +	else
>> +		rb->paused = 1;
>> +}
>> +
>>   extern struct ring_buffer *
>>   rb_alloc(int nr_pages, long watermark, int cpu, int flags);
>>   extern void perf_event_wakeup(struct perf_event *event);
>> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
>> index 1faad2c..22e1a47 100644
>> --- a/kernel/events/ring_buffer.c
>> +++ b/kernel/events/ring_buffer.c
>> @@ -125,8 +125,11 @@ int perf_output_begin(struct perf_output_handle *handle,
>>   	if (unlikely(!rb))
>>   		goto out;
>>   
>> -	if (unlikely(!rb->nr_pages))
>> +	if (unlikely(rb->paused)) {
>> +		if (rb->nr_pages)
>> +			local_inc(&rb->lost);
>>   		goto out;
>> +	}
>>   
>>   	handle->rb    = rb;
>>   	handle->event = event;
>> @@ -244,6 +247,8 @@ ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
>>   	INIT_LIST_HEAD(&rb->event_list);
>>   	spin_lock_init(&rb->event_lock);
>>   	init_irq_work(&rb->irq_work, rb_irq_work);
>> +
>> +	rb->paused = rb->nr_pages ? 0 : 1;
>>   }
>>   
>>   static void ring_buffer_put_async(struct ring_buffer *rb)
>> -- 
>> 1.8.3.4


* [tip:perf/core] perf data: Support converting data from bpf_perf_event_output()
  2016-02-26  9:31 ` [PATCH 08/46] perf data: Support converting data from bpf_perf_event_output() Wang Nan
@ 2016-03-05  8:15   ` tip-bot for Wang Nan
  0 siblings, 0 replies; 60+ messages in thread
From: tip-bot for Wang Nan @ 2016-03-05  8:15 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, lizefan, hpa, masami.hiramatsu.pt, ast, brendan.d.gregg,
	mingo, jolsa, acme, wangnan0, linux-kernel, peterz, namhyung

Commit-ID:  6122d57e9f7c6cb0f0aa276fbd3a12e3af826ef2
Gitweb:     http://git.kernel.org/tip/6122d57e9f7c6cb0f0aa276fbd3a12e3af826ef2
Author:     Wang Nan <wangnan0@huawei.com>
AuthorDate: Fri, 26 Feb 2016 09:31:56 +0000
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Thu, 3 Mar 2016 11:10:34 -0300

perf data: Support converting data from bpf_perf_event_output()

bpf_perf_event_output() outputs data through sample->raw_data. This
patch adds support for converting that data into CTF. A python script
can then be used to process the output data from BPF programs.

Test result:

  # cat ./test_bpf_output_2.c
  /************************ BEGIN **************************/
  #include <uapi/linux/bpf.h>
  struct bpf_map_def {
 	unsigned int type;
 	unsigned int key_size;
 	unsigned int value_size;
 	unsigned int max_entries;
  };
  #define SEC(NAME) __attribute__((section(NAME), used))
  static u64 (*ktime_get_ns)(void) =
 	(void *)BPF_FUNC_ktime_get_ns;
  static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
 	(void *)BPF_FUNC_trace_printk;
  static int (*get_smp_processor_id)(void) =
 	(void *)BPF_FUNC_get_smp_processor_id;
  static int (*perf_event_output)(void *, struct bpf_map_def *, int, void *, unsigned long) =
 	(void *)BPF_FUNC_perf_event_output;

  struct bpf_map_def SEC("maps") channel = {
 	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
 	.key_size = sizeof(int),
 	.value_size = sizeof(u32),
 	.max_entries = __NR_CPUS__,
  };

  static inline int __attribute__((always_inline))
  func(void *ctx, int type)
  {
 	struct {
 		u64 ktime;
 		int type;
 	} __attribute__((packed)) output_data;
 	char error_data[] = "Error: failed to output\n";
 	int err;

 	output_data.type = type;
 	output_data.ktime = ktime_get_ns();
 	err = perf_event_output(ctx, &channel, get_smp_processor_id(),
 				&output_data, sizeof(output_data));
 	if (err)
 		trace_printk(error_data, sizeof(error_data));
 	return 0;
  }
  SEC("func_begin=sys_nanosleep")
  int func_begin(void *ctx) {return func(ctx, 1);}
  SEC("func_end=sys_nanosleep%return")
  int func_end(void *ctx) { return func(ctx, 2);}
  char _license[] SEC("license") = "GPL";
  int _version SEC("version") = LINUX_VERSION_CODE;
  /************************* END ***************************/

  # ./perf record -e bpf-output/no-inherit,name=evt/ \
                 -e ./test_bpf_output_2.c/map:channel.event=evt/ \
                 usleep 100000
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.012 MB perf.data (2 samples) ]

  # ./perf script
          usleep 14942 92503.198504: evt:  ffffffff810e0ba1 sys_nanosleep (/lib/modules/4.3.0....
          usleep 14942 92503.298562: evt:  ffffffff810585e9 kretprobe_trampoline_holder (/lib....

  # ./perf data convert --to-ctf ./out.ctf
  [ perf data convert: Converted 'perf.data' into CTF data './out.ctf' ]
  [ perf data convert: Converted and wrote 0.000 MB (2 samples) ]

  # babeltrace ./out.ctf
  [01:41:43.198504134] (+?.?????????) evt: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF810E0BA1, perf_tid = 14942, perf_pid = 14942, perf_id = 1044, raw_len = 3, raw_data = [ [0] = 0x32C0C07B, [1] = 0x5421, [2] = 0x1 ] }
  [01:41:43.298562257] (+0.100058123) evt: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF810585E9, perf_tid = 14942, perf_pid = 14942, perf_id = 1044, raw_len = 3, raw_data = [ [0] = 0x38B77FAA, [1] = 0x5421, [2] = 0x2 ] }

  # cat ./test_bpf_output_2.py
  from babeltrace import TraceCollection
  tc = TraceCollection()
  tc.add_trace('./out.ctf', 'ctf')
  d = {1:[], 2:[]}
  for event in tc.events:
     if not event.name.startswith('evt'):
         continue
     raw_data = event['raw_data']
     (time, type) = ((raw_data[0] + (raw_data[1] << 32)), raw_data[2])
     d[type].append(time)
  print(list(map(lambda i: d[2][i] - d[1][i], range(len(d[1])))));

  # python3 ./test_bpf_output_2.py
  [100056879]
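
The decoding done by the python script above can also be sketched in C.
The BPF program emits a packed { u64 ktime; int type; } record, i.e. 12
bytes, which the converter exposes as three u32 elements: the low and
high halves of ktime followed by type. The helper names below
(raw_nr_elements, raw_ktime, raw_latency_ns) are hypothetical and not
part of perf; the sketch assumes the little endian layout seen in the
babeltrace output above.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of whole u32 elements in a raw payload of raw_size bytes. */
static size_t raw_nr_elements(size_t raw_size)
{
	return raw_size / sizeof(uint32_t);
}

/* Rebuild the 64-bit ktime from its low (index 0) and high (index 1) words. */
static uint64_t raw_ktime(const uint32_t *raw_data)
{
	return (uint64_t)raw_data[0] | ((uint64_t)raw_data[1] << 32);
}

/* Latency between a begin sample (type 1) and an end sample (type 2). */
static uint64_t raw_latency_ns(const uint32_t *begin, const uint32_t *end)
{
	return raw_ktime(end) - raw_ktime(begin);
}
```

Feeding it the two raw_data arrays printed by babeltrace above
({ 0x32C0C07B, 0x5421, 0x1 } and { 0x38B77FAA, 0x5421, 0x2 }) gives
100056879 ns, the same value the python script prints.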

Committer note:

Make sure you have python3-devel installed, not python-devel, which may
be for python2 and will lead to some "PyInstance_Type" errors. Also
make sure that you use the right libbabeltrace: Fedora, for instance,
ships it, but in an older version.

To build libbabeltrace's python binding one also needs to use:

 ./configure --enable-python-bindings

And then set PYTHONPATH=/usr/local/lib64/python3.4/site-packages/.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1456479154-136027-9-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/data-convert-bt.c | 112 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 111 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
index 6729f4d..1f608a6 100644
--- a/tools/perf/util/data-convert-bt.c
+++ b/tools/perf/util/data-convert-bt.c
@@ -352,6 +352,84 @@ static int add_tracepoint_values(struct ctf_writer *cw,
 	return ret;
 }
 
+static int
+add_bpf_output_values(struct bt_ctf_event_class *event_class,
+		      struct bt_ctf_event *event,
+		      struct perf_sample *sample)
+{
+	struct bt_ctf_field_type *len_type, *seq_type;
+	struct bt_ctf_field *len_field, *seq_field;
+	unsigned int raw_size = sample->raw_size;
+	unsigned int nr_elements = raw_size / sizeof(u32);
+	unsigned int i;
+	int ret;
+
+	if (nr_elements * sizeof(u32) != raw_size)
+		pr_warning("Incorrect raw_size (%u) in bpf output event, skip %lu bytes\n",
+			   raw_size, raw_size - nr_elements * sizeof(u32));
+
+	len_type = bt_ctf_event_class_get_field_by_name(event_class, "raw_len");
+	len_field = bt_ctf_field_create(len_type);
+	if (!len_field) {
+		pr_err("failed to create 'raw_len' for bpf output event\n");
+		ret = -1;
+		goto put_len_type;
+	}
+
+	ret = bt_ctf_field_unsigned_integer_set_value(len_field, nr_elements);
+	if (ret) {
+		pr_err("failed to set field value for raw_len\n");
+		goto put_len_field;
+	}
+	ret = bt_ctf_event_set_payload(event, "raw_len", len_field);
+	if (ret) {
+		pr_err("failed to set payload to raw_len\n");
+		goto put_len_field;
+	}
+
+	seq_type = bt_ctf_event_class_get_field_by_name(event_class, "raw_data");
+	seq_field = bt_ctf_field_create(seq_type);
+	if (!seq_field) {
+		pr_err("failed to create 'raw_data' for bpf output event\n");
+		ret = -1;
+		goto put_seq_type;
+	}
+
+	ret = bt_ctf_field_sequence_set_length(seq_field, len_field);
+	if (ret) {
+		pr_err("failed to set length of 'raw_data'\n");
+		goto put_seq_field;
+	}
+
+	for (i = 0; i < nr_elements; i++) {
+		struct bt_ctf_field *elem_field =
+			bt_ctf_field_sequence_get_field(seq_field, i);
+
+		ret = bt_ctf_field_unsigned_integer_set_value(elem_field,
+				((u32 *)(sample->raw_data))[i]);
+
+		bt_ctf_field_put(elem_field);
+		if (ret) {
+			pr_err("failed to set raw_data[%d]\n", i);
+			goto put_seq_field;
+		}
+	}
+
+	ret = bt_ctf_event_set_payload(event, "raw_data", seq_field);
+	if (ret)
+		pr_err("failed to set payload for raw_data\n");
+
+put_seq_field:
+	bt_ctf_field_put(seq_field);
+put_seq_type:
+	bt_ctf_field_type_put(seq_type);
+put_len_field:
+	bt_ctf_field_put(len_field);
+put_len_type:
+	bt_ctf_field_type_put(len_type);
+	return ret;
+}
+
 static int add_generic_values(struct ctf_writer *cw,
 			      struct bt_ctf_event *event,
 			      struct perf_evsel *evsel,
@@ -597,6 +675,12 @@ static int process_sample_event(struct perf_tool *tool,
 			return -1;
 	}
 
+	if (perf_evsel__is_bpf_output(evsel)) {
+		ret = add_bpf_output_values(event_class, event, sample);
+		if (ret)
+			return -1;
+	}
+
 	cs = ctf_stream(cw, get_sample_cpu(cw, sample, evsel));
 	if (cs) {
 		if (is_flush_needed(cs))
@@ -744,6 +828,25 @@ static int add_tracepoint_types(struct ctf_writer *cw,
 	return ret;
 }
 
+static int add_bpf_output_types(struct ctf_writer *cw,
+				struct bt_ctf_event_class *class)
+{
+	struct bt_ctf_field_type *len_type = cw->data.u32;
+	struct bt_ctf_field_type *seq_base_type = cw->data.u32_hex;
+	struct bt_ctf_field_type *seq_type;
+	int ret;
+
+	ret = bt_ctf_event_class_add_field(class, len_type, "raw_len");
+	if (ret)
+		return ret;
+
+	seq_type = bt_ctf_field_type_sequence_create(seq_base_type, "raw_len");
+	if (!seq_type)
+		return -1;
+
+	return bt_ctf_event_class_add_field(class, seq_type, "raw_data");
+}
+
 static int add_generic_types(struct ctf_writer *cw, struct perf_evsel *evsel,
 			     struct bt_ctf_event_class *event_class)
 {
@@ -755,7 +858,8 @@ static int add_generic_types(struct ctf_writer *cw, struct perf_evsel *evsel,
 	 *                              ctf event header
 	 *   PERF_SAMPLE_READ         - TODO
 	 *   PERF_SAMPLE_CALLCHAIN    - TODO
-	 *   PERF_SAMPLE_RAW          - tracepoint fields are handled separately
+	 *   PERF_SAMPLE_RAW          - tracepoint fields and BPF output
+	 *                              are handled separately
 	 *   PERF_SAMPLE_BRANCH_STACK - TODO
 	 *   PERF_SAMPLE_REGS_USER    - TODO
 	 *   PERF_SAMPLE_STACK_USER   - TODO
@@ -824,6 +928,12 @@ static int add_event(struct ctf_writer *cw, struct perf_evsel *evsel)
 			goto err;
 	}
 
+	if (perf_evsel__is_bpf_output(evsel)) {
+		ret = add_bpf_output_types(cw, event_class);
+		if (ret)
+			goto err;
+	}
+
 	ret = bt_ctf_stream_class_add_event_class(cw->stream_class, event_class);
 	if (ret) {
 		pr("Failed to add event class into stream.\n");


* [tip:perf/core] perf data: Explicitly set byte order for integer types
  2016-02-26  9:31 ` [PATCH 09/46] perf data: Explicitly set byte order for integer types Wang Nan
@ 2016-03-05  8:15   ` tip-bot for Wang Nan
  0 siblings, 0 replies; 60+ messages in thread
From: tip-bot for Wang Nan @ 2016-03-05  8:15 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: lizefan, acme, brendan.d.gregg, namhyung, wangnan0, mingo, ast,
	tglx, linux-kernel, peterz, jolsa, masami.hiramatsu.pt, hpa,
	jeremie.galarneau

Commit-ID:  f8dd2d5ff953bc498d682ae8022439c940a7d5c4
Gitweb:     http://git.kernel.org/tip/f8dd2d5ff953bc498d682ae8022439c940a7d5c4
Author:     Wang Nan <wangnan0@huawei.com>
AuthorDate: Fri, 26 Feb 2016 09:31:57 +0000
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Thu, 3 Mar 2016 11:10:34 -0300

perf data: Explicitly set byte order for integer types

After babeltrace commit 5cec03e402aa ("ir: copy variants and sequences
when setting a field path"), 'perf data convert' gets incorrect result
if there's bpf output data. For example:

 # perf data convert --to-ctf ./out.ctf
 # babeltrace ./out.ctf
 [10:44:31.186045346] (+?.?????????) evt: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF810E7DD1, perf_tid = 23819, perf_pid = 23819, perf_id = 518, raw_len = 3, raw_data = [ [0] = 0xC028E32F, [1] = 0x815D0100, [2] = 0x1000000 ] }
 [10:44:31.286101003] (+0.100055657) evt: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF8105B609, perf_tid = 23819, perf_pid = 23819, perf_id = 518, raw_len = 3, raw_data = [ [0] = 0x35D9F1EB, [1] = 0x15D81, [2] = 0x2 ] }

The expected result of the first sample should be:

 raw_data = [ [0] = 0x2FE328C0, [1] = 0x15D81, [2] = 0x1 ] }

however, 'perf data convert' outputs big endian values to the resulting
CTF file.

The reason is an internal change (or a bug?) in babeltrace.

Before this patch, at the first add_bpf_output_values() call, the byte
order of all integer types is unset (it is 0, neither 1234 (le) nor
4321 (be)). It is later fixed up by:

perf_evlist__deliver_sample
 -> process_sample_event
   -> ctf_stream
      ...
      ->bt_ctf_trace_add_stream_class
        ->bt_ctf_field_type_structure_set_byte_order
          ->bt_ctf_field_type_integer_set_byte_order

while creating the stream.

However, the babeltrace commit mentioned above duplicates the types in
a sequence to prevent potential conflicts, in the following call stack,
and links the newly allocated type into the 'raw_data' sequence:

perf_evlist__deliver_sample
 -> process_sample_event
   -> ctf_stream
      ...
      -> bt_ctf_trace_add_stream_class
        -> bt_ctf_stream_class_resolve_types
           ...
           -> bt_ctf_field_type_sequence_copy
             ->bt_ctf_field_type_integer_copy

This happens before the byte order is set, so only the newly allocated
type is initialized; the byte order of the original type, which perf
uses to create the first raw_data, is still unset.

The byte order in the CTF output is not related to the byte order in
perf.data. Setting it to anything other than BT_CTF_BYTE_ORDER_NATIVE
solves this problem (only BT_CTF_BYTE_ORDER_NATIVE needs to be fixed
up). To minimize behavior changes, set the byte order according to the
compile-time options.
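
To make the symptom concrete, here is a small C sketch of how the same
four bytes decode under each byte order. read_le32() and read_be32()
are hypothetical helpers written for illustration, not babeltrace or
perf functions. The bytes C0 28 E3 2F are what a little endian machine
stores for 0x2FE328C0; read back as big endian they give the
0xC028E32F seen in the broken output above.

```c
#include <stdint.h>

/* Decode four bytes as a little endian 32-bit value. */
static uint32_t read_le32(const unsigned char *p)
{
	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}

/* Decode the same four bytes as a big endian 32-bit value. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) |
	       ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) |
	       (uint32_t)p[3];
}
```

The same holds for the other two elements of the first sample:
81 5D 01 00 is 0x15D81 as little endian and 0x815D0100 as big endian,
and 01 00 00 00 is 0x1 versus 0x1000000.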

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1456479154-136027-10-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/data-convert-bt.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
index 1f608a6..811af89 100644
--- a/tools/perf/util/data-convert-bt.c
+++ b/tools/perf/util/data-convert-bt.c
@@ -1080,6 +1080,12 @@ static struct bt_ctf_field_type *create_int_type(int size, bool sign, bool hex)
 	    bt_ctf_field_type_integer_set_base(type, BT_CTF_INTEGER_BASE_HEXADECIMAL))
 		goto err;
 
+#if __BYTE_ORDER == __BIG_ENDIAN
+	bt_ctf_field_type_set_byte_order(type, BT_CTF_BYTE_ORDER_BIG_ENDIAN);
+#else
+	bt_ctf_field_type_set_byte_order(type, BT_CTF_BYTE_ORDER_LITTLE_ENDIAN);
+#endif
+
 	pr2("Created type: INTEGER %d-bit %ssigned %s\n",
 	    size, sign ? "un" : "", hex ? "hex" : "");
 	return type;


* [tip:perf/core] perf record: Use WARN_ONCE to replace 'if' condition
  2016-02-26  9:32 ` [PATCH 18/46] perf record: Use WARN_ONCE to replace 'if' condition Wang Nan
@ 2016-03-05  8:15   ` tip-bot for Wang Nan
  0 siblings, 0 replies; 60+ messages in thread
From: tip-bot for Wang Nan @ 2016-03-05  8:15 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, lizefan, ast, mingo, jolsa, acme, hekuang,
	namhyung, tglx, masami.hiramatsu.pt, wangnan0, hpa, peterz

Commit-ID:  d8871ea71281ed689dc3303d1b50eb00c5d06141
Gitweb:     http://git.kernel.org/tip/d8871ea71281ed689dc3303d1b50eb00c5d06141
Author:     Wang Nan <wangnan0@huawei.com>
AuthorDate: Fri, 26 Feb 2016 09:32:06 +0000
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Thu, 3 Mar 2016 11:10:34 -0300

perf record: Use WARN_ONCE to replace 'if' condition

Commits in a BPF patchkit will extract the kernel and module
synthesizing code into a separate function and call it multiple times.
This patch replaces 'if (err < 0)' with WARN_ONCE, making sure the
error message is shown only once.
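
The warn-once behaviour relied on here can be sketched as follows. This
is a simplified stand-in, not the WARN_ONCE() from the tools' asm/bug.h
(which, among other differences, also evaluates to the condition);
WARN_ONCE_SKETCH, nr_warnings and synthesize_sketch() are hypothetical
names used only for illustration.

```c
#include <stdio.h>

static int nr_warnings;	/* counts emitted warnings, for illustration only */

/* Simplified stand-in for WARN_ONCE(): each call site warns at most once. */
#define WARN_ONCE_SKETCH(cond, msg)			\
	do {						\
		static int warned;			\
		if ((cond) && !warned) {		\
			warned = 1;			\
			nr_warnings++;			\
			fputs(msg, stderr);		\
		}					\
	} while (0)

/* Hypothetical stand-in for a synthesize step that is called repeatedly. */
static void synthesize_sketch(int err)
{
	WARN_ONCE_SKETCH(err < 0,
			 "Couldn't record kernel module information.\n");
}
```

Calling synthesize_sketch(-1) several times prints the message only on
the first call, which is the property wanted once the synthesizing code
runs once per output file.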

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1456479154-136027-19-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-record.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 7d11162..9dec7e5 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -33,6 +33,7 @@
 #include "util/parse-regs-options.h"
 #include "util/llvm-utils.h"
 #include "util/bpf-loader.h"
+#include "asm/bug.h"
 
 #include <unistd.h>
 #include <sched.h>
@@ -615,17 +616,15 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 
 	err = perf_event__synthesize_kernel_mmap(tool, process_synthesized_event,
 						 machine);
-	if (err < 0)
-		pr_err("Couldn't record kernel reference relocation symbol\n"
-		       "Symbol resolution may be skewed if relocation was used (e.g. kexec).\n"
-		       "Check /proc/kallsyms permission or run as root.\n");
+	WARN_ONCE(err < 0, "Couldn't record kernel reference relocation symbol\n"
+			   "Symbol resolution may be skewed if relocation was used (e.g. kexec).\n"
+			   "Check /proc/kallsyms permission or run as root.\n");
 
 	err = perf_event__synthesize_modules(tool, process_synthesized_event,
 					     machine);
-	if (err < 0)
-		pr_err("Couldn't record kernel module information.\n"
-		       "Symbol resolution may be skewed if relocation was used (e.g. kexec).\n"
-		       "Check /proc/modules permission or run as root.\n");
+	WARN_ONCE(err < 0, "Couldn't record kernel module information.\n"
+			   "Symbol resolution may be skewed if relocation was used (e.g. kexec).\n"
+			   "Check /proc/modules permission or run as root.\n");
 
 	if (perf_guest) {
 		machines__process_guests(&session->machines,


* [tip:perf/core] perf record: Extract synthesize code to record__synthesize()
  2016-02-26  9:32 ` [PATCH 19/46] perf record: Extract synthesize code to record__synthesize() Wang Nan
@ 2016-03-05  8:16   ` tip-bot for Wang Nan
  0 siblings, 0 replies; 60+ messages in thread
From: tip-bot for Wang Nan @ 2016-03-05  8:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, hekuang, lizefan, namhyung, wangnan0, peterz, ast,
	linux-kernel, mingo, masami.hiramatsu.pt, acme, hpa, jolsa

Commit-ID:  c45c86eb70964615bd13b4c1e647bf9ee60c3db9
Gitweb:     http://git.kernel.org/tip/c45c86eb70964615bd13b4c1e647bf9ee60c3db9
Author:     Wang Nan <wangnan0@huawei.com>
AuthorDate: Fri, 26 Feb 2016 09:32:07 +0000
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Thu, 3 Mar 2016 11:10:35 -0300

perf record: Extract synthesize code to record__synthesize()

Create record__synthesize(). It can be used to create tracking events
for each perf.data once perf supports splitting its output into
multiple files.

Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1456479154-136027-20-git-send-email-wangnan0@huawei.com
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-record.c | 125 +++++++++++++++++++++++++-------------------
 1 file changed, 70 insertions(+), 55 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 9dec7e5..cb583b4 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -486,6 +486,74 @@ static void workload_exec_failed_signal(int signo __maybe_unused,
 
 static void snapshot_sig_handler(int sig);
 
+static int record__synthesize(struct record *rec)
+{
+	struct perf_session *session = rec->session;
+	struct machine *machine = &session->machines.host;
+	struct perf_data_file *file = &rec->file;
+	struct record_opts *opts = &rec->opts;
+	struct perf_tool *tool = &rec->tool;
+	int fd = perf_data_file__fd(file);
+	int err = 0;
+
+	if (file->is_pipe) {
+		err = perf_event__synthesize_attrs(tool, session,
+						   process_synthesized_event);
+		if (err < 0) {
+			pr_err("Couldn't synthesize attrs.\n");
+			goto out;
+		}
+
+		if (have_tracepoints(&rec->evlist->entries)) {
+			/*
+			 * FIXME err <= 0 here actually means that
+			 * there were no tracepoints so its not really
+			 * an error, just that we don't need to
+			 * synthesize anything.  We really have to
+			 * return this more properly and also
+			 * propagate errors that now are calling die()
+			 */
+			err = perf_event__synthesize_tracing_data(tool,	fd, rec->evlist,
+								  process_synthesized_event);
+			if (err <= 0) {
+				pr_err("Couldn't record tracing data.\n");
+				goto out;
+			}
+			rec->bytes_written += err;
+		}
+	}
+
+	if (rec->opts.full_auxtrace) {
+		err = perf_event__synthesize_auxtrace_info(rec->itr, tool,
+					session, process_synthesized_event);
+		if (err)
+			goto out;
+	}
+
+	err = perf_event__synthesize_kernel_mmap(tool, process_synthesized_event,
+						 machine);
+	WARN_ONCE(err < 0, "Couldn't record kernel reference relocation symbol\n"
+			   "Symbol resolution may be skewed if relocation was used (e.g. kexec).\n"
+			   "Check /proc/kallsyms permission or run as root.\n");
+
+	err = perf_event__synthesize_modules(tool, process_synthesized_event,
+					     machine);
+	WARN_ONCE(err < 0, "Couldn't record kernel module information.\n"
+			   "Symbol resolution may be skewed if relocation was used (e.g. kexec).\n"
+			   "Check /proc/modules permission or run as root.\n");
+
+	if (perf_guest) {
+		machines__process_guests(&session->machines,
+					 perf_event__synthesize_guest_os, tool);
+	}
+
+	err = __machine__synthesize_threads(machine, tool, &opts->target, rec->evlist->threads,
+					    process_synthesized_event, opts->sample_address,
+					    opts->proc_map_timeout);
+out:
+	return err;
+}
+
 static int __cmd_record(struct record *rec, int argc, const char **argv)
 {
 	int err;
@@ -580,61 +648,8 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 
 	machine = &session->machines.host;
 
-	if (file->is_pipe) {
-		err = perf_event__synthesize_attrs(tool, session,
-						   process_synthesized_event);
-		if (err < 0) {
-			pr_err("Couldn't synthesize attrs.\n");
-			goto out_child;
-		}
-
-		if (have_tracepoints(&rec->evlist->entries)) {
-			/*
-			 * FIXME err <= 0 here actually means that
-			 * there were no tracepoints so its not really
-			 * an error, just that we don't need to
-			 * synthesize anything.  We really have to
-			 * return this more properly and also
-			 * propagate errors that now are calling die()
-			 */
-			err = perf_event__synthesize_tracing_data(tool,	fd, rec->evlist,
-								  process_synthesized_event);
-			if (err <= 0) {
-				pr_err("Couldn't record tracing data.\n");
-				goto out_child;
-			}
-			rec->bytes_written += err;
-		}
-	}
-
-	if (rec->opts.full_auxtrace) {
-		err = perf_event__synthesize_auxtrace_info(rec->itr, tool,
-					session, process_synthesized_event);
-		if (err)
-			goto out_delete_session;
-	}
-
-	err = perf_event__synthesize_kernel_mmap(tool, process_synthesized_event,
-						 machine);
-	WARN_ONCE(err < 0, "Couldn't record kernel reference relocation symbol\n"
-			   "Symbol resolution may be skewed if relocation was used (e.g. kexec).\n"
-			   "Check /proc/kallsyms permission or run as root.\n");
-
-	err = perf_event__synthesize_modules(tool, process_synthesized_event,
-					     machine);
-	WARN_ONCE(err < 0, "Couldn't record kernel module information.\n"
-			   "Symbol resolution may be skewed if relocation was used (e.g. kexec).\n"
-			   "Check /proc/modules permission or run as root.\n");
-
-	if (perf_guest) {
-		machines__process_guests(&session->machines,
-					 perf_event__synthesize_guest_os, tool);
-	}
-
-	err = __machine__synthesize_threads(machine, tool, &opts->target, rec->evlist->threads,
-					    process_synthesized_event, opts->sample_address,
-					    opts->proc_map_timeout);
-	if (err != 0)
+	err = record__synthesize(rec);
+	if (err < 0)
 		goto out_child;
 
 	if (rec->realtime_prio) {

* [tip:perf/core] perf record: Introduce record__finish_output() to finish a perf.data
  2016-02-26  9:32 ` [PATCH 22/46] perf record: Introduce record__finish_output() to finish a perf.data Wang Nan
@ 2016-03-05  8:16   ` tip-bot for Wang Nan
  0 siblings, 0 replies; 60+ messages in thread
From: tip-bot for Wang Nan @ 2016-03-05  8:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, acme, hekuang, ast, jolsa, lizefan, namhyung, tglx, hpa,
	masami.hiramatsu.pt, mingo, linux-kernel, wangnan0

Commit-ID:  e1ab48ba63ee6b2494a67bb60bf99692ecdaca7c
Gitweb:     http://git.kernel.org/tip/e1ab48ba63ee6b2494a67bb60bf99692ecdaca7c
Author:     Wang Nan <wangnan0@huawei.com>
AuthorDate: Fri, 26 Feb 2016 09:32:10 +0000
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Thu, 3 Mar 2016 11:10:35 -0300

perf record: Introduce record__finish_output() to finish a perf.data

Move code for finalizing 'perf.data' to record__finish_output(). It will
be used by following commits to split output to multiple files.

Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1456479154-136027-23-git-send-email-wangnan0@huawei.com
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-record.c | 37 +++++++++++++++++++++++++------------
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index cb583b4..46e2772 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -468,6 +468,29 @@ static void record__init_features(struct record *rec)
 	perf_header__clear_feat(&session->header, HEADER_STAT);
 }
 
+static void
+record__finish_output(struct record *rec)
+{
+	struct perf_data_file *file = &rec->file;
+	int fd = perf_data_file__fd(file);
+
+	if (file->is_pipe)
+		return;
+
+	rec->session->header.data_size += rec->bytes_written;
+	file->size = lseek(perf_data_file__fd(file), 0, SEEK_CUR);
+
+	if (!rec->no_buildid) {
+		process_buildids(rec);
+
+		if (rec->buildid_all)
+			dsos__hit_all(rec->session);
+	}
+	perf_session__write_header(rec->session, rec->evlist, fd, true);
+
+	return;
+}
+
 static volatile int workload_exec_errno;
 
 /*
@@ -785,18 +808,8 @@ out_child:
 	/* this will be recalculated during process_buildids() */
 	rec->samples = 0;
 
-	if (!err && !file->is_pipe) {
-		rec->session->header.data_size += rec->bytes_written;
-		file->size = lseek(perf_data_file__fd(file), 0, SEEK_CUR);
-
-		if (!rec->no_buildid) {
-			process_buildids(rec);
-
-			if (rec->buildid_all)
-				dsos__hit_all(rec->session);
-		}
-		perf_session__write_header(rec->session, rec->evlist, fd, true);
-	}
+	if (!err)
+		record__finish_output(rec);
 
 	if (!err && !quiet) {
 		char samples[128];

* [tip:perf/core] perf record: Ensure return non-zero rc when mmap fail
  2016-02-26  9:32 ` [PATCH 29/46] perf record: Ensure return non-zero rc when mmap fail Wang Nan
@ 2016-03-05  8:17   ` tip-bot for Wang Nan
  0 siblings, 0 replies; 60+ messages in thread
From: tip-bot for Wang Nan @ 2016-03-05  8:17 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: namhyung, tglx, linux-kernel, ast, jolsa, lizefan, acme,
	wangnan0, mingo, hekuang, masami.hiramatsu.pt, hpa

Commit-ID:  95c365617aa37878592f2f1c6c64e1abb19f0d4a
Gitweb:     http://git.kernel.org/tip/95c365617aa37878592f2f1c6c64e1abb19f0d4a
Author:     Wang Nan <wangnan0@huawei.com>
AuthorDate: Fri, 26 Feb 2016 09:32:17 +0000
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Thu, 3 Mar 2016 11:10:36 -0300

perf record: Ensure return non-zero rc when mmap fail

perf_evlist__mmap_ex() can fail without setting errno (for example, when
it fails a condition check, in which case every system call succeeded).

If this happens, record__open() incorrectly returns 0. Forcing rc to a
non-zero value is a quick way to avoid this problem; otherwise we would
have to follow every possible code path in perf_evlist__mmap_ex() to make
sure errno is set by at least one system call before an error is returned.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1456479154-136027-30-git-send-email-wangnan0@huawei.com
Signed-off-by: He Kuang <hekuang@huawei.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-record.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 46e2772..515510e 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -324,7 +324,10 @@ try_again:
 		} else {
 			pr_err("failed to mmap with %d (%s)\n", errno,
 				strerror_r(errno, msg, sizeof(msg)));
-			rc = -errno;
+			if (errno)
+				rc = -errno;
+			else
+				rc = -EINVAL;
 		}
 		goto out;
 	}
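
The fallback in the hunk above can be sketched in isolation. This is only an
illustration, not code from the patch: rc_from_errno() is a hypothetical
helper name, and it takes the errno value as a parameter so the behaviour is
easy to check.

```c
#include <errno.h>

/*
 * Hypothetical helper (not part of perf): map an errno value to a
 * negative return code. When the failing path never set errno
 * (value 0), fall back to -EINVAL so callers still see a failure.
 */
static int rc_from_errno(int saved_errno)
{
	return saved_errno ? -saved_errno : -EINVAL;
}
```

With this shape, rc_from_errno(ENOMEM) yields -ENOMEM, while rc_from_errno(0)
yields -EINVAL rather than 0, which is the case the patch guards against.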

* [tip:perf/urgent] perf symbols: Record text offset in dso to calculate objdump address
  2016-02-26  9:31 ` [PATCH 01/46] perf tools: Record text offset in dso to calculate objdump address Wang Nan
@ 2016-03-24  7:37   ` tip-bot for Wang Nan
  0 siblings, 0 replies; 60+ messages in thread
From: tip-bot for Wang Nan @ 2016-03-24  7:37 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: namhyung, linux-kernel, lizefan, hpa, acme, ast, tglx,
	adrian.hunter, hekuang, kirr, peterz, masami.hiramatsu.pt, dev,
	mingo, wangnan0, jolsa

Commit-ID:  73cdf0c6ea9c597894924f3b91ad636389555c1b
Gitweb:     http://git.kernel.org/tip/73cdf0c6ea9c597894924f3b91ad636389555c1b
Author:     Wang Nan <wangnan0@huawei.com>
AuthorDate: Fri, 26 Feb 2016 09:31:49 +0000
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Fri, 18 Mar 2016 14:23:59 -0300

perf symbols: Record text offset in dso to calculate objdump address

Store the DSO's .text offset in struct dso. It is used for vDSOs now and
will also be used for other needs, such as handling kernel modules.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Cody P Schafer <dev@codyps.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kirill Smelkov <kirr@nexedi.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1456479154-136027-2-git-send-email-wangnan0@huawei.com
[ Extracted from larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/dso.h        |  1 +
 tools/perf/util/symbol-elf.c | 12 ++++++------
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index 45ec4d0..ef3dbc9 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -162,6 +162,7 @@ struct dso {
 	u8		 loaded;
 	u8		 rel;
 	u8		 build_id[BUILD_ID_SIZE];
+	u64		 text_offset;
 	const char	 *short_name;
 	const char	 *long_name;
 	u16		 long_name_len;
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index b1dd68f..bc229a7 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -793,6 +793,7 @@ int dso__load_sym(struct dso *dso, struct map *map,
 	uint32_t idx;
 	GElf_Ehdr ehdr;
 	GElf_Shdr shdr;
+	GElf_Shdr tshdr;
 	Elf_Data *syms, *opddata = NULL;
 	GElf_Sym sym;
 	Elf_Scn *sec, *sec_strndx;
@@ -832,6 +833,9 @@ int dso__load_sym(struct dso *dso, struct map *map,
 	sec = syms_ss->symtab;
 	shdr = syms_ss->symshdr;
 
+	if (elf_section_by_name(elf, &ehdr, &tshdr, ".text", NULL))
+		dso->text_offset = tshdr.sh_addr - tshdr.sh_offset;
+
 	if (runtime_ss->opdsec)
 		opddata = elf_rawdata(runtime_ss->opdsec, NULL);
 
@@ -880,12 +884,8 @@ int dso__load_sym(struct dso *dso, struct map *map,
 	 * Handle any relocation of vdso necessary because older kernels
 	 * attempted to prelink vdso to its virtual address.
 	 */
-	if (dso__is_vdso(dso)) {
-		GElf_Shdr tshdr;
-
-		if (elf_section_by_name(elf, &ehdr, &tshdr, ".text", NULL))
-			map->reloc = map->start - tshdr.sh_addr + tshdr.sh_offset;
-	}
+	if (dso__is_vdso(dso))
+		map->reloc = map->start - dso->text_offset;
 
 	dso->adjust_symbols = runtime_ss->adjust_symbols || ref_reloc(kmap);
 	/*
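
The arithmetic behind the vdso hunk can be checked with a small sketch. It
assumes only what the diff shows: text_offset is sh_addr - sh_offset of the
.text section, and the relocation is map->start minus that offset. The struct
and function names below are hypothetical stand-ins, not perf or libelf APIs.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two GElf_Shdr fields the patch reads. */
struct text_shdr {
	uint64_t sh_addr;	/* virtual address of .text */
	uint64_t sh_offset;	/* file offset of .text */
};

/* Value the patch caches in dso->text_offset. */
static uint64_t text_offset(const struct text_shdr *sh)
{
	return sh->sh_addr - sh->sh_offset;
}

/* Relocation as computed before the patch. */
static uint64_t reloc_old(uint64_t map_start, const struct text_shdr *sh)
{
	return map_start - sh->sh_addr + sh->sh_offset;
}

/* Relocation as computed after the patch, from the cached offset. */
static uint64_t reloc_new(uint64_t map_start, uint64_t text_off)
{
	return map_start - text_off;
}
```

For any section header where .text is found, both forms give the same result,
since start - addr + offset equals start - (addr - offset) in unsigned 64-bit
arithmetic. One difference visible in the diff: the new code assigns
map->reloc for a vdso even when ".text" is not found, using whatever value
dso->text_offset already holds.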

Thread overview: 60+ messages
2016-02-26  9:31 [PATCH 00/46] perf tools: Fix and improvements (bpf and overwrite) Wang Nan
2016-02-26  9:31 ` [PATCH 01/46] perf tools: Record text offset in dso to calculate objdump address Wang Nan
2016-03-24  7:37   ` [tip:perf/urgent] perf symbols: " tip-bot for Wang Nan
2016-02-26  9:31 ` [PATCH 02/46] perf tools: Adjust symbol for shared objects Wang Nan
2016-02-26  9:31 ` [PATCH 03/46] perf config: Bring perf_default_config to the very beginning at main() Wang Nan
2016-02-27  9:44   ` [tip:perf/core] " tip-bot for Wang Nan
2016-02-26  9:31 ` [PATCH 04/46] perf trace: Improve error message when receive non-tracepoint events Wang Nan
2016-02-26  9:31 ` [PATCH 05/46] perf tools: Only set filter for tracepoints events Wang Nan
2016-02-27  9:45   ` [tip:perf/core] " tip-bot for Wang Nan
2016-02-26  9:31 ` [PATCH 06/46] perf trace: Call bpf__apply_obj_config in 'perf trace' Wang Nan
2016-02-27  9:45   ` [tip:perf/core] " tip-bot for Wang Nan
2016-02-26  9:31 ` [PATCH 07/46] perf trace: Print content of bpf-output event Wang Nan
2016-02-27  9:45   ` [tip:perf/core] " tip-bot for Wang Nan
2016-02-26  9:31 ` [PATCH 08/46] perf data: Support converting data from bpf_perf_event_output() Wang Nan
2016-03-05  8:15   ` [tip:perf/core] " tip-bot for Wang Nan
2016-02-26  9:31 ` [PATCH 09/46] perf data: Explicitly set byte order for integer types Wang Nan
2016-03-05  8:15   ` [tip:perf/core] " tip-bot for Wang Nan
2016-02-26  9:31 ` [PATCH 10/46] perf core: Introduce new ioctl options to pause and resume ring buffer Wang Nan
2016-02-29 15:39   ` Arnaldo Carvalho de Melo
2016-03-03  2:03     ` Wangnan (F)
2016-02-26  9:31 ` [PATCH 11/46] perf core: Set event's default overflow_handler Wang Nan
2016-02-26  9:32 ` [PATCH 12/46] perf core: Prepare writing into ring buffer from end Wang Nan
2016-02-26  9:32 ` [PATCH 13/46] perf core: Add backward attribute to perf event Wang Nan
2016-02-26  9:32 ` [PATCH 14/46] perf core: Reduce perf event output overhead by new overflow handler Wang Nan
2016-02-26  9:32 ` [PATCH 15/46] perf tools: Only validate is_pos for tracking evsels Wang Nan
2016-02-26  9:32 ` [PATCH 16/46] perf tools: Print write_backward value in perf_event_attr__fprintf Wang Nan
2016-02-26  9:32 ` [PATCH 17/46] perf tools: Make ordered_events reusable Wang Nan
2016-02-26  9:32 ` [PATCH 18/46] perf record: Use WARN_ONCE to replace 'if' condition Wang Nan
2016-03-05  8:15   ` [tip:perf/core] " tip-bot for Wang Nan
2016-02-26  9:32 ` [PATCH 19/46] perf record: Extract synthesize code to record__synthesize() Wang Nan
2016-03-05  8:16   ` [tip:perf/core] " tip-bot for Wang Nan
2016-02-26  9:32 ` [PATCH 20/46] perf tools: Add perf_data_file__switch() helper Wang Nan
2016-02-26  9:32 ` [PATCH 21/46] perf record: Turns auxtrace_snapshot_enable into 3 states Wang Nan
2016-02-26  9:32 ` [PATCH 22/46] perf record: Introduce record__finish_output() to finish a perf.data Wang Nan
2016-03-05  8:16   ` [tip:perf/core] " tip-bot for Wang Nan
2016-02-26  9:32 ` [PATCH 23/46] perf record: Add '--timestamp-filename' option to append timestamp to output filename Wang Nan
2016-02-26  9:32 ` [PATCH 24/46] perf record: Split output into multiple files via '--switch-output' Wang Nan
2016-02-26  9:32 ` [PATCH 25/46] perf record: Force enable --timestamp-filename when --switch-output is provided Wang Nan
2016-02-26  9:32 ` [PATCH 26/46] perf record: Disable buildid cache options by default in switch output mode Wang Nan
2016-02-26  9:32 ` [PATCH 27/46] perf record: Re-synthesize tracking events after output switching Wang Nan
2016-02-26  9:32 ` [PATCH 28/46] perf record: Generate tracking events for process forked by perf Wang Nan
2016-02-26  9:32 ` [PATCH 29/46] perf record: Ensure return non-zero rc when mmap fail Wang Nan
2016-03-05  8:17   ` [tip:perf/core] " tip-bot for Wang Nan
2016-02-26  9:32 ` [PATCH 30/46] perf record: Prevent reading invalid data in record__mmap_read Wang Nan
2016-02-26  9:32 ` [PATCH 31/46] perf tools: Add evlist channel helpers Wang Nan
2016-02-26  9:32 ` [PATCH 32/46] perf tools: Automatically add new channel according to evlist Wang Nan
2016-02-26  9:32 ` [PATCH 33/46] perf tools: Operate multiple channels Wang Nan
2016-02-26  9:32 ` [PATCH 34/46] perf tools: Squash overwrite setting into channel Wang Nan
2016-02-26  9:32 ` [PATCH 35/46] perf record: Don't read from and poll overwrite channel Wang Nan
2016-02-26  9:32 ` [PATCH 36/46] perf record: Don't poll on " Wang Nan
2016-02-26  9:32 ` [PATCH 37/46] perf tools: Detect avalibility of write_backward Wang Nan
2016-02-26  9:32 ` [PATCH 38/46] perf tools: Enable overwrite settings Wang Nan
2016-02-26  9:32 ` [PATCH 39/46] perf tools: Set write_backward attribut bit for overwrite events Wang Nan
2016-02-26  9:32 ` [PATCH 40/46] perf tools: Record fd into perf_mmap Wang Nan
2016-02-26  9:32 ` [PATCH 41/46] perf tools: Add API to pause a channel Wang Nan
2016-02-26  9:32 ` [PATCH 42/46] perf record: Toggle overwrite ring buffer for reading Wang Nan
2016-02-26  9:32 ` [PATCH 43/46] perf record: Rename variable to make code clear Wang Nan
2016-02-26  9:32 ` [PATCH 44/46] perf record: Read from backward ring buffer Wang Nan
2016-02-26  9:32 ` [PATCH 45/46] perf record: Allow generate tracking events at the end of output Wang Nan
2016-02-26  9:32 ` [PATCH 46/46] perf tools: Don't warn about out of order event if write_backward is used Wang Nan
