* [PATCH 0/3] perf tools: Speedup DWARF unwind
@ 2014-04-17 17:39 Jiri Olsa
  2014-04-17 17:39 ` [PATCH 1/3] perf tools: Cache register accesses for unwind processing Jiri Olsa
                   ` (5 more replies)
  0 siblings, 6 replies; 24+ messages in thread
From: Jiri Olsa @ 2014-04-17 17:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: Corey Ashford, David Ahern, Frederic Weisbecker, Ingo Molnar,
	Namhyung Kim, Paul Mackerras, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Jean Pihet, Jiri Olsa

hi,
this series speeds up the DWARF unwind report code by reworking
the related code:
  - cache the sample's register accesses
  - keep the dso data file descriptor open for the
    life of the dso object
  - replace the dso cache code by mapping the dso data file
    directly for the life of the dso object

The speedup is mainly for the libunwind unwind. The libdw unwind will
benefit mainly from the cached register access, because it handles dso
data accesses by itself.. and anyway it's still faster ;-).

Also reachable in here:
  git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
  perf/core_unwind_speedup

thanks,
jirka

Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Jean Pihet <jean.pihet@linaro.org>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
Jiri Olsa (3):
      perf tools: Cache register accesses for unwind processing
      perf tools: Cache dso data file descriptor
      perf tools: Replace dso data cache with mapped data

 tools/perf/tests/dso-data.c        |   7 ++++
 tools/perf/util/dso.c              | 200 +++++++++++++++++++++++++++---------------------------------------------------------------------
 tools/perf/util/dso.h              |  14 ++-----
 tools/perf/util/event.h            |   5 +++
 tools/perf/util/perf_regs.c        |  10 ++++-
 tools/perf/util/perf_regs.h        |   4 +-
 tools/perf/util/unwind-libunwind.c |   2 -
 7 files changed, 83 insertions(+), 159 deletions(-)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/3] perf tools: Cache register accesses for unwind processing
  2014-04-17 17:39 [PATCH 0/3] perf tools: Speedup DWARF unwind Jiri Olsa
@ 2014-04-17 17:39 ` Jiri Olsa
  2014-04-27 14:29   ` Namhyung Kim
  2014-04-28 10:39   ` Christian Borntraeger
  2014-04-17 17:39 ` [PATCH 2/3] perf tools: Cache dso data file descriptor Jiri Olsa
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 24+ messages in thread
From: Jiri Olsa @ 2014-04-17 17:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jiri Olsa, Corey Ashford, David Ahern, Frederic Weisbecker,
	Ingo Molnar, Namhyung Kim, Paul Mackerras, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Jean Pihet

Cache register values in an array, filled on first access.
The share of the perf_reg_value function in the report profile
dropped by about 4.5 percentage points (6.81% -> 2.24%) for
the report command processing DWARF unwind stacks.
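
As a rough illustration of the caching described above, here is a
minimal standalone sketch. The names regs_sketch, REGS_MAX and
reg_value_cached are made up for this example and are not the perf
API. The sketch also assumes 1ULL shifts are wanted because the masks
are 64-bit, which differs from the `1 << id` used in the patch below.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define REGS_MAX 64	/* hypothetical bound, stands in for PERF_REGS_MAX */

/* Simplified stand-in for perf's struct regs_dump. */
struct regs_sketch {
	uint64_t mask;			/* which registers were sampled */
	const uint64_t *regs;		/* sampled values, packed in bit order */
	uint64_t cache_regs[REGS_MAX];	/* values indexed directly by register id */
	uint64_t cache_mask;		/* which cache_regs[] slots are valid */
};

/* Return the value of register 'id', remembering it for later lookups. */
static int reg_value_cached(uint64_t *valp, struct regs_sketch *r, int id)
{
	uint64_t bit;
	int i, idx = 0;

	assert(valp && r);
	if (id < 0 || id >= REGS_MAX)
		return -EINVAL;

	bit = 1ULL << id;

	if (!(r->cache_mask & bit)) {
		if (!(r->mask & bit))
			return -EINVAL;

		/* Slow path: count the sampled registers below 'id'. */
		for (i = 0; i < id; i++) {
			if (r->mask & (1ULL << i))
				idx++;
		}

		r->cache_regs[id] = r->regs[idx];
		r->cache_mask |= bit;
	}

	/* Fast path on every later access: one array load. */
	*valp = r->cache_regs[id];
	return 0;
}
```

The cache is only valid for one register dump, so in this sketch
cache_mask has to start at zero for every new sample.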

Output from report over 1.5 GB data with DWARF unwind stacks:
(TODO fix perf diff)

  current code:
   6.81%  perf.old  perf.old                   [.] perf_reg_value

  change:
   2.24%  perf      perf                       [.] perf_reg_value

And a little bit of overall speedup:

 Performance counter stats for './perf.old report -i perf-test.data --stdio':

   134,664,011,577      cycles:u                  #    2.472 GHz
   189,677,227,475      instructions:u            #    1.41  insns per cycle
      54465.096050      task-clock (msec)         #    0.998 CPUs utilized

      54.598339009 seconds time elapsed

 Performance counter stats for './perf report -i perf-test.data --stdio':

   124,478,681,672      cycles:u                  #    2.466 GHz
   168,998,379,866      instructions:u            #    1.36  insns per cycle
      50487.110482      task-clock (msec)         #    0.997 CPUs utilized

      50.635824229 seconds time elapsed

Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Jean Pihet <jean.pihet@linaro.org>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 tools/perf/util/event.h     |  5 +++++
 tools/perf/util/perf_regs.c | 10 +++++++++-
 tools/perf/util/perf_regs.h |  4 +++-
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 38457d4..970d4eb 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -7,6 +7,7 @@
 #include "../perf.h"
 #include "map.h"
 #include "build-id.h"
+#include "perf_regs.h"
 
 struct mmap_event {
 	struct perf_event_header header;
@@ -87,6 +88,10 @@ struct regs_dump {
 	u64 abi;
 	u64 mask;
 	u64 *regs;
+
+	/* Cached values/mask filled by first register access. */
+	u64 cache_regs[PERF_REGS_MAX];
+	u64 cache_mask;
 };
 
 struct stack_dump {
diff --git a/tools/perf/util/perf_regs.c b/tools/perf/util/perf_regs.c
index a3539ef..43168fb 100644
--- a/tools/perf/util/perf_regs.c
+++ b/tools/perf/util/perf_regs.c
@@ -1,11 +1,15 @@
 #include <errno.h>
 #include "perf_regs.h"
+#include "event.h"
 
 int perf_reg_value(u64 *valp, struct regs_dump *regs, int id)
 {
 	int i, idx = 0;
 	u64 mask = regs->mask;
 
+	if (regs->cache_mask & (1 << id))
+		goto out;
+
 	if (!(mask & (1 << id)))
 		return -EINVAL;
 
@@ -14,6 +18,10 @@ int perf_reg_value(u64 *valp, struct regs_dump *regs, int id)
 			idx++;
 	}
 
-	*valp = regs->regs[idx];
+	regs->cache_mask |= (1 << id);
+	regs->cache_regs[id] = regs->regs[idx];
+
+out:
+	*valp = regs->cache_regs[id];
 	return 0;
 }
diff --git a/tools/perf/util/perf_regs.h b/tools/perf/util/perf_regs.h
index d6e8b6a..80d8ab1 100644
--- a/tools/perf/util/perf_regs.h
+++ b/tools/perf/util/perf_regs.h
@@ -2,15 +2,17 @@
 #define __PERF_REGS_H
 
 #include "types.h"
-#include "event.h"
 
 #ifdef HAVE_PERF_REGS_SUPPORT
 #include <perf_regs.h>
 
+struct regs_dump;
+
 int perf_reg_value(u64 *valp, struct regs_dump *regs, int id);
 
 #else
 #define PERF_REGS_MASK	0
+#define PERF_REGS_MAX	0
 
 static inline const char *perf_reg_name(int id __maybe_unused)
 {
-- 
1.8.3.1



* [PATCH 2/3] perf tools: Cache dso data file descriptor
  2014-04-17 17:39 [PATCH 0/3] perf tools: Speedup DWARF unwind Jiri Olsa
  2014-04-17 17:39 ` [PATCH 1/3] perf tools: Cache register accesses for unwind processing Jiri Olsa
@ 2014-04-17 17:39 ` Jiri Olsa
  2014-04-27 14:36   ` Namhyung Kim
  2014-04-17 17:39 ` [PATCH 3/3] perf tools: Replace dso data cache with mapped data Jiri Olsa
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 24+ messages in thread
From: Jiri Olsa @ 2014-04-17 17:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jiri Olsa, Corey Ashford, David Ahern, Frederic Weisbecker,
	Ingo Molnar, Namhyung Kim, Paul Mackerras, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Jean Pihet

Keep the data file descriptor open for the whole life
of the dso object.

The report shows just a small speedup in the dso__data_fd function
itself for the report command processing DWARF unwind stacks.
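
The idea can be sketched outside of perf roughly as follows. This is
only an illustration under assumptions: obj_sketch, obj_init,
obj_data_fd and obj_close are hypothetical names, and a plain open()
stands in for the open_dso() lookup.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical object that owns a lazily opened file descriptor. */
struct obj_sketch {
	const char *path;
	int data_fd;	/* -1 until the first successful open */
};

static void obj_init(struct obj_sketch *o, const char *path)
{
	o->path = path;
	o->data_fd = -1;
}

/* Open the file on first use, then keep handing out the same fd. */
static int obj_data_fd(struct obj_sketch *o)
{
	assert(o);

	if (o->data_fd >= 0)
		return o->data_fd;

	/* On failure data_fd stays -1, so the next call retries. */
	o->data_fd = open(o->path, O_RDONLY);
	return o->data_fd;
}

/* Close once, when the owning object goes away. */
static void obj_close(struct obj_sketch *o)
{
	if (o->data_fd >= 0) {
		close(o->data_fd);
		o->data_fd = -1;
	}
}
```

With this ownership model the callers must not close() the returned
descriptor themselves, which is why the patch drops the close(fd)
calls from the readers.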

Output from report over 1.5 GB data with DWARF unwind stacks:
(TODO fix perf diff)

  current code:
   0.22%  perf.old  perf.old                   [.] dso__data_fd

  change:
   0.15%  perf      perf                       [.] dso__data_fd

But a bigger overall speedup:

 Performance counter stats for './perf.old report -i perf-test.data --stdio':

   126,055,895,573      cycles:u                  #    2.463 GHz
   168,964,795,208      instructions:u            #    1.34  insns per cycle
      51174.366434      task-clock (msec)         #    0.997 CPUs utilized

      51.306236943 seconds time elapsed

 Performance counter stats for './perf report -i perf-test.data --stdio':

   112,531,906,656      cycles:u                  #    2.680 GHz
   163,466,037,207      instructions:u            #    1.45  insns per cycle
      41991.297576      task-clock (msec)         #    1.000 CPUs utilized

      41.985142753 seconds time elapsed

Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Jean Pihet <jean.pihet@linaro.org>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 tools/perf/util/dso.c              | 15 +++++++++++----
 tools/perf/util/dso.h              |  1 +
 tools/perf/util/unwind-libunwind.c |  2 --
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index 64453d6..0dca5d6 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -159,6 +159,12 @@ static int open_dso(struct dso *dso, struct machine *machine)
 	return fd;
 }
 
+static void dso__data_close(struct dso *dso)
+{
+	if (dso->data_fd >= 0)
+		close(dso->data_fd);
+}
+
 int dso__data_fd(struct dso *dso, struct machine *machine)
 {
 	enum dso_binary_type binary_type_data[] = {
@@ -168,8 +174,8 @@ int dso__data_fd(struct dso *dso, struct machine *machine)
 	};
 	int i = 0;
 
-	if (dso->binary_type != DSO_BINARY_TYPE__NOT_FOUND)
-		return open_dso(dso, machine);
+	if (dso->data_fd >= 0)
+		return dso->data_fd;
 
 	do {
 		int fd;
@@ -178,7 +184,7 @@ int dso__data_fd(struct dso *dso, struct machine *machine)
 
 		fd = open_dso(dso, machine);
 		if (fd >= 0)
-			return fd;
+			return dso->data_fd = fd;
 
 	} while (dso->binary_type != DSO_BINARY_TYPE__NOT_FOUND);
 
@@ -301,7 +307,6 @@ dso_cache__read(struct dso *dso, struct machine *machine,
 	if (ret <= 0)
 		free(cache);
 
-	close(fd);
 	return ret;
 }
 
@@ -485,6 +490,7 @@ struct dso *dso__new(const char *name)
 		dso->kernel = DSO_TYPE_USER;
 		dso->needs_swap = DSO_SWAP__UNSET;
 		INIT_LIST_HEAD(&dso->node);
+		dso->data_fd = -1;
 	}
 
 	return dso;
@@ -506,6 +512,7 @@ void dso__delete(struct dso *dso)
 		dso->long_name_allocated = false;
 	}
 
+	dso__data_close(dso);
 	dso_cache__free(&dso->cache);
 	dso__free_a2l(dso);
 	zfree(&dso->symsrc_filename);
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index ab06f1c..6e48cdc 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -99,6 +99,7 @@ struct dso {
 	const char	 *long_name;
 	u16		 long_name_len;
 	u16		 short_name_len;
+	int		 data_fd;
 	char		 name[0];
 };
 
diff --git a/tools/perf/util/unwind-libunwind.c b/tools/perf/util/unwind-libunwind.c
index bd5768d..25578b9 100644
--- a/tools/perf/util/unwind-libunwind.c
+++ b/tools/perf/util/unwind-libunwind.c
@@ -250,7 +250,6 @@ static int read_unwind_spec_eh_frame(struct dso *dso, struct machine *machine,
 
 	/* Check the .eh_frame section for unwinding info */
 	offset = elf_section_offset(fd, ".eh_frame_hdr");
-	close(fd);
 
 	if (offset)
 		ret = unwind_spec_ehframe(dso, machine, offset,
@@ -271,7 +270,6 @@ static int read_unwind_spec_debug_frame(struct dso *dso,
 
 	/* Check the .debug_frame section for unwinding info */
 	*offset = elf_section_offset(fd, ".debug_frame");
-	close(fd);
 
 	if (*offset)
 		return 0;
-- 
1.8.3.1



* [PATCH 3/3] perf tools: Replace dso data cache with mapped data
  2014-04-17 17:39 [PATCH 0/3] perf tools: Speedup DWARF unwind Jiri Olsa
  2014-04-17 17:39 ` [PATCH 1/3] perf tools: Cache register accesses for unwind processing Jiri Olsa
  2014-04-17 17:39 ` [PATCH 2/3] perf tools: Cache dso data file descriptor Jiri Olsa
@ 2014-04-17 17:39 ` Jiri Olsa
  2014-04-18  7:51 ` [PATCH 0/3] perf tools: Speedup DWARF unwind Ingo Molnar
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 24+ messages in thread
From: Jiri Olsa @ 2014-04-17 17:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jiri Olsa, Corey Ashford, David Ahern, Frederic Weisbecker,
	Ingo Molnar, Namhyung Kim, Paul Mackerras, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Jean Pihet

Remove the dso data cache processing and map the
whole dso data file instead when its data is requested.

The share of the dso__data_read_offset function in the report
profile dropped by about 13 percentage points (13.63% -> 0.32%)
for the report command processing DWARF unwind stacks.
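
A minimal standalone sketch of reading through a whole-file mapping,
under stated assumptions: mapped_file and its three helpers are
hypothetical names, not perf functions, and empty files are treated as
errors here because mmap() rejects a zero length.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the dso data_fd/data_size/data_mmap fields. */
struct mapped_file {
	int fd;
	size_t size;
	char *map;
};

/* Open 'path' read-only and map the whole file; returns 0 on success. */
static int mapped_file_open(struct mapped_file *mf, const char *path)
{
	struct stat st;
	void *m;

	mf->fd = open(path, O_RDONLY);
	mf->size = 0;
	mf->map = NULL;
	if (mf->fd < 0)
		return -1;

	if (fstat(mf->fd, &st) || st.st_size <= 0)
		goto fail;

	m = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, mf->fd, 0);
	if (m == MAP_FAILED)
		goto fail;

	mf->size = (size_t)st.st_size;
	mf->map = m;
	return 0;

fail:
	close(mf->fd);
	mf->fd = -1;
	return -1;
}

/* Copy up to 'size' bytes from 'offset'; short count at EOF, -1 past it. */
static ssize_t mapped_file_read(const struct mapped_file *mf, size_t offset,
				void *data, size_t size)
{
	assert(mf && mf->map && data);

	if (offset > mf->size)
		return -1;

	/* Clamp against the remaining bytes; this form cannot overflow. */
	if (size > mf->size - offset)
		size = mf->size - offset;

	memcpy(data, mf->map + offset, size);
	return (ssize_t)size;
}

/* Drop the mapping and the descriptor. */
static void mapped_file_close(struct mapped_file *mf)
{
	if (mf->map)
		munmap(mf->map, mf->size);
	if (mf->fd >= 0)
		close(mf->fd);
	mf->map = NULL;
	mf->fd = -1;
}
```

Compared with a block cache, every read becomes a bounds check plus
one memcpy, and the kernel page cache takes over the job of keeping
recently used file pages around.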

Output from report over 1.5 GB data with DWARF unwind stacks:
(TODO fix perf diff)

  current code:
   13.63%  perf.old  perf.old                   [.] dso__data_read_offset

  change:
    0.32%  perf      perf                       [.] dso__data_read_offset

And overall speedup:

 Performance counter stats for './perf.old report -i perf-test.data --stdio':

   113,076,591,004      cycles:u                  #    2.675 GHz
   163,353,590,494      instructions:u            #    1.44  insns per cycle
      42269.774797      task-clock (msec)         #    1.000 CPUs utilized

      42.267550053 seconds time elapsed

 Performance counter stats for './perf report -i perf-test.data --stdio':

    92,953,167,072      cycles:u                  #    2.534 GHz
   132,967,448,023      instructions:u            #    1.43  insns per cycle
      36683.242639      task-clock (msec)         #    1.000 CPUs utilized

      36.682799394 seconds time elapsed

Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Jean Pihet <jean.pihet@linaro.org>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 tools/perf/tests/dso-data.c |   7 ++
 tools/perf/util/dso.c       | 185 +++++++++++---------------------------------
 tools/perf/util/dso.h       |  13 +---
 3 files changed, 54 insertions(+), 151 deletions(-)

diff --git a/tools/perf/tests/dso-data.c b/tools/perf/tests/dso-data.c
index 9cc81a3..024c15f 100644
--- a/tools/perf/tests/dso-data.c
+++ b/tools/perf/tests/dso-data.c
@@ -40,6 +40,13 @@ static char *test_file(int size)
 	return templ;
 }
 
+/*
+ * The data access is now pure memory map of the file,
+ * so we dont need DSO__DATA_CACHE_SIZE anymore.
+ * Anyway keeping it for the sake of this test to
+ * ensure dso__data_read_offset interface works.
+ */
+#define DSO__DATA_CACHE_SIZE 4096
 #define TEST_FILE_SIZE (DSO__DATA_CACHE_SIZE * 20)
 
 struct test_data_offset {
diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index 0dca5d6..f274c85 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -1,3 +1,5 @@
+#include <sys/mman.h>
+
 #include "symbol.h"
 #include "dso.h"
 #include "machine.h"
@@ -161,6 +163,14 @@ static int open_dso(struct dso *dso, struct machine *machine)
 
 static void dso__data_close(struct dso *dso)
 {
+	if (dso->data_mmap) {
+		size_t size = PERF_ALIGN(dso->data_size, page_size);
+
+		if (munmap(dso->data_mmap, size))
+			pr_err("dso mmap failed, munmap: %s\n",
+			       strerror(errno));
+	}
+
 	if (dso->data_fd >= 0)
 		close(dso->data_fd);
 }
@@ -191,164 +201,61 @@ int dso__data_fd(struct dso *dso, struct machine *machine)
 	return -EINVAL;
 }
 
-static void
-dso_cache__free(struct rb_root *root)
-{
-	struct rb_node *next = rb_first(root);
-
-	while (next) {
-		struct dso_cache *cache;
-
-		cache = rb_entry(next, struct dso_cache, rb_node);
-		next = rb_next(&cache->rb_node);
-		rb_erase(&cache->rb_node, root);
-		free(cache);
-	}
-}
-
-static struct dso_cache *dso_cache__find(const struct rb_root *root, u64 offset)
+static int dso__data_mmap(struct dso *dso, struct machine *machine, char **ptr)
 {
-	struct rb_node * const *p = &root->rb_node;
-	const struct rb_node *parent = NULL;
-	struct dso_cache *cache;
-
-	while (*p != NULL) {
-		u64 end;
-
-		parent = *p;
-		cache = rb_entry(parent, struct dso_cache, rb_node);
-		end = cache->offset + DSO__DATA_CACHE_SIZE;
-
-		if (offset < cache->offset)
-			p = &(*p)->rb_left;
-		else if (offset >= end)
-			p = &(*p)->rb_right;
-		else
-			return cache;
-	}
-	return NULL;
-}
-
-static void
-dso_cache__insert(struct rb_root *root, struct dso_cache *new)
-{
-	struct rb_node **p = &root->rb_node;
-	struct rb_node *parent = NULL;
-	struct dso_cache *cache;
-	u64 offset = new->offset;
-
-	while (*p != NULL) {
-		u64 end;
-
-		parent = *p;
-		cache = rb_entry(parent, struct dso_cache, rb_node);
-		end = cache->offset + DSO__DATA_CACHE_SIZE;
-
-		if (offset < cache->offset)
-			p = &(*p)->rb_left;
-		else if (offset >= end)
-			p = &(*p)->rb_right;
-	}
-
-	rb_link_node(&new->rb_node, parent, p);
-	rb_insert_color(&new->rb_node, root);
-}
-
-static ssize_t
-dso_cache__memcpy(struct dso_cache *cache, u64 offset,
-		  u8 *data, u64 size)
-{
-	u64 cache_offset = offset - cache->offset;
-	u64 cache_size   = min(cache->size - cache_offset, size);
-
-	memcpy(data, cache->data + cache_offset, cache_size);
-	return cache_size;
-}
-
-static ssize_t
-dso_cache__read(struct dso *dso, struct machine *machine,
-		 u64 offset, u8 *data, ssize_t size)
-{
-	struct dso_cache *cache;
-	ssize_t ret;
+	struct stat st;
 	int fd;
+	char *m;
+
+	if (dso->data_mmap)
+		goto out;
 
 	fd = dso__data_fd(dso, machine);
 	if (fd < 0)
-		return -1;
-
-	do {
-		u64 cache_offset;
-
-		ret = -ENOMEM;
-
-		cache = zalloc(sizeof(*cache) + DSO__DATA_CACHE_SIZE);
-		if (!cache)
-			break;
-
-		cache_offset = offset & DSO__DATA_CACHE_MASK;
-		ret = -EINVAL;
-
-		if (-1 == lseek(fd, cache_offset, SEEK_SET))
-			break;
+		return fd;
 
-		ret = read(fd, cache->data, DSO__DATA_CACHE_SIZE);
-		if (ret <= 0)
-			break;
-
-		cache->offset = cache_offset;
-		cache->size   = ret;
-		dso_cache__insert(&dso->cache, cache);
-
-		ret = dso_cache__memcpy(cache, offset, data, size);
-
-	} while (0);
+	if (fstat(fd, &st)) {
+		pr_err("dso mmap failed, fstat: %s\n", strerror(errno));
+		return -1;
+	}
 
-	if (ret <= 0)
-		free(cache);
+	dso->data_size = st.st_size;
 
-	return ret;
-}
+	m = mmap(0, PERF_ALIGN(dso->data_size, page_size),
+		 PROT_READ, MAP_SHARED, fd, 0);
+	if (m == MAP_FAILED) {
+		pr_err("dso mmap failed, mmap: %s\n", strerror(errno));
+		return -1;
+	}
 
-static ssize_t dso_cache_read(struct dso *dso, struct machine *machine,
-			      u64 offset, u8 *data, ssize_t size)
-{
-	struct dso_cache *cache;
+	dso->data_mmap = m;
 
-	cache = dso_cache__find(&dso->cache, offset);
-	if (cache)
-		return dso_cache__memcpy(cache, offset, data, size);
-	else
-		return dso_cache__read(dso, machine, offset, data, size);
+out:
+	*ptr = dso->data_mmap;
+	return 0;
 }
 
 ssize_t dso__data_read_offset(struct dso *dso, struct machine *machine,
 			      u64 offset, u8 *data, ssize_t size)
 {
-	ssize_t r = 0;
-	u8 *p = data;
+	ssize_t rsize = size;
+	char *m;
 
-	do {
-		ssize_t ret;
-
-		ret = dso_cache_read(dso, machine, offset, p, size);
-		if (ret < 0)
-			return ret;
-
-		/* Reached EOF, return what we have. */
-		if (!ret)
-			break;
+	if (dso__data_mmap(dso, machine, &m))
+		return -1;
 
-		BUG_ON(ret > size);
+	if (offset > dso->data_size)
+		return -1;
 
-		r      += ret;
-		p      += ret;
-		offset += ret;
-		size   -= ret;
+	/* unlikely, but anyway.. check overflow ;-) */
+	if (offset + size < offset)
+		return -1;
 
-	} while (size);
+	if (offset + size > dso->data_size)
+		rsize = dso->data_size - offset;
 
-	return r;
+	memcpy(data, m + offset, rsize);
+	return rsize;
 }
 
 ssize_t dso__data_read_addr(struct dso *dso, struct map *map,
@@ -478,7 +385,6 @@ struct dso *dso__new(const char *name)
 		dso__set_short_name(dso, dso->name, false);
 		for (i = 0; i < MAP__NR_TYPES; ++i)
 			dso->symbols[i] = dso->symbol_names[i] = RB_ROOT;
-		dso->cache = RB_ROOT;
 		dso->symtab_type = DSO_BINARY_TYPE__NOT_FOUND;
 		dso->binary_type = DSO_BINARY_TYPE__NOT_FOUND;
 		dso->loaded = 0;
@@ -513,7 +419,6 @@ void dso__delete(struct dso *dso)
 	}
 
 	dso__data_close(dso);
-	dso_cache__free(&dso->cache);
 	dso__free_a2l(dso);
 	zfree(&dso->symsrc_filename);
 	free(dso);
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index 6e48cdc..fe4e4aa 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -62,21 +62,10 @@ enum dso_swap_type {
 	____r;						\
 })
 
-#define DSO__DATA_CACHE_SIZE 4096
-#define DSO__DATA_CACHE_MASK ~(DSO__DATA_CACHE_SIZE - 1)
-
-struct dso_cache {
-	struct rb_node	rb_node;
-	u64 offset;
-	u64 size;
-	char data[0];
-};
-
 struct dso {
 	struct list_head node;
 	struct rb_root	 symbols[MAP__NR_TYPES];
 	struct rb_root	 symbol_names[MAP__NR_TYPES];
-	struct rb_root	 cache;
 	void		 *a2l;
 	char		 *symsrc_filename;
 	unsigned int	 a2l_fails;
@@ -100,6 +89,8 @@ struct dso {
 	u16		 long_name_len;
 	u16		 short_name_len;
 	int		 data_fd;
+	size_t		 data_size;
+	char		 *data_mmap;
 	char		 name[0];
 };
 
-- 
1.8.3.1



* Re: [PATCH 0/3] perf tools: Speedup DWARF unwind
  2014-04-17 17:39 [PATCH 0/3] perf tools: Speedup DWARF unwind Jiri Olsa
                   ` (2 preceding siblings ...)
  2014-04-17 17:39 ` [PATCH 3/3] perf tools: Replace dso data cache with mapped data Jiri Olsa
@ 2014-04-18  7:51 ` Ingo Molnar
  2014-04-18  7:55   ` Ingo Molnar
  2014-04-23 20:16 ` Jiri Olsa
  2014-04-25 13:08 ` Jiri Olsa
  5 siblings, 1 reply; 24+ messages in thread
From: Ingo Molnar @ 2014-04-18  7:51 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: linux-kernel, Corey Ashford, David Ahern, Frederic Weisbecker,
	Namhyung Kim, Paul Mackerras, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Jean Pihet


* Jiri Olsa <jolsa@redhat.com> wrote:

> hi,
> trying to speedup DWARF unwind report code by factoring
> related code:
>   - caching sample's registers access
>   - keep dso data file descriptor open for the
>     life of the dso object
>   - replace dso cache code by mapping dso data file
>     directly for the life of the dso object
> 
> The speedup is mainly for libunwind unwind. The libdw will benefit
> mainly from cached registers access, because it handles dso data
> accesses by itself.. and anyway it's still faster ;-).

Just curious: do you have any numbers about how much faster it got in 
practice?

Thanks,

	Ingo


* Re: [PATCH 0/3] perf tools: Speedup DWARF unwind
  2014-04-18  7:51 ` [PATCH 0/3] perf tools: Speedup DWARF unwind Ingo Molnar
@ 2014-04-18  7:55   ` Ingo Molnar
  2014-04-18  9:35     ` Jiri Olsa
  0 siblings, 1 reply; 24+ messages in thread
From: Ingo Molnar @ 2014-04-18  7:55 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: linux-kernel, Corey Ashford, David Ahern, Frederic Weisbecker,
	Namhyung Kim, Paul Mackerras, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Jean Pihet


* Ingo Molnar <mingo@kernel.org> wrote:

> 
> * Jiri Olsa <jolsa@redhat.com> wrote:
> 
> > hi,
> > trying to speedup DWARF unwind report code by factoring
> > related code:
> >   - caching sample's registers access
> >   - keep dso data file descriptor open for the
> >     life of the dso object
> >   - replace dso cache code by mapping dso data file
> >     directly for the life of the dso object
> > 
> > The speedup is mainly for libunwind unwind. The libdw will benefit
> > mainly from cached registers access, because it handles dso data
> > accesses by itself.. and anyway it's still faster ;-).
> 
> Just curious: do you have any numbers about how much faster it got in 
> practice?

Oh, the numbers are all in the changelogs, never mind!

So in your test workload it went from 54.6 seconds to 36.7, a 48% 
speedup :-)
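
For reference, the arithmetic behind that figure can be sketched as
below. The two helper names are made up for this example; the inputs
are the elapsed times from the patch 1/3 baseline and the patch 3/3
result.

```c
#include <assert.h>

/* Percent more work done per unit of time: (old / new - 1) * 100. */
static double speedup_pct(double old_s, double new_s)
{
	assert(old_s > 0.0 && new_s > 0.0);
	return (old_s / new_s - 1.0) * 100.0;
}

/* Percent of wall-clock time saved: (1 - new / old) * 100. */
static double time_saved_pct(double old_s, double new_s)
{
	assert(old_s > 0.0 && new_s > 0.0);
	return (1.0 - new_s / old_s) * 100.0;
}
```

Called with 54.598339009 and 36.682799394 seconds, the first helper
gives roughly 48.8% and the second roughly 32.8%, so the 48% quoted
above is the throughput ratio, not the fraction of time saved.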

Thanks,

	Ingo


* Re: [PATCH 0/3] perf tools: Speedup DWARF unwind
  2014-04-18  7:55   ` Ingo Molnar
@ 2014-04-18  9:35     ` Jiri Olsa
  0 siblings, 0 replies; 24+ messages in thread
From: Jiri Olsa @ 2014-04-18  9:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Corey Ashford, David Ahern, Frederic Weisbecker,
	Namhyung Kim, Paul Mackerras, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Jean Pihet

On Fri, Apr 18, 2014 at 09:55:25AM +0200, Ingo Molnar wrote:
> 
> * Ingo Molnar <mingo@kernel.org> wrote:
> 
> > 
> > * Jiri Olsa <jolsa@redhat.com> wrote:
> > 
> > > hi,
> > > trying to speedup DWARF unwind report code by factoring
> > > related code:
> > >   - caching sample's registers access
> > >   - keep dso data file descriptor open for the
> > >     life of the dso object
> > >   - replace dso cache code by mapping dso data file
> > >     directly for the life of the dso object
> > > 
> > > The speedup is mainly for libunwind unwind. The libdw will benefit
> > > mainly from cached registers access, because it handles dso data
> > > accesses by itself.. and anyway it's still faster ;-).
> > 
> > Just curious: do you have any numbers about how much faster it got in 
> > practice?
> 
> Oh, the numbers are all in the changelogs, never mind!
> 
> So in your test workload it went from 54.6 seconds to 36.7, a 48% 
> speedup :-)
> 

yep, I should have put it in here as well.. also the current
libdw unwind time on this workload is 26 seconds.. 10 more
seconds to go ;-)

jirka


* Re: [PATCH 0/3] perf tools: Speedup DWARF unwind
  2014-04-17 17:39 [PATCH 0/3] perf tools: Speedup DWARF unwind Jiri Olsa
                   ` (3 preceding siblings ...)
  2014-04-18  7:51 ` [PATCH 0/3] perf tools: Speedup DWARF unwind Ingo Molnar
@ 2014-04-23 20:16 ` Jiri Olsa
  2014-04-25 13:08 ` Jiri Olsa
  5 siblings, 0 replies; 24+ messages in thread
From: Jiri Olsa @ 2014-04-23 20:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Corey Ashford, David Ahern, Frederic Weisbecker, Ingo Molnar,
	Namhyung Kim, Paul Mackerras, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Jean Pihet

On Thu, Apr 17, 2014 at 07:39:09PM +0200, Jiri Olsa wrote:
> hi,
> trying to speedup DWARF unwind report code by factoring
> related code:
>   - caching sample's registers access
>   - keep dso data file descriptor open for the
>     life of the dso object
>   - replace dso cache code by mapping dso data file
>     directly for the life of the dso object
> 
> The speedup is mainly for libunwind unwind. The libdw will benefit
> mainly from cached registers access, because it handles dso data
> accesses by itself.. and anyway it's still faster ;-).
> 
> Also reachable in here:
>   git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
>   perf/core_unwind_speedup

rebased to latest tip perf/core, review appreciated ;-)

jirka


* Re: [PATCH 0/3] perf tools: Speedup DWARF unwind
  2014-04-17 17:39 [PATCH 0/3] perf tools: Speedup DWARF unwind Jiri Olsa
                   ` (4 preceding siblings ...)
  2014-04-23 20:16 ` Jiri Olsa
@ 2014-04-25 13:08 ` Jiri Olsa
  5 siblings, 0 replies; 24+ messages in thread
From: Jiri Olsa @ 2014-04-25 13:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: Corey Ashford, David Ahern, Frederic Weisbecker, Ingo Molnar,
	Namhyung Kim, Paul Mackerras, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Jean Pihet

ping, any feedback?

thanks,
jirka


On Thu, Apr 17, 2014 at 07:39:09PM +0200, Jiri Olsa wrote:
> hi,
> trying to speedup DWARF unwind report code by factoring
> related code:
>   - caching sample's registers access
>   - keep dso data file descriptor open for the
>     life of the dso object
>   - replace dso cache code by mapping dso data file
>     directly for the life of the dso object
> 
> The speedup is mainly for libunwind unwind. The libdw will benefit
> mainly from cached registers access, because it handles dso data
> accesses by itself.. and anyway it's still faster ;-).
> 
> Also reachable in here:
>   git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
>   perf/core_unwind_speedup
> 
> thanks,
> jirka
> 
> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
> Cc: David Ahern <dsahern@gmail.com>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
> Cc: Jean Pihet <jean.pihet@linaro.org>
> Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> ---
> Jiri Olsa (3):
>       perf tools: Cache register accesses for unwind processing
>       perf tools: Cache dso data file descriptor
>       perf tools: Replace dso data cache with mapped data
> 
>  tools/perf/tests/dso-data.c        |   7 ++++
>  tools/perf/util/dso.c              | 200 +++++++++++++++++++++++++++---------------------------------------------------------------------
>  tools/perf/util/dso.h              |  14 ++-----
>  tools/perf/util/event.h            |   5 +++
>  tools/perf/util/perf_regs.c        |  10 ++++-
>  tools/perf/util/perf_regs.h        |   4 +-
>  tools/perf/util/unwind-libunwind.c |   2 -
>  7 files changed, 83 insertions(+), 159 deletions(-)


* Re: [PATCH 1/3] perf tools: Cache register accesses for unwind processing
  2014-04-17 17:39 ` [PATCH 1/3] perf tools: Cache register accesses for unwind processing Jiri Olsa
@ 2014-04-27 14:29   ` Namhyung Kim
  2014-04-28  9:48     ` Jiri Olsa
  2014-04-28 10:39   ` Christian Borntraeger
  1 sibling, 1 reply; 24+ messages in thread
From: Namhyung Kim @ 2014-04-27 14:29 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: linux-kernel, Corey Ashford, David Ahern, Frederic Weisbecker,
	Ingo Molnar, Paul Mackerras, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Jean Pihet

Hi Jiri,

2014-04-17 (Thu), 19:39 +0200, Jiri Olsa:
> Caching registers value into an array. Got about 4% speed up
> of perf_reg_value function for report command processing
> dwarf unwind stacks.

I'm not familiar with the code base, so these are probably silly
questions: Where does the speedup come from?  IOW I don't know what
the difference is between regs->regs and regs->cache_regs.  And does
cache_regs contain the correct register values for each frame?

Thanks,
Namhyung

> 
> Output from report over 1.5 GB data with DWARF unwind stacks:
> (TODO fix perf diff)
> 
>   current code:
>    6.81%  perf.old  perf.old                   [.] perf_reg_value
> 
>   change:
>    2.24%  perf      perf                       [.] perf_reg_value
> 
> And little bit of speed up:
> 
>  Performance counter stats for './perf.old report -i perf-test.data --stdio':
> 
>    134,664,011,577      cycles:u                  #    2.472 GHz
>    189,677,227,475      instructions:u            #    1.41  insns per cycle
>       54465.096050      task-clock (msec)         #    0.998 CPUs utilized
> 
>       54.598339009 seconds time elapsed
> 
>  Performance counter stats for './perf report -i perf-test.data --stdio':
> 
>    124,478,681,672      cycles:u                  #    2.466 GHz
>    168,998,379,866      instructions:u            #    1.36  insns per cycle
>       50487.110482      task-clock (msec)         #    0.997 CPUs utilized
> 
>       50.635824229 seconds time elapsed
> 
> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
> Cc: David Ahern <dsahern@gmail.com>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
> Cc: Jean Pihet <jean.pihet@linaro.org>
> Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> ---
>  tools/perf/util/event.h     |  5 +++++
>  tools/perf/util/perf_regs.c | 10 +++++++++-
>  tools/perf/util/perf_regs.h |  4 +++-
>  3 files changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
> index 38457d4..970d4eb 100644
> --- a/tools/perf/util/event.h
> +++ b/tools/perf/util/event.h
> @@ -7,6 +7,7 @@
>  #include "../perf.h"
>  #include "map.h"
>  #include "build-id.h"
> +#include "perf_regs.h"
>  
>  struct mmap_event {
>  	struct perf_event_header header;
> @@ -87,6 +88,10 @@ struct regs_dump {
>  	u64 abi;
>  	u64 mask;
>  	u64 *regs;
> +
> +	/* Cached values/mask filled by first register access. */
> +	u64 cache_regs[PERF_REGS_MAX];
> +	u64 cache_mask;
>  };
>  
>  struct stack_dump {
> diff --git a/tools/perf/util/perf_regs.c b/tools/perf/util/perf_regs.c
> index a3539ef..43168fb 100644
> --- a/tools/perf/util/perf_regs.c
> +++ b/tools/perf/util/perf_regs.c
> @@ -1,11 +1,15 @@
>  #include <errno.h>
>  #include "perf_regs.h"
> +#include "event.h"
>  
>  int perf_reg_value(u64 *valp, struct regs_dump *regs, int id)
>  {
>  	int i, idx = 0;
>  	u64 mask = regs->mask;
>  
> +	if (regs->cache_mask & (1 << id))
> +		goto out;
> +
>  	if (!(mask & (1 << id)))
>  		return -EINVAL;
>  
> @@ -14,6 +18,10 @@ int perf_reg_value(u64 *valp, struct regs_dump *regs, int id)
>  			idx++;
>  	}
>  
> -	*valp = regs->regs[idx];
> +	regs->cache_mask |= (1 << id);
> +	regs->cache_regs[id] = regs->regs[idx];
> +
> +out:
> +	*valp = regs->cache_regs[id];
>  	return 0;
>  }
> diff --git a/tools/perf/util/perf_regs.h b/tools/perf/util/perf_regs.h
> index d6e8b6a..80d8ab1 100644
> --- a/tools/perf/util/perf_regs.h
> +++ b/tools/perf/util/perf_regs.h
> @@ -2,15 +2,17 @@
>  #define __PERF_REGS_H
>  
>  #include "types.h"
> -#include "event.h"
>  
>  #ifdef HAVE_PERF_REGS_SUPPORT
>  #include <perf_regs.h>
>  
> +struct regs_dump;
> +
>  int perf_reg_value(u64 *valp, struct regs_dump *regs, int id);
>  
>  #else
>  #define PERF_REGS_MASK	0
> +#define PERF_REGS_MAX	0
>  
>  static inline const char *perf_reg_name(int id __maybe_unused)
>  {





* Re: [PATCH 2/3] perf tools: Cache dso data file descriptor
  2014-04-17 17:39 ` [PATCH 2/3] perf tools: Cache dso data file descriptor Jiri Olsa
@ 2014-04-27 14:36   ` Namhyung Kim
  2014-04-28 10:01     ` Jiri Olsa
  0 siblings, 1 reply; 24+ messages in thread
From: Namhyung Kim @ 2014-04-27 14:36 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: linux-kernel, Corey Ashford, David Ahern, Frederic Weisbecker,
	Ingo Molnar, Paul Mackerras, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Jean Pihet

2014-04-17 (목), 19:39 +0200, Jiri Olsa:
> Keeping the data file description open for the whole life
> of the dso object.

I suspect there might be an issue with reporting a very large data file
using this approach - like hitting the open file limit?


[SNIP]
> @@ -168,8 +174,8 @@ int dso__data_fd(struct dso *dso, struct machine *machine)
>  	};
>  	int i = 0;
>  
> -	if (dso->binary_type != DSO_BINARY_TYPE__NOT_FOUND)
> -		return open_dso(dso, machine);

Why did you remove this line?

Thanks,
Namhyung


> +	if (dso->data_fd >= 0)
> +		return dso->data_fd;
>  
>  	do {
>  		int fd;
> @@ -178,7 +184,7 @@ int dso__data_fd(struct dso *dso, struct machine *machine)
>  
>  		fd = open_dso(dso, machine);
>  		if (fd >= 0)
> -			return fd;
> +			return dso->data_fd = fd;
>  
>  	} while (dso->binary_type != DSO_BINARY_TYPE__NOT_FOUND);




* Re: [PATCH 1/3] perf tools: Cache register accesses for unwind processing
  2014-04-27 14:29   ` Namhyung Kim
@ 2014-04-28  9:48     ` Jiri Olsa
  2014-04-28 13:02       ` Namhyung Kim
  0 siblings, 1 reply; 24+ messages in thread
From: Jiri Olsa @ 2014-04-28  9:48 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-kernel, Corey Ashford, David Ahern, Frederic Weisbecker,
	Ingo Molnar, Paul Mackerras, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Jean Pihet

On Sun, Apr 27, 2014 at 11:29:21PM +0900, Namhyung Kim wrote:
> Hi Jiri,
> 
> 2014-04-17 (목), 19:39 +0200, Jiri Olsa:
> > Caching registers value into an array. Got about 4% speed up
> > of perf_reg_value function for report command processing
> > dwarf unwind stacks.
> 
> I'm not familiar with the code base, so probably silly questions:  Where
> does the speed up come from?  IOW I don't know what's the difference
> between the regs->regs and regs->cached_regs.  And does the cached_regs
> contain correct values of registers for each frame?

currently a register's value is accessed by computing its
index in the sample's regs array, based on the register's id
and the registers mask

so each time you want a register value you traverse the registers
mask and count the reg's index into the sample regs array

this patch does this only once for each register (at the time it's
first accessed) and caches its value in the array (cache_regs). The
cache_mask is used to identify which regs are already cached.
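
A minimal sketch of the lookup described above, assuming a simplified
stand-in for struct regs_dump. The sketch_* names and SKETCH_REGS_MAX are
hypothetical and not part of perf. The sketch also shifts a 64-bit constant
(1ULL << id), which is an assumption made here rather than what the patch
does, so that ids above 31 stay well defined against a 64-bit mask:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_REGS_MAX 64	/* hypothetical; stands in for PERF_REGS_MAX */

struct sketch_regs_dump {
	uint64_t mask;			/* which register ids were sampled */
	const uint64_t *regs;		/* packed values, one per set bit in mask */
	uint64_t cache_regs[SKETCH_REGS_MAX];
	uint64_t cache_mask;		/* ids already copied into cache_regs */
};

/* Uncached lookup: count the set bits below id to find the packed index. */
static int sketch_reg_value_slow(uint64_t *valp,
				 const struct sketch_regs_dump *r, int id)
{
	int i, idx = 0;

	if (id < 0 || id >= SKETCH_REGS_MAX || !(r->mask & (1ULL << id)))
		return -EINVAL;

	for (i = 0; i < id; i++) {
		if (r->mask & (1ULL << i))
			idx++;
	}

	*valp = r->regs[idx];
	return 0;
}

/* Cached lookup: walk the mask only on the first access to each id. */
static int sketch_reg_value(uint64_t *valp, struct sketch_regs_dump *r, int id)
{
	assert(valp != NULL && r != NULL);

	if (id < 0 || id >= SKETCH_REGS_MAX)
		return -EINVAL;

	if (!(r->cache_mask & (1ULL << id))) {
		uint64_t v;
		int ret = sketch_reg_value_slow(&v, r, id);

		if (ret)
			return ret;

		r->cache_regs[id] = v;
		r->cache_mask |= 1ULL << id;
	}

	*valp = r->cache_regs[id];
	return 0;
}
```

The packed regs array holds one value per set bit in mask, so the uncached
path has to count the set bits below id on every call. The cached path pays
that cost once per register per sample.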

jirka


* Re: [PATCH 2/3] perf tools: Cache dso data file descriptor
  2014-04-27 14:36   ` Namhyung Kim
@ 2014-04-28 10:01     ` Jiri Olsa
  2014-04-28 13:16       ` Namhyung Kim
  2014-05-07 19:01       ` Ingo Molnar
  0 siblings, 2 replies; 24+ messages in thread
From: Jiri Olsa @ 2014-04-28 10:01 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-kernel, Corey Ashford, David Ahern, Frederic Weisbecker,
	Ingo Molnar, Paul Mackerras, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Jean Pihet

On Sun, Apr 27, 2014 at 11:36:35PM +0900, Namhyung Kim wrote:
> 2014-04-17 (목), 19:39 +0200, Jiri Olsa:
> > Keeping the data file description open for the whole life
> > of the dso object.
> 
> I suspect there might be an issue for reporting very large data file
> with this approach - like open file limit?

I've got as high as ~200 opened file descriptors for
~2GB data of system wide monitoring

but right, that could be an issue.. I wonder if we could
work around this somehow, because the speed up is quite
noticeable

how about we monitor the number of opened dso file descriptors
and once we cross some limit we close some portion of them

or something along those lines ;-)
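
One way the idea above could look, as a rough sketch only. SKETCH_FD_LIMIT,
struct sketch_dso and the sketch_* helpers are hypothetical; perf's real
struct dso and open_dso() are not used here, and the cap of 2 is chosen only
to keep the illustration small:

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

#define SKETCH_FD_LIMIT 2	/* hypothetical cap, tiny for illustration */

struct sketch_dso {
	const char *path;
	int data_fd;			/* -1 when closed */
	unsigned long last_used;	/* logical clock of the last access */
};

static unsigned long sketch_clock;
static int sketch_open_cnt;

/* Close the least recently used open descriptor among dsos[0..n). */
static void sketch_close_oldest(struct sketch_dso *dsos, size_t n)
{
	struct sketch_dso *victim = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (dsos[i].data_fd < 0)
			continue;
		if (!victim || dsos[i].last_used < victim->last_used)
			victim = &dsos[i];
	}

	if (victim) {
		close(victim->data_fd);
		victim->data_fd = -1;
		sketch_open_cnt--;
	}
}

/* Return a cached descriptor for dsos[which], opening it if needed. */
static int sketch_data_fd(struct sketch_dso *dsos, size_t n, size_t which)
{
	struct sketch_dso *dso;

	assert(which < n);
	dso = &dsos[which];

	if (dso->data_fd < 0) {
		/* Make room before opening once the cap is reached. */
		if (sketch_open_cnt >= SKETCH_FD_LIMIT)
			sketch_close_oldest(dsos, n);

		dso->data_fd = open(dso->path, O_RDONLY);
		if (dso->data_fd < 0)
			return -1;
		sketch_open_cnt++;
	}

	dso->last_used = ++sketch_clock;
	return dso->data_fd;
}
```

The linear scan for a victim keeps the sketch short. A list ordered by last
use would likely be a better fit if the number of dso objects is large.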

> 
> 
> [SNIP]
> > @@ -168,8 +174,8 @@ int dso__data_fd(struct dso *dso, struct machine *machine)
> >  	};
> >  	int i = 0;
> >  
> > -	if (dso->binary_type != DSO_BINARY_TYPE__NOT_FOUND)
> > -		return open_dso(dso, machine);
> 
> Why did you remove this line?

that code reopens an already opened (and closed) file..
instead I return the (not closed) descriptor from the previous open

thanks,
jirka


* Re: [PATCH 1/3] perf tools: Cache register accesses for unwind processing
  2014-04-17 17:39 ` [PATCH 1/3] perf tools: Cache register accesses for unwind processing Jiri Olsa
  2014-04-27 14:29   ` Namhyung Kim
@ 2014-04-28 10:39   ` Christian Borntraeger
  2014-04-28 11:00     ` Jiri Olsa
  1 sibling, 1 reply; 24+ messages in thread
From: Christian Borntraeger @ 2014-04-28 10:39 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: linux-kernel, Corey Ashford, David Ahern, Frederic Weisbecker,
	Ingo Molnar, Namhyung Kim, Paul Mackerras, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Jean Pihet

On 17/04/14 19:39, Jiri Olsa wrote:
> Caching registers value into an array. Got about 4% speed up
> of perf_reg_value function for report command processing
> dwarf unwind stacks.
> 
> Output from report over 1.5 GB data with DWARF unwind stacks:
> (TODO fix perf diff)
> 
>   current code:
>    6.81%  perf.old  perf.old                   [.] perf_reg_value
> 
>   change:
>    2.24%  perf      perf                       [.] perf_reg_value
> 
> And little bit of speed up:
> 
>  Performance counter stats for './perf.old report -i perf-test.data --stdio':
> 
>    134,664,011,577      cycles:u                  #    2.472 GHz
>    189,677,227,475      instructions:u            #    1.41  insns per cycle
>       54465.096050      task-clock (msec)         #    0.998 CPUs utilized
> 
>       54.598339009 seconds time elapsed
> 
>  Performance counter stats for './perf report -i perf-test.data --stdio':
> 
>    124,478,681,672      cycles:u                  #    2.466 GHz
>    168,998,379,866      instructions:u            #    1.36  insns per cycle
>       50487.110482      task-clock (msec)         #    0.997 CPUs utilized
> 
>       50.635824229 seconds time elapsed
> 
> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
> Cc: David Ahern <dsahern@gmail.com>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
> Cc: Jean Pihet <jean.pihet@linaro.org>
> Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> ---
>  tools/perf/util/event.h     |  5 +++++
>  tools/perf/util/perf_regs.c | 10 +++++++++-
>  tools/perf/util/perf_regs.h |  4 +++-
>  3 files changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
> index 38457d4..970d4eb 100644
> --- a/tools/perf/util/event.h
> +++ b/tools/perf/util/event.h
> @@ -7,6 +7,7 @@
>  #include "../perf.h"
>  #include "map.h"
>  #include "build-id.h"
> +#include "perf_regs.h"
> 
>  struct mmap_event {
>  	struct perf_event_header header;
> @@ -87,6 +88,10 @@ struct regs_dump {
>  	u64 abi;
>  	u64 mask;
>  	u64 *regs;
> +
> +	/* Cached values/mask filled by first register access. */
> +	u64 cache_regs[PERF_REGS_MAX];
> +	u64 cache_mask;
>  };
> 
>  struct stack_dump {
> diff --git a/tools/perf/util/perf_regs.c b/tools/perf/util/perf_regs.c
> index a3539ef..43168fb 100644
> --- a/tools/perf/util/perf_regs.c
> +++ b/tools/perf/util/perf_regs.c
> @@ -1,11 +1,15 @@
>  #include <errno.h>
>  #include "perf_regs.h"
> +#include "event.h"
> 
>  int perf_reg_value(u64 *valp, struct regs_dump *regs, int id)
>  {
>  	int i, idx = 0;
>  	u64 mask = regs->mask;
> 
> +	if (regs->cache_mask & (1 << id))
> +		goto out;
> +
>  	if (!(mask & (1 << id)))
>  		return -EINVAL;
> 
> @@ -14,6 +18,10 @@ int perf_reg_value(u64 *valp, struct regs_dump *regs, int id)
>  			idx++;
>  	}
> 
> -	*valp = regs->regs[idx];
> +	regs->cache_mask |= (1 << id);
> +	regs->cache_regs[id] = regs->regs[idx];
> +
> +out:
> +	*valp = regs->cache_regs[id];
>  	return 0;
>  }
> diff --git a/tools/perf/util/perf_regs.h b/tools/perf/util/perf_regs.h
> index d6e8b6a..80d8ab1 100644
> --- a/tools/perf/util/perf_regs.h
> +++ b/tools/perf/util/perf_regs.h
> @@ -2,15 +2,17 @@
>  #define __PERF_REGS_H
> 
>  #include "types.h"
> -#include "event.h"
> 
>  #ifdef HAVE_PERF_REGS_SUPPORT
>  #include <perf_regs.h>
> 
> +struct regs_dump;
> +
>  int perf_reg_value(u64 *valp, struct regs_dump *regs, int id);
> 
>  #else
>  #define PERF_REGS_MASK	0
> +#define PERF_REGS_MAX	0
> 
>  static inline const char *perf_reg_name(int id __maybe_unused)
>  {
> 

I'd like to have such a speedup,
but it does not compile on my s390x system:

  CC       util/top.o
In file included from util/event.h:10:0,
                 from util/event.c:2:
util/perf_regs.h:24:6: error: ‘struct regs_dump’ declared inside parameter list [-Werror]
util/perf_regs.h:24:6: error: its scope is only this definition or declaration, which is probably not what you want [-Werror]
In file included from util/event.h:10:0,
                 from util/callchain.h:7,
                 from util/hist.h:6,
                 from util/evsel.h:11,
                 from util/evsel.c:18:
util/perf_regs.h:24:6: error: ‘struct regs_dump’ declared inside parameter list [-Werror]
util/perf_regs.h:24:6: error: its scope is only this definition or declaration, which is probably not what you want [-Werror]
In file included from /home/cborntra/REPOS/linux/tools/perf/util/event.h:10:0,
                 from /home/cborntra/REPOS/linux/tools/perf/util/debug.h:6,
                 from util/cpumap.h:8,
                 from util/top.c:9:
/home/cborntra/REPOS/linux/tools/perf/util/perf_regs.h:24:6: error: ‘struct regs_dump’ declared inside parameter list [-Werror]
/home/cborntra/REPOS/linux/tools/perf/util/perf_regs.h:24:6: error: its scope is only this definition or declaration, which is probably not what you want [-Werror]
In file included from /home/cborntra/REPOS/linux/tools/perf/util/event.h:10:0,
                 from /home/cborntra/REPOS/linux/tools/perf/util/debug.h:6,
                 from util/cpumap.h:8,
                 from util/evlist.c:12:
/home/cborntra/REPOS/linux/tools/perf/util/perf_regs.h:24:6: error: ‘struct regs_dump’ declared inside parameter list [-Werror]
/home/cborntra/REPOS/linux/tools/perf/util/perf_regs.h:24:6: error: its scope is only this definition or declaration, which is probably not what you want [-Werror]
  CC       util/usage.o
  CC       util/wrapper.o
  CC       util/sigchain.o
In file included from util/event.h:10:0,
                 from util/header.h:8,
                 from util/parse-options.c:4:
util/perf_regs.h:24:6: error: ‘struct regs_dump’ declared inside parameter list [-Werror]
util/perf_regs.h:24:6: error: its scope is only this definition or declaration, which is probably not what you want [-Werror]
cc1: all warnings being treated as errors
make[1]: *** [util/parse-options.o] Error 1
make[1]: *** Waiting for unfinished jobs....
cc1: all warnings being treated as errors
cc1: all warnings being treated as errors
make[1]: *** [util/event.o] Error 1
make[1]: *** [util/evsel.o] Error 1
In file included from util/event.h:10:0,
                 from util/debug.h:6,
                 from util/usage.c:10:
util/perf_regs.h:24:6: error: ‘struct regs_dump’ declared inside parameter list [-Werror]
util/perf_regs.h:24:6: error: its scope is only this definition or declaration, which is probably not what you want [-Werror]
cc1: all warnings being treated as errors
make[1]: *** [util/usage.o] Error 1
cc1: all warnings being treated as errors
make[1]: *** [util/evlist.o] Error 1
cc1: all warnings being treated as errors
make[1]: *** [util/top.o] Error 1




* Re: [PATCH 1/3] perf tools: Cache register accesses for unwind processing
  2014-04-28 10:39   ` Christian Borntraeger
@ 2014-04-28 11:00     ` Jiri Olsa
  0 siblings, 0 replies; 24+ messages in thread
From: Jiri Olsa @ 2014-04-28 11:00 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: linux-kernel, Corey Ashford, David Ahern, Frederic Weisbecker,
	Ingo Molnar, Namhyung Kim, Paul Mackerras, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Jean Pihet

On Mon, Apr 28, 2014 at 12:39:24PM +0200, Christian Borntraeger wrote:

SNIP

> >  {
> > 
> 
> Want such a speedup, 
> but it does not compile on my s390x system:

the speed up is for DWARF unwind report, which is not yet
supported by perf on s390x.. still it should compile ;-)

I'll try to get access to some s390x machine and make a fix

thanks,
jirka


* Re: [PATCH 1/3] perf tools: Cache register accesses for unwind processing
  2014-04-28  9:48     ` Jiri Olsa
@ 2014-04-28 13:02       ` Namhyung Kim
  2014-04-28 13:24         ` Jiri Olsa
  0 siblings, 1 reply; 24+ messages in thread
From: Namhyung Kim @ 2014-04-28 13:02 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: linux-kernel, Corey Ashford, David Ahern, Frederic Weisbecker,
	Ingo Molnar, Paul Mackerras, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Jean Pihet

Hi Jiri,

2014-04-28 (월), 11:48 +0200, Jiri Olsa:
> On Sun, Apr 27, 2014 at 11:29:21PM +0900, Namhyung Kim wrote:
> > Hi Jiri,
> > 
> > 2014-04-17 (목), 19:39 +0200, Jiri Olsa:
> > > Caching registers value into an array. Got about 4% speed up
> > > of perf_reg_value function for report command processing
> > > dwarf unwind stacks.
> > 
> > I'm not familiar with the code base, so probably silly questions:  Where
> > does the speed up come from?  IOW I don't know what's the difference
> > between the regs->regs and regs->cached_regs.  And does the cached_regs
> > contain correct values of registers for each frame?
> 
> the current way register's value is accessed is to get its
> index in the sample's regs array.. based on register's id
> and the registers mask
> 
> so each time you want register value you traverse the registers
> mask and count reg's index for the sample regs array
> 
> this patch does this only once for each register (at the time it's
> first accessed) and cache its value in the array (cache_regs). The
> cache_mask is used to identify which regs are already cached.

That means it'll get the same value every time it accesses a register
across the frames of a sample?

Thanks,
Namhyung




* Re: [PATCH 2/3] perf tools: Cache dso data file descriptor
  2014-04-28 10:01     ` Jiri Olsa
@ 2014-04-28 13:16       ` Namhyung Kim
  2014-04-28 13:34         ` Jiri Olsa
  2014-04-28 14:57         ` David Ahern
  2014-05-07 19:01       ` Ingo Molnar
  1 sibling, 2 replies; 24+ messages in thread
From: Namhyung Kim @ 2014-04-28 13:16 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: linux-kernel, Corey Ashford, David Ahern, Frederic Weisbecker,
	Ingo Molnar, Paul Mackerras, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Jean Pihet

2014-04-28 (월), 12:01 +0200, Jiri Olsa:
> On Sun, Apr 27, 2014 at 11:36:35PM +0900, Namhyung Kim wrote:
> > 2014-04-17 (목), 19:39 +0200, Jiri Olsa:
> > > Keeping the data file description open for the whole life
> > > of the dso object.
> > 
> > I suspect there might be an issue for reporting very large data file
> > with this approach - like open file limit?
> 
> I've got as high as ~200 opened file descriptors for
> ~2GB data of system wide monitoring
> 
> but right, that could be an issue.. I wonder if we could
> work around this somehow, because the speed up is quite
> noticeable
> 
> how about we monitor the number of opened dso file descriptors
> and once we cross some limit we close some portion of them
> 
> or something along those lines ;-)

Yeah, we'll need some way to control those eventually.

> 
> > 
> > 
> > [SNIP]
> > > @@ -168,8 +174,8 @@ int dso__data_fd(struct dso *dso, struct machine *machine)
> > >  	};
> > >  	int i = 0;
> > >  
> > > -	if (dso->binary_type != DSO_BINARY_TYPE__NOT_FOUND)
> > > -		return open_dso(dso, machine);
> > 
> > Why did you remove this line?
> 
> that code reopens an already opened (and closed) file..
> instead I return the (not closed) descriptor from the previous open

But it'll overwrite the dso->binary_type then.  What about this?

	if (dso->data_fd >= 0)
		return dso->data_fd;

	if (dso->binary_type != DSO_BINARY_TYPE__NOT_FOUND) {
		dso->data_fd = open_dso(dso, machine);
		return dso->data_fd;
	}


Thanks,
Namhyung






* Re: [PATCH 1/3] perf tools: Cache register accesses for unwind processing
  2014-04-28 13:02       ` Namhyung Kim
@ 2014-04-28 13:24         ` Jiri Olsa
  2014-04-29  0:36           ` Namhyung Kim
  0 siblings, 1 reply; 24+ messages in thread
From: Jiri Olsa @ 2014-04-28 13:24 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-kernel, Corey Ashford, David Ahern, Frederic Weisbecker,
	Ingo Molnar, Paul Mackerras, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Jean Pihet

On Mon, Apr 28, 2014 at 10:02:55PM +0900, Namhyung Kim wrote:
> Hi Jiri,
> 
> 2014-04-28 (월), 11:48 +0200, Jiri Olsa:
> > On Sun, Apr 27, 2014 at 11:29:21PM +0900, Namhyung Kim wrote:
> > > Hi Jiri,
> > > 
> > > 2014-04-17 (목), 19:39 +0200, Jiri Olsa:
> > > > Caching registers value into an array. Got about 4% speed up
> > > > of perf_reg_value function for report command processing
> > > > dwarf unwind stacks.
> > > 
> > > I'm not familiar with the code base, so probably silly questions:  Where
> > > does the speed up come from?  IOW I don't know what's the difference
> > > between the regs->regs and regs->cached_regs.  And does the cached_regs
> > > contain correct values of registers for each frame?
> > 
> > the current way register's value is accessed is to get its
> > index in the sample's regs array.. based on register's id
> > and the registers mask
> > 
> > so each time you want register value you traverse the registers
> > mask and count reg's index for the sample regs array
> > 
> > this patch does this only once for each register (at the time it's
> > first accessed) and cache its value in the array (cache_regs). The
> > cache_mask is used to identify which regs are already cached.
> 
> That means it'll get the same value every time it accesses a register
> across the frames of a sample?

right..

jirka


* Re: [PATCH 2/3] perf tools: Cache dso data file descriptor
  2014-04-28 13:16       ` Namhyung Kim
@ 2014-04-28 13:34         ` Jiri Olsa
  2014-04-28 14:57         ` David Ahern
  1 sibling, 0 replies; 24+ messages in thread
From: Jiri Olsa @ 2014-04-28 13:34 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-kernel, Corey Ashford, David Ahern, Frederic Weisbecker,
	Ingo Molnar, Paul Mackerras, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Jean Pihet

On Mon, Apr 28, 2014 at 10:16:34PM +0900, Namhyung Kim wrote:
> 2014-04-28 (월), 12:01 +0200, Jiri Olsa:
> > On Sun, Apr 27, 2014 at 11:36:35PM +0900, Namhyung Kim wrote:
> > > 2014-04-17 (목), 19:39 +0200, Jiri Olsa:
> > > > Keeping the data file description open for the whole life
> > > > of the dso object.
> > > 
> > > I suspect there might be an issue for reporting very large data file
> > > with this approach - like open file limit?
> > 
> > I've got as high as ~200 opened file descriptors for
> > ~2GB data of system wide monitoring
> > 
> > but right, that could be an issue.. I wonder if we could
> > work around this somehow, because the speed up is quite
> > noticeable
> > 
> > how about we monitor the number of opened dso file descriptors
> > and once we cross some limit we close some portion of them
> > 
> > or something along those lines ;-)
> 
> Yeah, we'll need some way to control those eventually.
> 
> > 
> > > 
> > > 
> > > [SNIP]
> > > > @@ -168,8 +174,8 @@ int dso__data_fd(struct dso *dso, struct machine *machine)
> > > >  	};
> > > >  	int i = 0;
> > > >  
> > > > -	if (dso->binary_type != DSO_BINARY_TYPE__NOT_FOUND)
> > > > -		return open_dso(dso, machine);
> > > 
> > > Why did you remove this line?
> > 
> > that code reopens an already opened (and closed) file..
> > instead I return the (not closed) descriptor from the previous open
> 
> But it'll overwrite the dso->binary_type then.  What about this?
> 
> 	if (dso->data_fd >= 0)
> 		return dso->data_fd;
> 
> 	if (dso->binary_type != DSO_BINARY_TYPE__NOT_FOUND) {
> 		dso->data_fd = open_dso(dso, machine);
> 		return dso->data_fd;
> 	}

right, makes sense.. I'll add it with the control code for the
number of opened descriptors

thanks,
jirka


* Re: [PATCH 2/3] perf tools: Cache dso data file descriptor
  2014-04-28 13:16       ` Namhyung Kim
  2014-04-28 13:34         ` Jiri Olsa
@ 2014-04-28 14:57         ` David Ahern
  2014-04-29  0:41           ` Namhyung Kim
  1 sibling, 1 reply; 24+ messages in thread
From: David Ahern @ 2014-04-28 14:57 UTC (permalink / raw)
  To: Namhyung Kim, Jiri Olsa
  Cc: linux-kernel, Corey Ashford, Frederic Weisbecker, Ingo Molnar,
	Paul Mackerras, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Jean Pihet

On 4/28/14, 7:16 AM, Namhyung Kim wrote:
> 2014-04-28 (월), 12:01 +0200, Jiri Olsa:
>> On Sun, Apr 27, 2014 at 11:36:35PM +0900, Namhyung Kim wrote:
>>> 2014-04-17 (목), 19:39 +0200, Jiri Olsa:
>>>> Keeping the data file description open for the whole life
>>>> of the dso object.
>>>
>>> I suspect there might be an issue for reporting very large data file
>>> with this approach - like open file limit?
>>
>> I've got as high as ~200 opened file descriptors for
>> ~2GB data of system wide monitoring
>>
>> but right, that could be an issue.. I wonder if we could
>> work around this somehow, because the speed up is quite
>> noticeable
>>
>> how about we monitor the number of opened dso file descriptors
>> and once we cross some limit we close some portion of them
>>
>> or something along those lines ;-)
>
> Yeah, we'll need some way to control those eventually.

Handle EMFILE failures. Find an "old" one and close it to let the new 
one succeed.
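
A rough sketch of the EMFILE approach suggested above, under stated
assumptions: struct sketch_fd_pool and the sketch_* helpers are hypothetical
and only stand in for whatever structure would track cached dso
descriptors. The pool keeps descriptors oldest first, so eviction closes
the one cached longest ago:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define SKETCH_POOL_MAX 8

/* Hypothetical pool of cached descriptors, oldest first. */
struct sketch_fd_pool {
	int fds[SKETCH_POOL_MAX];
	int nr;
};

/* Close the oldest cached fd; return -1 when the pool is empty. */
static int sketch_pool_evict(struct sketch_fd_pool *pool)
{
	if (pool->nr == 0)
		return -1;

	close(pool->fds[0]);
	pool->nr--;
	memmove(&pool->fds[0], &pool->fds[1],
		(size_t)pool->nr * sizeof(pool->fds[0]));
	return 0;
}

/* open() that reacts to EMFILE by evicting one cached fd and retrying. */
static int sketch_open_retry(const char *path, struct sketch_fd_pool *pool)
{
	int fd;

	assert(path != NULL && pool != NULL);

	for (;;) {
		fd = open(path, O_RDONLY);
		if (fd >= 0 || errno != EMFILE)
			return fd;

		if (sketch_pool_evict(pool) < 0) {
			errno = EMFILE;	/* nothing left to close */
			return -1;
		}
	}
}
```

Only opens routed through the retry helper recover this way. Any other
open() in the process would still see EMFILE once the limit is reached.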

David



* Re: [PATCH 1/3] perf tools: Cache register accesses for unwind processing
  2014-04-28 13:24         ` Jiri Olsa
@ 2014-04-29  0:36           ` Namhyung Kim
  2014-04-30 12:12             ` Jiri Olsa
  0 siblings, 1 reply; 24+ messages in thread
From: Namhyung Kim @ 2014-04-29  0:36 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: linux-kernel, Corey Ashford, David Ahern, Frederic Weisbecker,
	Ingo Molnar, Paul Mackerras, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Jean Pihet

On Mon, 28 Apr 2014 15:24:20 +0200, Jiri Olsa wrote:
> On Mon, Apr 28, 2014 at 10:02:55PM +0900, Namhyung Kim wrote:
>> Hi Jiri,
>> 
>> 2014-04-28 (월), 11:48 +0200, Jiri Olsa:
>> > On Sun, Apr 27, 2014 at 11:29:21PM +0900, Namhyung Kim wrote:
>> > > Hi Jiri,
>> > > 
>> > > 2014-04-17 (목), 19:39 +0200, Jiri Olsa:
>> > > > Caching registers value into an array. Got about 4% speed up
>> > > > of perf_reg_value function for report command processing
>> > > > dwarf unwind stacks.
>> > > 
>> > > I'm not familiar with the code base, so probably silly questions:  Where
>> > > does the speed up come from?  IOW I don't know what's the difference
>> > > between the regs->regs and regs->cached_regs.  And does the cached_regs
>> > > contain correct values of registers for each frame?
>> > 
>> > the current way register's value is accessed is to get its
>> > index in the sample's regs array.. based on register's id
>> > and the registers mask
>> > 
>> > so each time you want register value you traverse the registers
>> > mask and count reg's index for the sample regs array
>> > 
>> > this patch does this only once for each register (at the time it's
>> > first accessed) and cache its value in the array (cache_regs). The
>> > cache_mask is used to identify which regs are already cached.
>> 
>> That means it'll get the same value every time it accesses a register
>> across the frames of a sample?
>
> right..

Hmm.. I thought it'd be changed somehow as it unwinds frames.

Thanks,
Namhyung


* Re: [PATCH 2/3] perf tools: Cache dso data file descriptor
  2014-04-28 14:57         ` David Ahern
@ 2014-04-29  0:41           ` Namhyung Kim
  0 siblings, 0 replies; 24+ messages in thread
From: Namhyung Kim @ 2014-04-29  0:41 UTC (permalink / raw)
  To: David Ahern
  Cc: Jiri Olsa, linux-kernel, Corey Ashford, Frederic Weisbecker,
	Ingo Molnar, Paul Mackerras, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Jean Pihet

Hi David,

On Mon, 28 Apr 2014 08:57:49 -0600, David Ahern wrote:
> On 4/28/14, 7:16 AM, Namhyung Kim wrote:
>> 2014-04-28 (월), 12:01 +0200, Jiri Olsa:
>>> On Sun, Apr 27, 2014 at 11:36:35PM +0900, Namhyung Kim wrote:
>>>> 2014-04-17 (목), 19:39 +0200, Jiri Olsa:
>>>>> Keeping the data file description open for the whole life
>>>>> of the dso object.
>>>>
>>>> I suspect there might be an issue for reporting very large data file
>>>> with this approach - like open file limit?
>>>
>>> I've got as high as ~200 opened file descriptors for
>>> ~2GB data of system wide monitoring
>>>
>>> but right, that could be an issue.. I wonder if we could
>>> work around this somehow, because the speed up is quite
>>> noticeable
>>>
>>> how about we monitor the number of opened dso file descriptors
>>> and once we cross some limit we close some portion of them
>>>
>>> or something along those lines ;-)
>>
>> Yeah, we'll need some way to control those eventually.
>
> Handle EMFILE failures. Find an "old" one and close it to let the new
> one succeed.

But it would make other open() calls, if any, fail anyway..  So I'd rather
limit the size of the dso cache to a reasonable size.

Thanks,
Namhyung


* Re: [PATCH 1/3] perf tools: Cache register accesses for unwind processing
  2014-04-29  0:36           ` Namhyung Kim
@ 2014-04-30 12:12             ` Jiri Olsa
  0 siblings, 0 replies; 24+ messages in thread
From: Jiri Olsa @ 2014-04-30 12:12 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-kernel, Corey Ashford, David Ahern, Frederic Weisbecker,
	Ingo Molnar, Paul Mackerras, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Jean Pihet

On Tue, Apr 29, 2014 at 09:36:19AM +0900, Namhyung Kim wrote:
> On Mon, 28 Apr 2014 15:24:20 +0200, Jiri Olsa wrote:
> > On Mon, Apr 28, 2014 at 10:02:55PM +0900, Namhyung Kim wrote:
> >> Hi Jiri,
> >> 
> >> 2014-04-28 (월), 11:48 +0200, Jiri Olsa:
> >> > On Sun, Apr 27, 2014 at 11:29:21PM +0900, Namhyung Kim wrote:
> >> > > Hi Jiri,
> >> > > 
> >> > > 2014-04-17 (목), 19:39 +0200, Jiri Olsa:
> >> > > > Caching registers value into an array. Got about 4% speed up
> >> > > > of perf_reg_value function for report command processing
> >> > > > dwarf unwind stacks.
> >> > > 
> >> > > I'm not familiar with the code base, so probably silly questions:  Where
> >> > > does the speed up come from?  IOW I don't know what's the difference
> >> > > between the regs->regs and regs->cached_regs.  And does the cached_regs
> >> > > contain correct values of registers for each frame?
> >> > 
> >> > the current way register's value is accessed is to get its
> >> > index in the sample's regs array.. based on register's id
> >> > and the registers mask
> >> > 
> >> > so each time you want register value you traverse the registers
> >> > mask and count reg's index for the sample regs array
> >> > 
> >> > this patch does this only once for each register (at the time it's
> >> > first accessed) and cache its value in the array (cache_regs). The
> >> > cache_mask is used to identify which regs are already cached.
> >> 
> >> That means it'll get the same value every time it accesses a register
> >> across the frames of a sample?
> >
> > right..
> 
> Hmm.. I thought it'd be changed somehow as it unwinds frames.

nope, it's just the sample's user space register values from the
time the sample was taken

both libunwind and libdw unwinders keep the register state
internally while unwinding through the frames

jirka


* Re: [PATCH 2/3] perf tools: Cache dso data file descriptor
  2014-04-28 10:01     ` Jiri Olsa
  2014-04-28 13:16       ` Namhyung Kim
@ 2014-05-07 19:01       ` Ingo Molnar
  1 sibling, 0 replies; 24+ messages in thread
From: Ingo Molnar @ 2014-05-07 19:01 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Namhyung Kim, linux-kernel, Corey Ashford, David Ahern,
	Frederic Weisbecker, Paul Mackerras, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Jean Pihet


* Jiri Olsa <jolsa@redhat.com> wrote:

> On Sun, Apr 27, 2014 at 11:36:35PM +0900, Namhyung Kim wrote:
> > 2014-04-17 (목), 19:39 +0200, Jiri Olsa:
> > > Keeping the data file description open for the whole life
> > > of the dso object.
> > 
> > I suspect there might be an issue for reporting very large data file
> > with this approach - like open file limit?
> 
> I've got as high as ~200 opened file descriptors for
> ~2GB data of system wide monitoring

Note that 200 open file descriptors in themselves are not a 
scalability problem on Linux, as long as perf doesn't walk them 
linearly anywhere.

I think we are reasonably fast even with a million open files in a
single process, or so.

Thanks,

	Ingo


Thread overview: 24+ messages
2014-04-17 17:39 [PATCH 0/3] perf tools: Speedup DWARF unwind Jiri Olsa
2014-04-17 17:39 ` [PATCH 1/3] perf tools: Cache register accesses for unwind processing Jiri Olsa
2014-04-27 14:29   ` Namhyung Kim
2014-04-28  9:48     ` Jiri Olsa
2014-04-28 13:02       ` Namhyung Kim
2014-04-28 13:24         ` Jiri Olsa
2014-04-29  0:36           ` Namhyung Kim
2014-04-30 12:12             ` Jiri Olsa
2014-04-28 10:39   ` Christian Borntraeger
2014-04-28 11:00     ` Jiri Olsa
2014-04-17 17:39 ` [PATCH 2/3] perf tools: Cache dso data file descriptor Jiri Olsa
2014-04-27 14:36   ` Namhyung Kim
2014-04-28 10:01     ` Jiri Olsa
2014-04-28 13:16       ` Namhyung Kim
2014-04-28 13:34         ` Jiri Olsa
2014-04-28 14:57         ` David Ahern
2014-04-29  0:41           ` Namhyung Kim
2014-05-07 19:01       ` Ingo Molnar
2014-04-17 17:39 ` [PATCH 3/3] perf tools: Replace dso data cache with mapped data Jiri Olsa
2014-04-18  7:51 ` [PATCH 0/3] perf tools: Speedup DWARF unwind Ingo Molnar
2014-04-18  7:55   ` Ingo Molnar
2014-04-18  9:35     ` Jiri Olsa
2014-04-23 20:16 ` Jiri Olsa
2014-04-25 13:08 ` Jiri Olsa
