linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [GIT PULL 0/8] perf/core improvements and fixes
@ 2014-10-01 19:50 Arnaldo Carvalho de Melo
  2014-10-01 19:50 ` [PATCH 1/8] perf tools: Refactor unit and scale function parameters Arnaldo Carvalho de Melo
                   ` (8 more replies)
  0 siblings, 9 replies; 14+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-10-01 19:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Adrian Hunter,
	Andi Kleen, Chang Hyun Park, David Ahern, Davidlohr Bueso,
	Don Zickus, Douglas Hatch, Frederic Weisbecker, H . Peter Anvin,
	Jean Pihet, Jiri Olsa, linux-arm-kernel, Matt Fleming,
	Mike Galbraith, Namhyung Kim, Paul Mackerras, Peter Zijlstra,
	Scott J Norton, Stephane Eranian, Thomas Gleixner, Waiman Long,
	Will Deacon, Arnaldo Carvalho de Melo

Hi Ingo,

	Please consider pulling,

Best Regards,

- Arnaldo

The following changes since commit 07394b5f13a04f86b27e0ddd96a36c7d9bfe1a4f:

  Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2014-09-27 09:15:48 +0200)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo

for you to fetch changes up to 281f92f233a59ef52bb45287242bd815a67f5647:

  perf record: Fix error message for --filter option not coming after tracepoint (2014-10-01 15:05:32 -0300)

----------------------------------------------------------------
perf/core improvements and fixes:

User visible:

. Fix mmap return address truncation to 32-bit in 'perf trace'. (Chang Hyun Park)

o Support operations for shared futexes. (Davidlohr Bueso)

. Fix error message for --filter option not coming after tracepoint. (Arnaldo Carvalho de Melo)

Infrastructure:

. Refactor unit and scale function parameters for pmu parsing routines. (Matt Fleming)

. Improve DSO long names lookup with rbtree, resulting in great speedup for
  workloads with lots of DSOs. (Waiman Long)

. Fix build breakage on arm64 targets. (Will Deacon)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

----------------------------------------------------------------
Arnaldo Carvalho de Melo (1):
      perf record: Fix error message for --filter option not coming after tracepoint

Chang Hyun Park (1):
      perf trace: Fix mmap return address truncation to 32-bit

Davidlohr Bueso (2):
      perf bench futex: Support operations for shared futexes
      perf bench futex: Sanitize -q option in requeue

Matt Fleming (1):
      perf tools: Refactor unit and scale function parameters

Waiman Long (2):
      perf symbols: Encapsulate dsos list head into struct dsos
      perf symbols: Improve DSO long names lookup speed with rbtree

Will Deacon (1):
      perf tools: Fix build breakage on arm64 targets

 tools/perf/arch/arm64/util/unwind-libunwind.c |  1 +
 tools/perf/bench/futex-hash.c                 |  7 ++-
 tools/perf/bench/futex-requeue.c              | 28 +++++----
 tools/perf/bench/futex-wake.c                 | 15 +++--
 tools/perf/builtin-trace.c                    |  6 +-
 tools/perf/util/dso.c                         | 85 +++++++++++++++++++++++----
 tools/perf/util/dso.h                         | 16 ++++-
 tools/perf/util/header.c                      | 32 +++++-----
 tools/perf/util/machine.c                     | 25 ++++----
 tools/perf/util/machine.h                     |  5 +-
 tools/perf/util/parse-events.c                | 11 ++--
 tools/perf/util/pmu.c                         | 38 +++++++-----
 tools/perf/util/pmu.h                         |  7 ++-
 tools/perf/util/probe-event.c                 |  3 +-
 tools/perf/util/symbol-elf.c                  |  7 ++-
 15 files changed, 200 insertions(+), 86 deletions(-)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/8] perf tools: Refactor unit and scale function parameters
  2014-10-01 19:50 [GIT PULL 0/8] perf/core improvements and fixes Arnaldo Carvalho de Melo
@ 2014-10-01 19:50 ` Arnaldo Carvalho de Melo
  2014-10-01 19:50 ` [PATCH 2/8] perf trace: Fix mmap return address truncation to 32-bit Arnaldo Carvalho de Melo
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-10-01 19:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Matt Fleming, H. Peter Anvin, Jiri Olsa,
	Peter Zijlstra, Thomas Gleixner, Arnaldo Carvalho de Melo

From: Matt Fleming <matt.fleming@intel.com>

Passing pointers to alias modifiers 'unit' and 'scale' isn't very
future-proof since if we add more modifiers to the list we'll end up
passing more arguments.

Instead wrap everything up in a struct perf_pmu_info, which can easily
be expanded when additional alias modifiers are necessary in the future.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1411567455-31264-3-git-send-email-matt@console-pimps.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/parse-events.c |  9 ++++-----
 tools/perf/util/pmu.c          | 38 +++++++++++++++++++++++---------------
 tools/perf/util/pmu.h          |  7 ++++++-
 3 files changed, 33 insertions(+), 21 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 61be3e695ec2..9522cf22ad81 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -634,10 +634,9 @@ int parse_events_add_pmu(struct list_head *list, int *idx,
 			 char *name, struct list_head *head_config)
 {
 	struct perf_event_attr attr;
+	struct perf_pmu_info info;
 	struct perf_pmu *pmu;
 	struct perf_evsel *evsel;
-	const char *unit;
-	double scale;
 
 	pmu = perf_pmu__find(name);
 	if (!pmu)
@@ -656,7 +655,7 @@ int parse_events_add_pmu(struct list_head *list, int *idx,
 		return evsel ? 0 : -ENOMEM;
 	}
 
-	if (perf_pmu__check_alias(pmu, head_config, &unit, &scale))
+	if (perf_pmu__check_alias(pmu, head_config, &info))
 		return -EINVAL;
 
 	/*
@@ -671,8 +670,8 @@ int parse_events_add_pmu(struct list_head *list, int *idx,
 	evsel = __add_event(list, idx, &attr, pmu_event_name(head_config),
 			    pmu->cpus);
 	if (evsel) {
-		evsel->unit = unit;
-		evsel->scale = scale;
+		evsel->unit = info.unit;
+		evsel->scale = info.scale;
 	}
 
 	return evsel ? 0 : -ENOMEM;
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 22a4ad5a927a..93a41ca96b8e 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -210,6 +210,19 @@ static int perf_pmu__new_alias(struct list_head *list, char *dir, char *name, FI
 	return 0;
 }
 
+static inline bool pmu_alias_info_file(char *name)
+{
+	size_t len;
+
+	len = strlen(name);
+	if (len > 5 && !strcmp(name + len - 5, ".unit"))
+		return true;
+	if (len > 6 && !strcmp(name + len - 6, ".scale"))
+		return true;
+
+	return false;
+}
+
 /*
  * Process all the sysfs attributes located under the directory
  * specified in 'dir' parameter.
@@ -218,7 +231,6 @@ static int pmu_aliases_parse(char *dir, struct list_head *head)
 {
 	struct dirent *evt_ent;
 	DIR *event_dir;
-	size_t len;
 	int ret = 0;
 
 	event_dir = opendir(dir);
@@ -234,13 +246,9 @@ static int pmu_aliases_parse(char *dir, struct list_head *head)
 			continue;
 
 		/*
-		 * skip .unit and .scale info files
-		 * parsed in perf_pmu__new_alias()
+		 * skip info files parsed in perf_pmu__new_alias()
 		 */
-		len = strlen(name);
-		if (len > 5 && !strcmp(name + len - 5, ".unit"))
-			continue;
-		if (len > 6 && !strcmp(name + len - 6, ".scale"))
+		if (pmu_alias_info_file(name))
 			continue;
 
 		snprintf(path, PATH_MAX, "%s/%s", dir, name);
@@ -645,7 +653,7 @@ static int check_unit_scale(struct perf_pmu_alias *alias,
  * defined for the alias
  */
 int perf_pmu__check_alias(struct perf_pmu *pmu, struct list_head *head_terms,
-			  const char **unit, double *scale)
+			  struct perf_pmu_info *info)
 {
 	struct parse_events_term *term, *h;
 	struct perf_pmu_alias *alias;
@@ -655,8 +663,8 @@ int perf_pmu__check_alias(struct perf_pmu *pmu, struct list_head *head_terms,
 	 * Mark unit and scale as not set
 	 * (different from default values, see below)
 	 */
-	*unit   = NULL;
-	*scale  = 0.0;
+	info->unit   = NULL;
+	info->scale  = 0.0;
 
 	list_for_each_entry_safe(term, h, head_terms, list) {
 		alias = pmu_find_alias(pmu, term);
@@ -666,7 +674,7 @@ int perf_pmu__check_alias(struct perf_pmu *pmu, struct list_head *head_terms,
 		if (ret)
 			return ret;
 
-		ret = check_unit_scale(alias, unit, scale);
+		ret = check_unit_scale(alias, &info->unit, &info->scale);
 		if (ret)
 			return ret;
 
@@ -679,11 +687,11 @@ int perf_pmu__check_alias(struct perf_pmu *pmu, struct list_head *head_terms,
 	 * set defaults as for evsel
 	 * unit cannot left to NULL
 	 */
-	if (*unit == NULL)
-		*unit   = "";
+	if (info->unit == NULL)
+		info->unit   = "";
 
-	if (*scale == 0.0)
-		*scale  = 1.0;
+	if (info->scale == 0.0)
+		info->scale  = 1.0;
 
 	return 0;
 }
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 0f5c0a88fdc8..fe90a012c003 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -25,6 +25,11 @@ struct perf_pmu {
 	struct list_head list;    /* ELEM */
 };
 
+struct perf_pmu_info {
+	const char *unit;
+	double scale;
+};
+
 struct perf_pmu *perf_pmu__find(const char *name);
 int perf_pmu__config(struct perf_pmu *pmu, struct perf_event_attr *attr,
 		     struct list_head *head_terms);
@@ -33,7 +38,7 @@ int perf_pmu__config_terms(struct list_head *formats,
 			   struct list_head *head_terms,
 			   bool zero);
 int perf_pmu__check_alias(struct perf_pmu *pmu, struct list_head *head_terms,
-			  const char **unit, double *scale);
+			  struct perf_pmu_info *info);
 struct list_head *perf_pmu__alias(struct perf_pmu *pmu,
 				  struct list_head *head_terms);
 int perf_pmu_wrap(void);
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/8] perf trace: Fix mmap return address truncation to 32-bit
  2014-10-01 19:50 [GIT PULL 0/8] perf/core improvements and fixes Arnaldo Carvalho de Melo
  2014-10-01 19:50 ` [PATCH 1/8] perf tools: Refactor unit and scale function parameters Arnaldo Carvalho de Melo
@ 2014-10-01 19:50 ` Arnaldo Carvalho de Melo
  2014-10-01 19:50 ` [PATCH 3/8] perf bench futex: Support operations for shared futexes Arnaldo Carvalho de Melo
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-10-01 19:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Chang Hyun Park, H. Peter Anvin, Ingo Molnar,
	Thomas Gleixner, Arnaldo Carvalho de Melo

From: Chang Hyun Park <heartinpiece@gmail.com>

Using 'perf trace' for mmap is truncating return values by stripping the
top 32 bits, actually printing only the lower 32 bits.

This was because the ret value was of an 'int' type and not a 'long'
type.

  The Problem:

  991258501.244 ( 0.004 ms): mmap(len: 40001536, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS, fd: -1) = 0x56691000
  991258501.257 ( 0.000 ms): minfault [_int_malloc+0x1038] => //anon@0x7fa056691008 //(d.)

The first line shows an mmap, which succeeds and returns 0x56691000.

However the next line shows a memory access to that virtual memory area,
specifically to 0x7fa056691008. The upper 32 bit is lost due to the
problem mentioned above, and thus mmap's return value didn't have the
upper 0x7fa0.

Tested on 3.17-rc5 from the linus's tree, and the HEAD of tip/master

Signed-off-by: Chang Hyun Park <heartinpiece@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1411736041-8017-1-git-send-email-heartinpiece@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-trace.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index c70e69ea1c5d..09bcf2393910 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -1695,7 +1695,7 @@ static int trace__sys_exit(struct trace *trace, struct perf_evsel *evsel,
 			   union perf_event *event __maybe_unused,
 			   struct perf_sample *sample)
 {
-	int ret;
+	long ret;
 	u64 duration = 0;
 	struct thread *thread;
 	int id = perf_evsel__sc_tp_uint(evsel, id, sample);
@@ -1748,7 +1748,7 @@ static int trace__sys_exit(struct trace *trace, struct perf_evsel *evsel,
 
 	if (sc->fmt == NULL) {
 signed_print:
-		fprintf(trace->output, ") = %d", ret);
+		fprintf(trace->output, ") = %ld", ret);
 	} else if (ret < 0 && sc->fmt->errmsg) {
 		char bf[STRERR_BUFSIZE];
 		const char *emsg = strerror_r(-ret, bf, sizeof(bf)),
@@ -1758,7 +1758,7 @@ signed_print:
 	} else if (ret == 0 && sc->fmt->timeout)
 		fprintf(trace->output, ") = 0 Timeout");
 	else if (sc->fmt->hexret)
-		fprintf(trace->output, ") = %#x", ret);
+		fprintf(trace->output, ") = %#lx", ret);
 	else
 		goto signed_print;
 
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/8] perf bench futex: Support operations for shared futexes
  2014-10-01 19:50 [GIT PULL 0/8] perf/core improvements and fixes Arnaldo Carvalho de Melo
  2014-10-01 19:50 ` [PATCH 1/8] perf tools: Refactor unit and scale function parameters Arnaldo Carvalho de Melo
  2014-10-01 19:50 ` [PATCH 2/8] perf trace: Fix mmap return address truncation to 32-bit Arnaldo Carvalho de Melo
@ 2014-10-01 19:50 ` Arnaldo Carvalho de Melo
  2014-10-01 19:50 ` [PATCH 4/8] perf bench futex: Sanitize -q option in requeue Arnaldo Carvalho de Melo
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-10-01 19:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Davidlohr Bueso, Davidlohr Bueso, Arnaldo Carvalho de Melo

From: Davidlohr Bueso <dave@stgolabs.net>

Unlike futex-hash, requeuing and wakeup benchmarks do not support shared
futexes, limiting the usefulness of the programs. Correct this, and
allow using the local -S parameter. The default remains using private
futexes.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Davidlohr Bueso <dbueso@suse.de>
Link: http://lkml.kernel.org/r/1412008868-22328-1-git-send-email-dave@stgolabs.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/bench/futex-hash.c    |  7 +++++--
 tools/perf/bench/futex-requeue.c | 24 +++++++++++++++---------
 tools/perf/bench/futex-wake.c    | 15 ++++++++++-----
 3 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/tools/perf/bench/futex-hash.c b/tools/perf/bench/futex-hash.c
index a84206e9c4aa..fc9bebd2cca0 100644
--- a/tools/perf/bench/futex-hash.c
+++ b/tools/perf/bench/futex-hash.c
@@ -26,6 +26,7 @@ static unsigned int nsecs    = 10;
 /* amount of futexes per thread */
 static unsigned int nfutexes = 1024;
 static bool fshared = false, done = false, silent = false;
+static int futex_flag = 0;
 
 struct timeval start, end, runtime;
 static pthread_mutex_t thread_lock;
@@ -75,8 +76,7 @@ static void *workerfn(void *arg)
 			 * such as internal waitqueue handling, thus enlarging
 			 * the critical region protected by hb->lock.
 			 */
-			ret = futex_wait(&w->futex[i], 1234, NULL,
-					 fshared ? 0 : FUTEX_PRIVATE_FLAG);
+			ret = futex_wait(&w->futex[i], 1234, NULL, futex_flag);
 			if (!silent &&
 			    (!ret || errno != EAGAIN || errno != EWOULDBLOCK))
 				warn("Non-expected futex return call");
@@ -135,6 +135,9 @@ int bench_futex_hash(int argc, const char **argv,
 	if (!worker)
 		goto errmem;
 
+	if (!fshared)
+		futex_flag = FUTEX_PRIVATE_FLAG;
+
 	printf("Run summary [PID %d]: %d threads, each operating on %d [%s] futexes for %d secs.\n\n",
 	       getpid(), nthreads, nfutexes, fshared ? "shared":"private", nsecs);
 
diff --git a/tools/perf/bench/futex-requeue.c b/tools/perf/bench/futex-requeue.c
index 732403bfd31a..9837a8831406 100644
--- a/tools/perf/bench/futex-requeue.c
+++ b/tools/perf/bench/futex-requeue.c
@@ -30,16 +30,18 @@ static u_int32_t futex1 = 0, futex2 = 0;
 static unsigned int nrequeue = 1;
 
 static pthread_t *worker;
-static bool done = 0, silent = 0;
+static bool done = false, silent = false, fshared = false;
 static pthread_mutex_t thread_lock;
 static pthread_cond_t thread_parent, thread_worker;
 static struct stats requeuetime_stats, requeued_stats;
 static unsigned int ncpus, threads_starting, nthreads = 0;
+static int futex_flag = 0;
 
 static const struct option options[] = {
 	OPT_UINTEGER('t', "threads",  &nthreads, "Specify amount of threads"),
 	OPT_UINTEGER('q', "nrequeue", &nrequeue, "Specify amount of threads to requeue at once"),
 	OPT_BOOLEAN( 's', "silent",   &silent,   "Silent mode: do not display data/details"),
+	OPT_BOOLEAN( 'S', "shared",   &fshared,  "Use shared futexes instead of private ones"),
 	OPT_END()
 };
 
@@ -70,7 +72,7 @@ static void *workerfn(void *arg __maybe_unused)
 	pthread_cond_wait(&thread_worker, &thread_lock);
 	pthread_mutex_unlock(&thread_lock);
 
-	futex_wait(&futex1, 0, NULL, FUTEX_PRIVATE_FLAG);
+	futex_wait(&futex1, 0, NULL, futex_flag);
 	return NULL;
 }
 
@@ -127,9 +129,12 @@ int bench_futex_requeue(int argc, const char **argv,
 	if (!worker)
 		err(EXIT_FAILURE, "calloc");
 
-	printf("Run summary [PID %d]: Requeuing %d threads (from %p to %p), "
-	       "%d at a time.\n\n",
-	       getpid(), nthreads, &futex1, &futex2, nrequeue);
+	if (!fshared)
+		futex_flag = FUTEX_PRIVATE_FLAG;
+
+	printf("Run summary [PID %d]: Requeuing %d threads (from [%s] %p to %p), "
+	       "%d at a time.\n\n",  getpid(), nthreads,
+	       fshared ? "shared":"private", &futex1, &futex2, nrequeue);
 
 	init_stats(&requeued_stats);
 	init_stats(&requeuetime_stats);
@@ -156,13 +161,14 @@ int bench_futex_requeue(int argc, const char **argv,
 
 		/* Ok, all threads are patiently blocked, start requeueing */
 		gettimeofday(&start, NULL);
-		for (nrequeued = 0; nrequeued < nthreads; nrequeued += nrequeue)
+		for (nrequeued = 0; nrequeued < nthreads; nrequeued += nrequeue) {
 			/*
 			 * Do not wakeup any tasks blocked on futex1, allowing
 			 * us to really measure futex_wait functionality.
 			 */
-			futex_cmp_requeue(&futex1, 0, &futex2, 0, nrequeue,
-					  FUTEX_PRIVATE_FLAG);
+			futex_cmp_requeue(&futex1, 0, &futex2, 0,
+					  nrequeue, futex_flag);
+		}
 		gettimeofday(&end, NULL);
 		timersub(&end, &start, &runtime);
 
@@ -175,7 +181,7 @@ int bench_futex_requeue(int argc, const char **argv,
 		}
 
 		/* everybody should be blocked on futex2, wake'em up */
-		nrequeued = futex_wake(&futex2, nthreads, FUTEX_PRIVATE_FLAG);
+		nrequeued = futex_wake(&futex2, nthreads, futex_flag);
 		if (nthreads != nrequeued)
 			warnx("couldn't wakeup all tasks (%d/%d)", nrequeued, nthreads);
 
diff --git a/tools/perf/bench/futex-wake.c b/tools/perf/bench/futex-wake.c
index 50022cbce87e..929f762be47e 100644
--- a/tools/perf/bench/futex-wake.c
+++ b/tools/perf/bench/futex-wake.c
@@ -31,16 +31,18 @@ static u_int32_t futex1 = 0;
 static unsigned int nwakes = 1;
 
 pthread_t *worker;
-static bool done = false, silent = false;
+static bool done = false, silent = false, fshared = false;
 static pthread_mutex_t thread_lock;
 static pthread_cond_t thread_parent, thread_worker;
 static struct stats waketime_stats, wakeup_stats;
 static unsigned int ncpus, threads_starting, nthreads = 0;
+static int futex_flag = 0;
 
 static const struct option options[] = {
 	OPT_UINTEGER('t', "threads", &nthreads, "Specify amount of threads"),
 	OPT_UINTEGER('w', "nwakes",  &nwakes,   "Specify amount of threads to wake at once"),
 	OPT_BOOLEAN( 's', "silent",  &silent,   "Silent mode: do not display data/details"),
+	OPT_BOOLEAN( 'S', "shared",  &fshared,  "Use shared futexes instead of private ones"),
 	OPT_END()
 };
 
@@ -58,7 +60,7 @@ static void *workerfn(void *arg __maybe_unused)
 	pthread_cond_wait(&thread_worker, &thread_lock);
 	pthread_mutex_unlock(&thread_lock);
 
-	futex_wait(&futex1, 0, NULL, FUTEX_PRIVATE_FLAG);
+	futex_wait(&futex1, 0, NULL, futex_flag);
 	return NULL;
 }
 
@@ -130,9 +132,12 @@ int bench_futex_wake(int argc, const char **argv,
 	if (!worker)
 		err(EXIT_FAILURE, "calloc");
 
-	printf("Run summary [PID %d]: blocking on %d threads (at futex %p), "
+	if (!fshared)
+		futex_flag = FUTEX_PRIVATE_FLAG;
+
+	printf("Run summary [PID %d]: blocking on %d threads (at [%s] futex %p), "
 	       "waking up %d at a time.\n\n",
-	       getpid(), nthreads, &futex1, nwakes);
+	       getpid(), nthreads, fshared ? "shared":"private",  &futex1, nwakes);
 
 	init_stats(&wakeup_stats);
 	init_stats(&waketime_stats);
@@ -160,7 +165,7 @@ int bench_futex_wake(int argc, const char **argv,
 		/* Ok, all threads are patiently blocked, start waking folks up */
 		gettimeofday(&start, NULL);
 		while (nwoken != nthreads)
-			nwoken += futex_wake(&futex1, nwakes, FUTEX_PRIVATE_FLAG);
+			nwoken += futex_wake(&futex1, nwakes, futex_flag);
 		gettimeofday(&end, NULL);
 		timersub(&end, &start, &runtime);
 
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/8] perf bench futex: Sanitize -q option in requeue
  2014-10-01 19:50 [GIT PULL 0/8] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (2 preceding siblings ...)
  2014-10-01 19:50 ` [PATCH 3/8] perf bench futex: Support operations for shared futexes Arnaldo Carvalho de Melo
@ 2014-10-01 19:50 ` Arnaldo Carvalho de Melo
  2014-10-01 19:50 ` [PATCH 5/8] perf symbols: Encapsulate dsos list head into struct dsos Arnaldo Carvalho de Melo
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-10-01 19:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Davidlohr Bueso, Davidlohr Bueso, Arnaldo Carvalho de Melo

From: Davidlohr Bueso <dave@stgolabs.net>

When given the number of threads to requeue at once by user input,
there's always the risk of this value being larger than the total number
of threads.  This doesn't make any sense, and the kernel can easily deal
with such sort of situations, hence no big deal. We should however
prevent bogus output such as:

./perf bench --repeat 2 futex requeue -q 10
Run summary [PID 22210]: Requeuing 4 threads (from [private] 0x99ef3c to 0x99ef38), 10 at a time.

[Run 1]: Requeued 10 of 4 threads in 0.0040 ms
[Run 2]: Requeued 10 of 4 threads in 0.0030 ms
Requeued 10 of 4 threads in 0.0035 ms (+-14.29%)

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Davidlohr Bueso <dbueso@suse.de>
Link: http://lkml.kernel.org/r/1412008868-22328-2-git-send-email-dave@stgolabs.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/bench/futex-requeue.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/perf/bench/futex-requeue.c b/tools/perf/bench/futex-requeue.c
index 9837a8831406..bedff6b5b3cf 100644
--- a/tools/perf/bench/futex-requeue.c
+++ b/tools/perf/bench/futex-requeue.c
@@ -172,6 +172,9 @@ int bench_futex_requeue(int argc, const char **argv,
 		gettimeofday(&end, NULL);
 		timersub(&end, &start, &runtime);
 
+		if (nrequeued > nthreads)
+			nrequeued = nthreads;
+
 		update_stats(&requeued_stats, nrequeued);
 		update_stats(&requeuetime_stats, runtime.tv_usec);
 
@@ -190,7 +193,6 @@ int bench_futex_requeue(int argc, const char **argv,
 			if (ret)
 				err(EXIT_FAILURE, "pthread_join");
 		}
-
 	}
 
 	/* cleanup & report results */
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 5/8] perf symbols: Encapsulate dsos list head into struct dsos
  2014-10-01 19:50 [GIT PULL 0/8] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (3 preceding siblings ...)
  2014-10-01 19:50 ` [PATCH 4/8] perf bench futex: Sanitize -q option in requeue Arnaldo Carvalho de Melo
@ 2014-10-01 19:50 ` Arnaldo Carvalho de Melo
  2014-10-01 19:50 ` [PATCH 6/8] perf symbols: Improve DSO long names lookup speed with rbtree Arnaldo Carvalho de Melo
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-10-01 19:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Waiman Long, Adrian Hunter, Don Zickus,
	Douglas Hatch, Ingo Molnar, Jiri Olsa, Namhyung Kim,
	Paul Mackerras, Peter Zijlstra, Scott J Norton,
	Arnaldo Carvalho de Melo

From: Waiman Long <Waiman.Long@hp.com>

This is a precursor patch to enable long name searching of DSOs using
a rbtree.

In this patch, a new dsos structure is created which contains only a
list head structure for the moment.

The new dsos structure is used, in turn, in the machine structure for
the user_dsos and kernel_dsos fields.

Only the following 3 dsos functions are modified to accept the new dsos
structure parameter instead of list_head:

 - dsos__add()
 - dsos__find()
 - __dsos__findnew()

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hp.com>
Link: http://lkml.kernel.org/r/1412021249-19201-2-git-send-email-Waiman.Long@hp.com
[ Move struct dsos to dso.h to reduce the dso methods depends on machine.h ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/dso.c         | 17 +++++++++--------
 tools/perf/util/dso.h         | 13 ++++++++++---
 tools/perf/util/header.c      | 32 ++++++++++++++++++--------------
 tools/perf/util/machine.c     | 24 ++++++++++++------------
 tools/perf/util/machine.h     |  5 +++--
 tools/perf/util/probe-event.c |  3 ++-
 tools/perf/util/symbol-elf.c  |  7 ++++++-
 7 files changed, 60 insertions(+), 41 deletions(-)

diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index 55e39dc1bcda..901a58fa3f22 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -851,35 +851,36 @@ bool __dsos__read_build_ids(struct list_head *head, bool with_hits)
 	return have_build_id;
 }
 
-void dsos__add(struct list_head *head, struct dso *dso)
+void dsos__add(struct dsos *dsos, struct dso *dso)
 {
-	list_add_tail(&dso->node, head);
+	list_add_tail(&dso->node, &dsos->head);
 }
 
-struct dso *dsos__find(const struct list_head *head, const char *name, bool cmp_short)
+struct dso *dsos__find(const struct dsos *dsos, const char *name,
+		       bool cmp_short)
 {
 	struct dso *pos;
 
 	if (cmp_short) {
-		list_for_each_entry(pos, head, node)
+		list_for_each_entry(pos, &dsos->head, node)
 			if (strcmp(pos->short_name, name) == 0)
 				return pos;
 		return NULL;
 	}
-	list_for_each_entry(pos, head, node)
+	list_for_each_entry(pos, &dsos->head, node)
 		if (strcmp(pos->long_name, name) == 0)
 			return pos;
 	return NULL;
 }
 
-struct dso *__dsos__findnew(struct list_head *head, const char *name)
+struct dso *__dsos__findnew(struct dsos *dsos, const char *name)
 {
-	struct dso *dso = dsos__find(head, name, false);
+	struct dso *dso = dsos__find(dsos, name, false);
 
 	if (!dso) {
 		dso = dso__new(name);
 		if (dso != NULL) {
-			dsos__add(head, dso);
+			dsos__add(dsos, dso);
 			dso__set_basename(dso);
 		}
 	}
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index 5e463c0964d4..b63dc98ad71d 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -90,6 +90,13 @@ struct dso_cache {
 	char data[0];
 };
 
+/*
+ * DSOs are put into a list for fast iteration.
+ */
+struct dsos {
+	struct list_head head;
+};
+
 struct dso {
 	struct list_head node;
 	struct rb_root	 symbols[MAP__NR_TYPES];
@@ -224,10 +231,10 @@ struct map *dso__new_map(const char *name);
 struct dso *dso__kernel_findnew(struct machine *machine, const char *name,
 				const char *short_name, int dso_type);
 
-void dsos__add(struct list_head *head, struct dso *dso);
-struct dso *dsos__find(const struct list_head *head, const char *name,
+void dsos__add(struct dsos *dsos, struct dso *dso);
+struct dso *dsos__find(const struct dsos *dsos, const char *name,
 		       bool cmp_short);
-struct dso *__dsos__findnew(struct list_head *head, const char *name);
+struct dso *__dsos__findnew(struct dsos *dsos, const char *name);
 bool __dsos__read_build_ids(struct list_head *head, bool with_hits);
 
 size_t __dsos__fprintf_buildid(struct list_head *head, FILE *fp,
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 158c787ce0c4..ce0de00399da 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -214,11 +214,11 @@ static int machine__hit_all_dsos(struct machine *machine)
 {
 	int err;
 
-	err = __dsos__hit_all(&machine->kernel_dsos);
+	err = __dsos__hit_all(&machine->kernel_dsos.head);
 	if (err)
 		return err;
 
-	return __dsos__hit_all(&machine->user_dsos);
+	return __dsos__hit_all(&machine->user_dsos.head);
 }
 
 int dsos__hit_all(struct perf_session *session)
@@ -288,11 +288,12 @@ static int machine__write_buildid_table(struct machine *machine, int fd)
 		umisc = PERF_RECORD_MISC_GUEST_USER;
 	}
 
-	err = __dsos__write_buildid_table(&machine->kernel_dsos, machine,
+	err = __dsos__write_buildid_table(&machine->kernel_dsos.head, machine,
 					  machine->pid, kmisc, fd);
 	if (err == 0)
-		err = __dsos__write_buildid_table(&machine->user_dsos, machine,
-						  machine->pid, umisc, fd);
+		err = __dsos__write_buildid_table(&machine->user_dsos.head,
+						  machine, machine->pid, umisc,
+						  fd);
 	return err;
 }
 
@@ -455,9 +456,10 @@ static int __dsos__cache_build_ids(struct list_head *head,
 
 static int machine__cache_build_ids(struct machine *machine, const char *debugdir)
 {
-	int ret = __dsos__cache_build_ids(&machine->kernel_dsos, machine,
+	int ret = __dsos__cache_build_ids(&machine->kernel_dsos.head, machine,
 					  debugdir);
-	ret |= __dsos__cache_build_ids(&machine->user_dsos, machine, debugdir);
+	ret |= __dsos__cache_build_ids(&machine->user_dsos.head, machine,
+				       debugdir);
 	return ret;
 }
 
@@ -483,8 +485,10 @@ static int perf_session__cache_build_ids(struct perf_session *session)
 
 static bool machine__read_build_ids(struct machine *machine, bool with_hits)
 {
-	bool ret = __dsos__read_build_ids(&machine->kernel_dsos, with_hits);
-	ret |= __dsos__read_build_ids(&machine->user_dsos, with_hits);
+	bool ret;
+
+	ret  = __dsos__read_build_ids(&machine->kernel_dsos.head, with_hits);
+	ret |= __dsos__read_build_ids(&machine->user_dsos.head, with_hits);
 	return ret;
 }
 
@@ -1548,7 +1552,7 @@ static int __event_process_build_id(struct build_id_event *bev,
 				    struct perf_session *session)
 {
 	int err = -1;
-	struct list_head *head;
+	struct dsos *dsos;
 	struct machine *machine;
 	u16 misc;
 	struct dso *dso;
@@ -1563,22 +1567,22 @@ static int __event_process_build_id(struct build_id_event *bev,
 	switch (misc) {
 	case PERF_RECORD_MISC_KERNEL:
 		dso_type = DSO_TYPE_KERNEL;
-		head = &machine->kernel_dsos;
+		dsos = &machine->kernel_dsos;
 		break;
 	case PERF_RECORD_MISC_GUEST_KERNEL:
 		dso_type = DSO_TYPE_GUEST_KERNEL;
-		head = &machine->kernel_dsos;
+		dsos = &machine->kernel_dsos;
 		break;
 	case PERF_RECORD_MISC_USER:
 	case PERF_RECORD_MISC_GUEST_USER:
 		dso_type = DSO_TYPE_USER;
-		head = &machine->user_dsos;
+		dsos = &machine->user_dsos;
 		break;
 	default:
 		goto out;
 	}
 
-	dso = __dsos__findnew(head, filename);
+	dso = __dsos__findnew(dsos, filename);
 	if (dso != NULL) {
 		char sbuild_id[BUILD_ID_SIZE * 2 + 1];
 
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index b2ec38bf211e..49a75ec4c47b 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -17,8 +17,8 @@ int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
 {
 	map_groups__init(&machine->kmaps);
 	RB_CLEAR_NODE(&machine->rb_node);
-	INIT_LIST_HEAD(&machine->user_dsos);
-	INIT_LIST_HEAD(&machine->kernel_dsos);
+	INIT_LIST_HEAD(&machine->user_dsos.head);
+	INIT_LIST_HEAD(&machine->kernel_dsos.head);
 
 	machine->threads = RB_ROOT;
 	INIT_LIST_HEAD(&machine->dead_threads);
@@ -72,11 +72,11 @@ out_delete:
 	return NULL;
 }
 
-static void dsos__delete(struct list_head *dsos)
+static void dsos__delete(struct dsos *dsos)
 {
 	struct dso *pos, *n;
 
-	list_for_each_entry_safe(pos, n, dsos, node) {
+	list_for_each_entry_safe(pos, n, &dsos->head, node) {
 		list_del(&pos->node);
 		dso__delete(pos);
 	}
@@ -477,23 +477,23 @@ struct map *machine__new_module(struct machine *machine, u64 start,
 size_t machines__fprintf_dsos(struct machines *machines, FILE *fp)
 {
 	struct rb_node *nd;
-	size_t ret = __dsos__fprintf(&machines->host.kernel_dsos, fp) +
-		     __dsos__fprintf(&machines->host.user_dsos, fp);
+	size_t ret = __dsos__fprintf(&machines->host.kernel_dsos.head, fp) +
+		     __dsos__fprintf(&machines->host.user_dsos.head, fp);
 
 	for (nd = rb_first(&machines->guests); nd; nd = rb_next(nd)) {
 		struct machine *pos = rb_entry(nd, struct machine, rb_node);
-		ret += __dsos__fprintf(&pos->kernel_dsos, fp);
-		ret += __dsos__fprintf(&pos->user_dsos, fp);
+		ret += __dsos__fprintf(&pos->kernel_dsos.head, fp);
+		ret += __dsos__fprintf(&pos->user_dsos.head, fp);
 	}
 
 	return ret;
 }
 
-size_t machine__fprintf_dsos_buildid(struct machine *machine, FILE *fp,
+size_t machine__fprintf_dsos_buildid(struct machine *m, FILE *fp,
 				     bool (skip)(struct dso *dso, int parm), int parm)
 {
-	return __dsos__fprintf_buildid(&machine->kernel_dsos, fp, skip, parm) +
-	       __dsos__fprintf_buildid(&machine->user_dsos, fp, skip, parm);
+	return __dsos__fprintf_buildid(&m->kernel_dsos.head, fp, skip, parm) +
+	       __dsos__fprintf_buildid(&m->user_dsos.head, fp, skip, parm);
 }
 
 size_t machines__fprintf_dsos_buildid(struct machines *machines, FILE *fp,
@@ -994,7 +994,7 @@ static bool machine__uses_kcore(struct machine *machine)
 {
 	struct dso *dso;
 
-	list_for_each_entry(dso, &machine->kernel_dsos, node) {
+	list_for_each_entry(dso, &machine->kernel_dsos.head, node) {
 		if (dso__is_kcore(dso))
 			return true;
 	}
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index 6a6bcc1cff54..2b651a7f5d0d 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -4,6 +4,7 @@
 #include <sys/types.h>
 #include <linux/rbtree.h>
 #include "map.h"
+#include "dso.h"
 #include "event.h"
 
 struct addr_location;
@@ -32,8 +33,8 @@ struct machine {
 	struct list_head  dead_threads;
 	struct thread	  *last_match;
 	struct vdso_info  *vdso_info;
-	struct list_head  user_dsos;
-	struct list_head  kernel_dsos;
+	struct dsos	  user_dsos;
+	struct dsos	  kernel_dsos;
 	struct map_groups kmaps;
 	struct map	  *vmlinux_maps[MAP__NR_TYPES];
 	u64		  kernel_start;
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index be37b5aca335..c150ca4343eb 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -184,7 +184,8 @@ static struct dso *kernel_get_module_dso(const char *module)
 	const char *vmlinux_name;
 
 	if (module) {
-		list_for_each_entry(dso, &host_machine->kernel_dsos, node) {
+		list_for_each_entry(dso, &host_machine->kernel_dsos.head,
+				    node) {
 			if (strncmp(dso->short_name + 1, module,
 				    dso->short_name_len - 2) == 0)
 				goto found;
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 2a92e10317c5..1e23a5bfb044 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -6,6 +6,7 @@
 #include <inttypes.h>
 
 #include "symbol.h"
+#include "machine.h"
 #include "vdso.h"
 #include <symbol/kallsyms.h>
 #include "debug.h"
@@ -929,7 +930,11 @@ int dso__load_sym(struct dso *dso, struct map *map,
 				}
 				curr_dso->symtab_type = dso->symtab_type;
 				map_groups__insert(kmap->kmaps, curr_map);
-				dsos__add(&dso->node, curr_dso);
+				/*
+				 * The new DSO should go to the kernel DSOS
+				 */
+				dsos__add(&map->groups->machine->kernel_dsos,
+					  curr_dso);
 				dso__set_loaded(curr_dso, map->type);
 			} else
 				curr_dso = curr_map->dso;
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 6/8] perf symbols: Improve DSO long names lookup speed with rbtree
  2014-10-01 19:50 [GIT PULL 0/8] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (4 preceding siblings ...)
  2014-10-01 19:50 ` [PATCH 5/8] perf symbols: Encapsulate dsos list head into struct dsos Arnaldo Carvalho de Melo
@ 2014-10-01 19:50 ` Arnaldo Carvalho de Melo
  2014-10-14  9:09   ` Jiri Olsa
  2014-10-01 19:50 ` [PATCH 7/8] perf tools: Fix build breakage on arm64 targets Arnaldo Carvalho de Melo
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 14+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-10-01 19:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Waiman Long, Adrian Hunter, Don Zickus,
	Douglas Hatch, Ingo Molnar, Jiri Olsa, Namhyung Kim,
	Paul Mackerras, Peter Zijlstra, Scott J Norton,
	Arnaldo Carvalho de Melo

From: Waiman Long <Waiman.Long@hp.com>

With workload that spawns and destroys many threads and processes, it
was found that perf-mem could took a long time to post-process the perf
data after the target workload had completed its operation.

The performance bottleneck was found to be the lookup and insertion of
the new DSO structures (thousands of them in this case).

In a dual-socket Ivy-Bridge E7-4890 v2 machine (30-core, 60-thread), the
perf profile below shows what perf was doing after the profiled AIM7
shared workload completed:

-     83.94%  perf  libc-2.11.3.so     [.] __strcmp_sse42
   - __strcmp_sse42
      - 99.82% map__new
           machine__process_mmap_event
           perf_session_deliver_event
           perf_session__process_event
           __perf_session__process_events
           cmd_record
           cmd_mem
           run_builtin
           main
           __libc_start_main
-     13.17%  perf  perf               [.] __dsos__findnew
     __dsos__findnew
     map__new
     machine__process_mmap_event
     perf_session_deliver_event
     perf_session__process_event
     __perf_session__process_events
     cmd_record
     cmd_mem
     run_builtin
     main
     __libc_start_main

So about 97% of CPU times were spent in the map__new() function trying
to insert new DSO entry into the DSO linked list. The whole
post-processing step took about 9 minutes.

The DSO structures are currently searched linearly. So the total
processing time will be proportional to n^2.

To overcome this performance problem, the DSO code is modified to also
put the DSO structures in a RB tree sorted by its long name in
additional to being in a simple linked list. With this change, the
processing time will become proportional to n*log(n) which will be much
quicker for large n. However, the short name will still be searched
using the old linear searching method.  With that patch in place, the
same perf-mem post-processing step took less than 30 seconds to
complete.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hp.com>
Link: http://lkml.kernel.org/r/1412098575-27863-3-git-send-email-Waiman.Long@hp.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/dso.c     | 70 ++++++++++++++++++++++++++++++++++++++++++++---
 tools/perf/util/dso.h     |  5 +++-
 tools/perf/util/machine.c |  1 +
 3 files changed, 71 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index 901a58fa3f22..0247acfdfaca 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -653,6 +653,65 @@ struct dso *dso__kernel_findnew(struct machine *machine, const char *name,
 	return dso;
 }
 
+/*
+ * Find a matching entry and/or link current entry to RB tree.
+ * Either one of the dso or name parameter must be non-NULL or the
+ * function will not work.
+ */
+static struct dso *dso__findlink_by_longname(struct rb_root *root,
+					     struct dso *dso, const char *name)
+{
+	struct rb_node **p = &root->rb_node;
+	struct rb_node  *parent = NULL;
+
+	if (!name)
+		name = dso->long_name;
+	/*
+	 * Find node with the matching name
+	 */
+	while (*p) {
+		struct dso *this = rb_entry(*p, struct dso, rb_node);
+		int rc = strcmp(name, this->long_name);
+
+		parent = *p;
+		if (rc == 0) {
+			/*
+			 * In case the new DSO is a duplicate of an existing
+			 * one, print an one-time warning & put the new entry
+			 * at the end of the list of duplicates.
+			 */
+			if (!dso || (dso == this))
+				return this;	/* Find matching dso */
+			/*
+			 * The core kernel DSOs may have duplicated long name.
+			 * In this case, the short name should be different.
+			 * Comparing the short names to differentiate the DSOs.
+			 */
+			rc = strcmp(dso->short_name, this->short_name);
+			if (rc == 0) {
+				pr_err("Duplicated dso name: %s\n", name);
+				return NULL;
+			}
+		}
+		if (rc < 0)
+			p = &parent->rb_left;
+		else
+			p = &parent->rb_right;
+	}
+	if (dso) {
+		/* Add new node and rebalance tree */
+		rb_link_node(&dso->rb_node, parent, p);
+		rb_insert_color(&dso->rb_node, root);
+	}
+	return NULL;
+}
+
+static inline struct dso *
+dso__find_by_longname(const struct rb_root *root, const char *name)
+{
+	return dso__findlink_by_longname((struct rb_root *)root, NULL, name);
+}
+
 void dso__set_long_name(struct dso *dso, const char *name, bool name_allocated)
 {
 	if (name == NULL)
@@ -755,6 +814,7 @@ struct dso *dso__new(const char *name)
 		dso->a2l_fails = 1;
 		dso->kernel = DSO_TYPE_USER;
 		dso->needs_swap = DSO_SWAP__UNSET;
+		RB_CLEAR_NODE(&dso->rb_node);
 		INIT_LIST_HEAD(&dso->node);
 		INIT_LIST_HEAD(&dso->data.open_entry);
 	}
@@ -765,6 +825,10 @@ struct dso *dso__new(const char *name)
 void dso__delete(struct dso *dso)
 {
 	int i;
+
+	if (!RB_EMPTY_NODE(&dso->rb_node))
+		pr_err("DSO %s is still in rbtree when being deleted!\n",
+		       dso->long_name);
 	for (i = 0; i < MAP__NR_TYPES; ++i)
 		symbols__delete(&dso->symbols[i]);
 
@@ -854,6 +918,7 @@ bool __dsos__read_build_ids(struct list_head *head, bool with_hits)
 void dsos__add(struct dsos *dsos, struct dso *dso)
 {
 	list_add_tail(&dso->node, &dsos->head);
+	dso__findlink_by_longname(&dsos->root, dso, NULL);
 }
 
 struct dso *dsos__find(const struct dsos *dsos, const char *name,
@@ -867,10 +932,7 @@ struct dso *dsos__find(const struct dsos *dsos, const char *name,
 				return pos;
 		return NULL;
 	}
-	list_for_each_entry(pos, &dsos->head, node)
-		if (strcmp(pos->long_name, name) == 0)
-			return pos;
-	return NULL;
+	return dso__find_by_longname(&dsos->root, name);
 }
 
 struct dso *__dsos__findnew(struct dsos *dsos, const char *name)
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index b63dc98ad71d..acb651acc7fd 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -91,14 +91,17 @@ struct dso_cache {
 };
 
 /*
- * DSOs are put into a list for fast iteration.
+ * DSOs are put into both a list for fast iteration and rbtree for fast
+ * long name lookup.
  */
 struct dsos {
 	struct list_head head;
+	struct rb_root	 root;	/* rbtree root sorted by long name */
 };
 
 struct dso {
 	struct list_head node;
+	struct rb_node	 rb_node;	/* rbtree node sorted by long name */
 	struct rb_root	 symbols[MAP__NR_TYPES];
 	struct rb_root	 symbol_names[MAP__NR_TYPES];
 	void		 *a2l;
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 49a75ec4c47b..b7d477fbda02 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -77,6 +77,7 @@ static void dsos__delete(struct dsos *dsos)
 	struct dso *pos, *n;
 
 	list_for_each_entry_safe(pos, n, &dsos->head, node) {
+		RB_CLEAR_NODE(&pos->rb_node);
 		list_del(&pos->node);
 		dso__delete(pos);
 	}
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 7/8] perf tools: Fix build breakage on arm64 targets
  2014-10-01 19:50 [GIT PULL 0/8] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (5 preceding siblings ...)
  2014-10-01 19:50 ` [PATCH 6/8] perf symbols: Improve DSO long names lookup speed with rbtree Arnaldo Carvalho de Melo
@ 2014-10-01 19:50 ` Arnaldo Carvalho de Melo
  2014-10-01 19:50 ` [PATCH 8/8] perf record: Fix error message for --filter option not coming after tracepoint Arnaldo Carvalho de Melo
  2014-10-03  3:31 ` [GIT PULL 0/8] perf/core improvements and fixes Ingo Molnar
  8 siblings, 0 replies; 14+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-10-01 19:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Will Deacon, Jean Pihet, Jiri Olsa,
	linux-arm-kernel, Arnaldo Carvalho de Melo

From: Will Deacon <will.deacon@arm.com>

Attempting to build the perf tool for an arm64 target results in the
following failure:

  arch/arm64/util/unwind-libunwind.c: In function 'libunwind__arch_reg_id':
  arch/arm64/util/unwind-libunwind.c:77:3: error: implicit declaration of function 'pr_err'
     pr_err("unwind: invalid reg id %d\n", regnum);
     ^
  arch/arm64/util/unwind-libunwind.c:77:3: error: nested extern declaration of 'pr_err'

This is due to commit 84f5d36f4866 ("perf tools: Move pr_* debug macros
into debug object") moving the pr_* macros into a new header file, but
failing to update architectures other than x86.

This patch adds the missing include, and fixes the build again.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1412076432-22045-1-git-send-email-will.deacon@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/arch/arm64/util/unwind-libunwind.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/arch/arm64/util/unwind-libunwind.c b/tools/perf/arch/arm64/util/unwind-libunwind.c
index 436ee43859dc..a87afa91a99e 100644
--- a/tools/perf/arch/arm64/util/unwind-libunwind.c
+++ b/tools/perf/arch/arm64/util/unwind-libunwind.c
@@ -3,6 +3,7 @@
 #include <libunwind.h>
 #include "perf_regs.h"
 #include "../../util/unwind.h"
+#include "../../util/debug.h"
 
 int libunwind__arch_reg_id(int regnum)
 {
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 8/8] perf record: Fix error message for --filter option not coming after tracepoint
  2014-10-01 19:50 [GIT PULL 0/8] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (6 preceding siblings ...)
  2014-10-01 19:50 ` [PATCH 7/8] perf tools: Fix build breakage on arm64 targets Arnaldo Carvalho de Melo
@ 2014-10-01 19:50 ` Arnaldo Carvalho de Melo
  2014-10-03  3:31 ` [GIT PULL 0/8] perf/core improvements and fixes Ingo Molnar
  8 siblings, 0 replies; 14+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-10-01 19:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Adrian Hunter,
	Andi Kleen, David Ahern, Don Zickus, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Peter Zijlstra, Stephane Eranian

From: Arnaldo Carvalho de Melo <acme@redhat.com>

  [root@zoo ~]# perf record --filter "common_pid != PERF_PID" -a
  -F option should follow a -e tracepoint option.

The -F option is for --freq, not --filter. Fix it up to show:

  [root@zoo ~]# perf record --filter "common_pid != PERF_PID" -a
  --filter option should follow a -e tracepoint option

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-z0yrm8stn9w3423nkov3eksg@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/parse-events.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 9522cf22ad81..d76aa30cb1fb 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -984,7 +984,7 @@ int parse_filter(const struct option *opt, const char *str,
 
 	if (last == NULL || last->attr.type != PERF_TYPE_TRACEPOINT) {
 		fprintf(stderr,
-			"-F option should follow a -e tracepoint option\n");
+			"--filter option should follow a -e tracepoint option\n");
 		return -1;
 	}
 
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [GIT PULL 0/8] perf/core improvements and fixes
  2014-10-01 19:50 [GIT PULL 0/8] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (7 preceding siblings ...)
  2014-10-01 19:50 ` [PATCH 8/8] perf record: Fix error message for --filter option not coming after tracepoint Arnaldo Carvalho de Melo
@ 2014-10-03  3:31 ` Ingo Molnar
  8 siblings, 0 replies; 14+ messages in thread
From: Ingo Molnar @ 2014-10-03  3:31 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: linux-kernel, Adrian Hunter, Andi Kleen, Chang Hyun Park,
	David Ahern, Davidlohr Bueso, Don Zickus, Douglas Hatch,
	Frederic Weisbecker, H . Peter Anvin, Jean Pihet, Jiri Olsa,
	linux-arm-kernel, Matt Fleming, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Peter Zijlstra, Scott J Norton, Stephane Eranian,
	Thomas Gleixner, Waiman Long, Will Deacon,
	Arnaldo Carvalho de Melo, Peter Zijlstra


* Arnaldo Carvalho de Melo <acme@kernel.org> wrote:

> Hi Ingo,
> 
> 	Please consider pulling,
> 
> Best Regards,
> 
> - Arnaldo
> 
> The following changes since commit 07394b5f13a04f86b27e0ddd96a36c7d9bfe1a4f:
> 
>   Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2014-09-27 09:15:48 +0200)
> 
> are available in the git repository at:
> 
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo
> 
> for you to fetch changes up to 281f92f233a59ef52bb45287242bd815a67f5647:
> 
>   perf record: Fix error message for --filter option not coming after tracepoint (2014-10-01 15:05:32 -0300)
> 
> ----------------------------------------------------------------
> perf/core improvements and fixes:
> 
> User visible:
> 
> . Fix mmap return address truncation to 32-bit in 'perf trace'. (Chang Hyun Park)
> 
> o Support operations for shared futexes. (Davidlohr Bueso)
> 
> . Fix error message for --filter option not coming after tracepoint. (Arnaldo Carvalho de Melo)
> 
> Infrastructure:
> 
> . Refactor unit and scale function parameters for pmu parsing routines. (Matt Fleming)
> 
> . Improve DSO long names lookup with rbtree, resulting in great speedup for
>   workloads with lots of DSOs. (Waiman Long)
> 
> . Fix build breakage on arm64 targets. (Will Deacon)
> 
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> 
> ----------------------------------------------------------------
> Arnaldo Carvalho de Melo (1):
>       perf record: Fix error message for --filter option not coming after tracepoint
> 
> Chang Hyun Park (1):
>       perf trace: Fix mmap return address truncation to 32-bit
> 
> Davidlohr Bueso (2):
>       perf bench futex: Support operations for shared futexes
>       perf bench futex: Sanitize -q option in requeue
> 
> Matt Fleming (1):
>       perf tools: Refactor unit and scale function parameters
> 
> Waiman Long (2):
>       perf symbols: Encapsulate dsos list head into struct dsos
>       perf symbols: Improve DSO long names lookup speed with rbtree
> 
> Will Deacon (1):
>       perf tools: Fix build breakage on arm64 targets
> 
>  tools/perf/arch/arm64/util/unwind-libunwind.c |  1 +
>  tools/perf/bench/futex-hash.c                 |  7 ++-
>  tools/perf/bench/futex-requeue.c              | 28 +++++----
>  tools/perf/bench/futex-wake.c                 | 15 +++--
>  tools/perf/builtin-trace.c                    |  6 +-
>  tools/perf/util/dso.c                         | 85 +++++++++++++++++++++++----
>  tools/perf/util/dso.h                         | 16 ++++-
>  tools/perf/util/header.c                      | 32 +++++-----
>  tools/perf/util/machine.c                     | 25 ++++----
>  tools/perf/util/machine.h                     |  5 +-
>  tools/perf/util/parse-events.c                | 11 ++--
>  tools/perf/util/pmu.c                         | 38 +++++++-----
>  tools/perf/util/pmu.h                         |  7 ++-
>  tools/perf/util/probe-event.c                 |  3 +-
>  tools/perf/util/symbol-elf.c                  |  7 ++-
>  15 files changed, 200 insertions(+), 86 deletions(-)

Pulled into tip:perf/core, thanks a lot Arnaldo!

	Ingo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 6/8] perf symbols: Improve DSO long names lookup speed with rbtree
  2014-10-01 19:50 ` [PATCH 6/8] perf symbols: Improve DSO long names lookup speed with rbtree Arnaldo Carvalho de Melo
@ 2014-10-14  9:09   ` Jiri Olsa
  2014-10-14 17:34     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 14+ messages in thread
From: Jiri Olsa @ 2014-10-14  9:09 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, Waiman Long, Adrian Hunter,
	Don Zickus, Douglas Hatch, Ingo Molnar, Jiri Olsa, Namhyung Kim,
	Paul Mackerras, Peter Zijlstra, Scott J Norton,
	Arnaldo Carvalho de Melo

On Wed, Oct 01, 2014 at 04:50:41PM -0300, Arnaldo Carvalho de Melo wrote:
> From: Waiman Long <Waiman.Long@hp.com>
> 
> With workload that spawns and destroys many threads and processes, it
> was found that perf-mem could took a long time to post-process the perf
> data after the target workload had completed its operation.
> 
> The performance bottleneck was found to be the lookup and insertion of
> the new DSO structures (thousands of them in this case).

this change segfaults (below) some tests, but only if I compiled
without DEBUG when I revert this commit, I can no longer reproduce..

jirka

(gdb) set follow-fork-mode child
(gdb) r test 31
Starting program: /home/jolsa/kernel.org/linux-perf/tools/perf/perf test 31
warning: section  not found in /usr/lib/debug/lib/modules/3.16.3-200.fc20.x86_64/vdso/vdso64.so.debug
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
31: Test output sorting of hist entries                    :[New process 15477]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7b9d7c0 (LWP 15477)]
__strcmp_ssse3 () at ../sysdeps/x86_64/strcmp.S:210
210             movlpd  (%rsi), %xmm2
(gdb) bt
#0  __strcmp_ssse3 () at ../sysdeps/x86_64/strcmp.S:210
#1  0x0000000000477967 in dso__findlink_by_longname (name=<optimized out>, dso=0x0, root=0x7fffffffdbf0)
    at util/dso.c:674
#2  dso__find_by_longname (name=0x7fffffffcae8 "perf", root=0x7fffffffdbf0) at util/dso.c:712
#3  dsos__find (cmp_short=false, name=0x7fffffffcae8 "perf", dsos=0x7fffffffdbe0) at util/dso.c:935
#4  __dsos__findnew (dsos=dsos@entry=0x7fffffffdbe0, name=name@entry=0x7fffffffcae8 "perf") at util/dso.c:940
#5  0x00000000004915d9 in map__new (machine=machine@entry=0x7fffffffdb90, start=4194304, len=1048576, pgoff=0, 
    pid=<optimized out>, d_maj=d_maj@entry=0, d_min=d_min@entry=0, ino=ino@entry=0, ino_gen=ino_gen@entry=0, 
    prot=prot@entry=0, flags=flags@entry=0, filename=filename@entry=0x7fffffffcae8 "perf", type=MAP__FUNCTION, 
    thread=thread@entry=0x90d1f0) at util/map.c:180
#6  0x00000000004900f4 in machine__process_mmap_event (machine=machine@entry=0x7fffffffdb90, 
    event=event@entry=0x7fffffffcac0, sample=sample@entry=0x0) at util/machine.c:1182
#7  0x00000000004d12bb in setup_fake_machine (machines=machines@entry=0x7fffffffdb90)
    at tests/hists_common.c:116
#8  0x00000000004d4478 in test__hists_output () at tests/hists_output.c:600
#9  0x0000000000448fe4 in run_test (test=0x8166a0 <tests+480>) at tests/builtin-test.c:210
#10 __cmd_test (skiplist=0x0, argv=0x7fffffffe2d0, argc=1) at tests/builtin-test.c:255
#11 cmd_test (argc=1, argv=0x7fffffffe2d0, prefix=<optimized out>) at tests/builtin-test.c:320
#12 0x000000000041c8f5 in run_builtin (p=p@entry=0x814fc0 <commands+480>, argc=argc@entry=2, 
    argv=argv@entry=0x7fffffffe2d0) at perf.c:331
#13 0x000000000041c110 in handle_internal_command (argv=0x7fffffffe2d0, argc=2) at perf.c:390
#14 run_argv (argv=0x7fffffffe050, argcp=0x7fffffffe05c) at perf.c:434
#15 main (argc=2, argv=0x7fffffffe2d0) at perf.c:549
(gdb) 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 6/8] perf symbols: Improve DSO long names lookup speed with rbtree
  2014-10-14  9:09   ` Jiri Olsa
@ 2014-10-14 17:34     ` Arnaldo Carvalho de Melo
  2014-10-14 18:03       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 14+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-10-14 17:34 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Ingo Molnar, linux-kernel, Waiman Long, Adrian Hunter,
	Don Zickus, Douglas Hatch, Ingo Molnar, Jiri Olsa, Namhyung Kim,
	Paul Mackerras, Peter Zijlstra, Scott J Norton

Em Tue, Oct 14, 2014 at 11:09:58AM +0200, Jiri Olsa escreveu:
> On Wed, Oct 01, 2014 at 04:50:41PM -0300, Arnaldo Carvalho de Melo wrote:
> > From: Waiman Long <Waiman.Long@hp.com>
> > 
> > With workload that spawns and destroys many threads and processes, it
> > was found that perf-mem could took a long time to post-process the perf
> > data after the target workload had completed its operation.
> > 
> > The performance bottleneck was found to be the lookup and insertion of
> > the new DSO structures (thousands of them in this case).
> 
> this change segfaults (below) some tests, but only if I compiled
> without DEBUG when I revert this commit, I can no longer reproduce..

Reproduced, looking at it... 
 
> jirka
> 
> (gdb) set follow-fork-mode child
> (gdb) r test 31
> Starting program: /home/jolsa/kernel.org/linux-perf/tools/perf/perf test 31
> warning: section  not found in /usr/lib/debug/lib/modules/3.16.3-200.fc20.x86_64/vdso/vdso64.so.debug
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> 31: Test output sorting of hist entries                    :[New process 15477]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> 
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ffff7b9d7c0 (LWP 15477)]
> __strcmp_ssse3 () at ../sysdeps/x86_64/strcmp.S:210
> 210             movlpd  (%rsi), %xmm2
> (gdb) bt
> #0  __strcmp_ssse3 () at ../sysdeps/x86_64/strcmp.S:210
> #1  0x0000000000477967 in dso__findlink_by_longname (name=<optimized out>, dso=0x0, root=0x7fffffffdbf0)
>     at util/dso.c:674
> #2  dso__find_by_longname (name=0x7fffffffcae8 "perf", root=0x7fffffffdbf0) at util/dso.c:712
> #3  dsos__find (cmp_short=false, name=0x7fffffffcae8 "perf", dsos=0x7fffffffdbe0) at util/dso.c:935
> #4  __dsos__findnew (dsos=dsos@entry=0x7fffffffdbe0, name=name@entry=0x7fffffffcae8 "perf") at util/dso.c:940
> #5  0x00000000004915d9 in map__new (machine=machine@entry=0x7fffffffdb90, start=4194304, len=1048576, pgoff=0, 
>     pid=<optimized out>, d_maj=d_maj@entry=0, d_min=d_min@entry=0, ino=ino@entry=0, ino_gen=ino_gen@entry=0, 
>     prot=prot@entry=0, flags=flags@entry=0, filename=filename@entry=0x7fffffffcae8 "perf", type=MAP__FUNCTION, 
>     thread=thread@entry=0x90d1f0) at util/map.c:180
> #6  0x00000000004900f4 in machine__process_mmap_event (machine=machine@entry=0x7fffffffdb90, 
>     event=event@entry=0x7fffffffcac0, sample=sample@entry=0x0) at util/machine.c:1182
> #7  0x00000000004d12bb in setup_fake_machine (machines=machines@entry=0x7fffffffdb90)
>     at tests/hists_common.c:116
> #8  0x00000000004d4478 in test__hists_output () at tests/hists_output.c:600
> #9  0x0000000000448fe4 in run_test (test=0x8166a0 <tests+480>) at tests/builtin-test.c:210
> #10 __cmd_test (skiplist=0x0, argv=0x7fffffffe2d0, argc=1) at tests/builtin-test.c:255
> #11 cmd_test (argc=1, argv=0x7fffffffe2d0, prefix=<optimized out>) at tests/builtin-test.c:320
> #12 0x000000000041c8f5 in run_builtin (p=p@entry=0x814fc0 <commands+480>, argc=argc@entry=2, 
>     argv=argv@entry=0x7fffffffe2d0) at perf.c:331
> #13 0x000000000041c110 in handle_internal_command (argv=0x7fffffffe2d0, argc=2) at perf.c:390
> #14 run_argv (argv=0x7fffffffe050, argcp=0x7fffffffe05c) at perf.c:434
> #15 main (argc=2, argv=0x7fffffffe2d0) at perf.c:549
> (gdb) 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 6/8] perf symbols: Improve DSO long names lookup speed with rbtree
  2014-10-14 17:34     ` Arnaldo Carvalho de Melo
@ 2014-10-14 18:03       ` Arnaldo Carvalho de Melo
  2014-10-15 10:05         ` [tip:perf/urgent] perf machine: Add missing dsos-> root rbtree root initialization tip-bot for Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 14+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-10-14 18:03 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Ingo Molnar, linux-kernel, Waiman Long, Adrian Hunter,
	Don Zickus, Douglas Hatch, Ingo Molnar, Jiri Olsa, Namhyung Kim,
	Paul Mackerras, Peter Zijlstra, Scott J Norton

Em Tue, Oct 14, 2014 at 02:34:03PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Tue, Oct 14, 2014 at 11:09:58AM +0200, Jiri Olsa escreveu:
> > On Wed, Oct 01, 2014 at 04:50:41PM -0300, Arnaldo Carvalho de Melo wrote:
> > > From: Waiman Long <Waiman.Long@hp.com>

> > > With workload that spawns and destroys many threads and processes, it
> > > was found that perf-mem could took a long time to post-process the perf
> > > data after the target workload had completed its operation.

> > > The performance bottleneck was found to be the lookup and insertion of
> > > the new DSO structures (thousands of them in this case).

> > this change segfaults (below) some tests, but only if I compiled
> > without DEBUG when I revert this commit, I can no longer reproduce..
 
> Reproduced, looking at it... 

Fixed, this happens because we end up using a struct machines on the
stack, and then machines__init() was not initializing the newly
introduced rb_root, just the existing list_head.

When we introduced struct dsos, to group the two ways to store dsos,
i.e. the linked list and the rbtree, we didn't turned the initialization
done in machines__init(machines->host) -> machine__init() ->
INIT_LIST_HEAD into a dsos__init() to keep on initializing the list_head
but _as well_ initializing the rb_root, oops. All worked because outside
perf-test we probably zalloc the whole thing which ends up initializing
it in to NULL.

So the problem looks contained to 'perf test' that uses it on stack,
etc.

Will add your Reported-by, but if you're quick, I can give you a
Tested-by too ;-)

- Arnaldo

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index b7d477fbda02..34fc7c8672e4 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -13,12 +13,18 @@
 #include <symbol/kallsyms.h>
 #include "unwind.h"
 
+static void dsos__init(struct dsos *dsos)
+{
+	INIT_LIST_HEAD(&dsos->head);
+	dsos->root = RB_ROOT;
+}
+
 int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
 {
 	map_groups__init(&machine->kmaps);
 	RB_CLEAR_NODE(&machine->rb_node);
-	INIT_LIST_HEAD(&machine->user_dsos.head);
-	INIT_LIST_HEAD(&machine->kernel_dsos.head);
+	dsos__init(&machine->user_dsos);
+	dsos__init(&machine->kernel_dsos);
 
 	machine->threads = RB_ROOT;
 	INIT_LIST_HEAD(&machine->dead_threads);

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [tip:perf/urgent] perf machine: Add missing dsos-> root rbtree root initialization
  2014-10-14 18:03       ` Arnaldo Carvalho de Melo
@ 2014-10-15 10:05         ` tip-bot for Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 14+ messages in thread
From: tip-bot for Arnaldo Carvalho de Melo @ 2014-10-15 10:05 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, dzickus, paulus, jolsa, tglx, hpa, acme, namhyung,
	linux-kernel, doug.hatch, peterz, jolsa, scott.norton, mingo

Commit-ID:  e167f995e26249aa93708589c5eea539652351fa
Gitweb:     http://git.kernel.org/tip/e167f995e26249aa93708589c5eea539652351fa
Author:     Arnaldo Carvalho de Melo <acme@redhat.com>
AuthorDate: Tue, 14 Oct 2014 15:07:48 -0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Tue, 14 Oct 2014 17:50:44 -0300

perf machine: Add missing dsos->root rbtree root initialization

A segfault happens on 'perf test hists_link' because we end up using a
struct machines on the stack, and then machines__init() was not
initializing the newly introduced rb_root, just the existing list_head.

When we introduced struct dsos, to group the two ways to store dsos,
i.e. the linked list and the rbtree, we didn't turned the initialization
done in:

	machines__init(machines->host) ->
		machine__init() ->
			INIT_LIST_HEAD

into a dsos__init() to keep on initializing the list_head but _as well_
initializing the rb_root, oops.

All worked because outside perf-test we probably zalloc the whole thing
which ends up initializing it in to NULL.

So the problem looks contained to 'perf test' that uses it on stack,
etc.

Reported-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Waiman Long <Waiman.Long@hp.com>,
Cc: Adrian Hunter <adrian.hunter@intel.com>,
Cc: Don Zickus <dzickus@redhat.com>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Waiman Long <Waiman.Long@hp.com>,
Link: http://lkml.kernel.org/r/20141014180353.GF3198@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/machine.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index b7d477f..34fc7c8 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -13,12 +13,18 @@
 #include <symbol/kallsyms.h>
 #include "unwind.h"
 
+static void dsos__init(struct dsos *dsos)
+{
+	INIT_LIST_HEAD(&dsos->head);
+	dsos->root = RB_ROOT;
+}
+
 int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
 {
 	map_groups__init(&machine->kmaps);
 	RB_CLEAR_NODE(&machine->rb_node);
-	INIT_LIST_HEAD(&machine->user_dsos.head);
-	INIT_LIST_HEAD(&machine->kernel_dsos.head);
+	dsos__init(&machine->user_dsos);
+	dsos__init(&machine->kernel_dsos);
 
 	machine->threads = RB_ROOT;
 	INIT_LIST_HEAD(&machine->dead_threads);

^ permalink raw reply related	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2014-10-15 10:06 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-10-01 19:50 [GIT PULL 0/8] perf/core improvements and fixes Arnaldo Carvalho de Melo
2014-10-01 19:50 ` [PATCH 1/8] perf tools: Refactor unit and scale function parameters Arnaldo Carvalho de Melo
2014-10-01 19:50 ` [PATCH 2/8] perf trace: Fix mmap return address truncation to 32-bit Arnaldo Carvalho de Melo
2014-10-01 19:50 ` [PATCH 3/8] perf bench futex: Support operations for shared futexes Arnaldo Carvalho de Melo
2014-10-01 19:50 ` [PATCH 4/8] perf bench futex: Sanitize -q option in requeue Arnaldo Carvalho de Melo
2014-10-01 19:50 ` [PATCH 5/8] perf symbols: Encapsulate dsos list head into struct dsos Arnaldo Carvalho de Melo
2014-10-01 19:50 ` [PATCH 6/8] perf symbols: Improve DSO long names lookup speed with rbtree Arnaldo Carvalho de Melo
2014-10-14  9:09   ` Jiri Olsa
2014-10-14 17:34     ` Arnaldo Carvalho de Melo
2014-10-14 18:03       ` Arnaldo Carvalho de Melo
2014-10-15 10:05         ` [tip:perf/urgent] perf machine: Add missing dsos-> root rbtree root initialization tip-bot for Arnaldo Carvalho de Melo
2014-10-01 19:50 ` [PATCH 7/8] perf tools: Fix build breakage on arm64 targets Arnaldo Carvalho de Melo
2014-10-01 19:50 ` [PATCH 8/8] perf record: Fix error message for --filter option not coming after tracepoint Arnaldo Carvalho de Melo
2014-10-03  3:31 ` [GIT PULL 0/8] perf/core improvements and fixes Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).