linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC 00/10] perf top optimization
@ 2017-09-07 17:55 kan.liang
  2017-09-07 17:55 ` [PATCH RFC 01/10] perf tools: hashtable for machine threads kan.liang
                   ` (9 more replies)
  0 siblings, 10 replies; 15+ messages in thread
From: kan.liang @ 2017-09-07 17:55 UTC (permalink / raw)
  To: acme, peterz, mingo, linux-kernel
  Cc: jolsa, namhyung, adrian.hunter, lukasz.odzioba, ak, Kan Liang

From: Kan Liang <kan.liang@intel.com>

The patch series intends to fix the severe performance issue in
Knights Landing/Mill, when monitoring in heavy load system.
perf top costs a few minutes to show the result, which is
unacceptable.
With the patch series applied, the latency will reduces to
several seconds.

machine__synthesize_threads and perf_top__mmap_read costs most of
the perf top time (> 99%).
Patch 1-9 do the optimization for machine__synthesize_threads.
Patch 10 does the optimization for perf_top__mmap_read.

Optimization for machine__synthesize_threads
  - Multithreading the whole process.
  - The threads number is set to the max online CPU# by default.
    User can change the threads number through the new option.
  - Introduces hashtable for machine threads to reduce the lock
    contention.
  - The optimization can also benefit other platforms and other
    perf tools, like perf record. But this patch series doesn't
    do the optimization for other tools. It can be done later
    separately.
  - With this optimization applied, there is a 1.56x speedup in
    Knights Mill with heavy workload.

Optimization for perf_top__mmap_read
  - switch back to overwrite mode
    For non overwrite mode, it tries to read everything in the ring buffer
    and does not check the messup. Once there are lots of samples delivered
    shortly, the processing time could be very long.
    Considering the real time requirement for perf top, it should switch
    back to overwrite mode.
  - With this optimization applied, there is a huge 53.8x speedup in
    Knights Mill with heavy workload.
  - With this optimization applied, the latency of perf_top__mmap_read is
    less than the default perf top fresh time (2s) in Knights Mill with
    heavy workload.

Here are the perf top latency test result on Knights Mill and Skylake server

The heavy workload is to compile Linux kernel as below
"sudo nice make -j$(grep -c '^processor' /proc/cpuinfo)"
Then, "sudo perf top"

The latency period is the time between perf top launched and the first
profiling result shown.

- Latency on Knights Mill (272 CPUs)

Original(s)     With patch(s)   Speedup
272.68          16.48           16.54x

- Latency on Skylake server (192 CPUs)

Original(s)     With patch(s)   Speedup
12.28           2.96            4.15x

Kan Liang (10):
  perf tools: hashtable for machine threads
  perf tools: using scandir to replace readdir
  petf tools: using comm_str to replace comm in hist_entry
  petf tools: introduce a new function to set namespaces id
  perf tools: lock to protect thread list
  perf tools: lock to protect comm_str rb tree
  perf tools: change machine comm_exec type to atomic
  perf top: implement multithreading for perf_event__synthesize_threads
  perf top: add option to set the number of thread for event synthesize
  perf top: switch back to overwrite mode

 tools/perf/builtin-kvm.c              |   3 +-
 tools/perf/builtin-record.c           |   2 +-
 tools/perf/builtin-top.c              |   9 +-
 tools/perf/builtin-trace.c            |  21 +++--
 tools/perf/tests/mmap-thread-lookup.c |   2 +-
 tools/perf/ui/browsers/hists.c        |   2 +-
 tools/perf/util/comm.c                |  18 +++-
 tools/perf/util/event.c               | 142 ++++++++++++++++++++++++------
 tools/perf/util/event.h               |  14 ++-
 tools/perf/util/evlist.c              |   5 +-
 tools/perf/util/hist.c                |  10 +--
 tools/perf/util/machine.c             | 158 +++++++++++++++++++++-------------
 tools/perf/util/machine.h             |  34 ++++++--
 tools/perf/util/rb_resort.h           |   5 +-
 tools/perf/util/sort.c                |   8 +-
 tools/perf/util/sort.h                |   2 +-
 tools/perf/util/thread.c              |  68 ++++++++++++---
 tools/perf/util/thread.h              |   6 +-
 tools/perf/util/top.h                 |   1 +
 19 files changed, 368 insertions(+), 142 deletions(-)

-- 
2.5.5

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH RFC 01/10] perf tools: hashtable for machine threads
  2017-09-07 17:55 [PATCH RFC 00/10] perf top optimization kan.liang
@ 2017-09-07 17:55 ` kan.liang
  2017-09-08 14:18   ` Arnaldo Carvalho de Melo
  2017-09-07 17:55 ` [PATCH RFC 02/10] perf tools: using scandir to replace readdir kan.liang
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 15+ messages in thread
From: kan.liang @ 2017-09-07 17:55 UTC (permalink / raw)
  To: acme, peterz, mingo, linux-kernel
  Cc: jolsa, namhyung, adrian.hunter, lukasz.odzioba, ak, Kan Liang

From: Kan Liang <kan.liang@intel.com>

To process any events, it needs to find the thread in the machine first.
The machine maintains a rb tree to store all threads. The rb tree is
protected by a rw lock.
It is not a problem for current perf which serially processing events.
However, it will have scalability performance issue to process events in
parallel, especially on a heave load system which have many threads.

Introduce a hashtable to divide the big rb tree into many samll rb tree
for threads. The index is thread id % hashtable size. It can reduce the
lock contention.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 tools/perf/builtin-trace.c  |  19 +++---
 tools/perf/util/machine.c   | 139 ++++++++++++++++++++++++++++----------------
 tools/perf/util/machine.h   |  23 ++++++--
 tools/perf/util/rb_resort.h |   5 +-
 4 files changed, 120 insertions(+), 66 deletions(-)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 771ddab..af4e4e4 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -2730,20 +2730,23 @@ DEFINE_RESORT_RB(threads, (thread__nr_events(a->thread->priv) < thread__nr_event
 
 static size_t trace__fprintf_thread_summary(struct trace *trace, FILE *fp)
 {
-	DECLARE_RESORT_RB_MACHINE_THREADS(threads, trace->host);
 	size_t printed = trace__fprintf_threads_header(fp);
 	struct rb_node *nd;
+	int i;
 
-	if (threads == NULL) {
-		fprintf(fp, "%s", "Error sorting output by nr_events!\n");
-		return 0;
-	}
+	for (i = 0; i < MACHINE_TH_TABLE_SIZE; i++) {
+		DECLARE_RESORT_RB_MACHINE_THREADS(threads, trace->host, i);
 
-	resort_rb__for_each_entry(nd, threads)
-		printed += trace__fprintf_thread(fp, threads_entry->thread, trace);
+		if (threads == NULL) {
+			fprintf(fp, "%s", "Error sorting output by nr_events!\n");
+			return 0;
+		}
 
-	resort_rb__delete(threads);
+		resort_rb__for_each_entry(nd, threads)
+			printed += trace__fprintf_thread(fp, threads_entry->thread, trace);
 
+		resort_rb__delete(threads);
+	}
 	return printed;
 }
 
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index df70936..5e451f9 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -33,6 +33,21 @@ static void dsos__init(struct dsos *dsos)
 	pthread_rwlock_init(&dsos->lock, NULL);
 }
 
+static void machine__thread_init(struct machine *machine)
+{
+	struct machine_th *machine_th;
+	int i;
+
+	for (i = 0; i < MACHINE_TH_TABLE_SIZE; i++) {
+		machine_th = &machine->threads[i];
+		machine_th->threads = RB_ROOT;
+		pthread_rwlock_init(&machine_th->threads_lock, NULL);
+		machine_th->nr_threads = 0;
+		INIT_LIST_HEAD(&machine_th->dead_threads);
+		machine_th->last_match = NULL;
+	}
+}
+
 int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
 {
 	memset(machine, 0, sizeof(*machine));
@@ -40,11 +55,7 @@ int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
 	RB_CLEAR_NODE(&machine->rb_node);
 	dsos__init(&machine->dsos);
 
-	machine->threads = RB_ROOT;
-	pthread_rwlock_init(&machine->threads_lock, NULL);
-	machine->nr_threads = 0;
-	INIT_LIST_HEAD(&machine->dead_threads);
-	machine->last_match = NULL;
+	machine__thread_init(machine);
 
 	machine->vdso_info = NULL;
 	machine->env = NULL;
@@ -140,28 +151,39 @@ static void dsos__exit(struct dsos *dsos)
 
 void machine__delete_threads(struct machine *machine)
 {
+	struct machine_th *machine_th;
 	struct rb_node *nd;
+	int i;
 
-	pthread_rwlock_wrlock(&machine->threads_lock);
-	nd = rb_first(&machine->threads);
-	while (nd) {
-		struct thread *t = rb_entry(nd, struct thread, rb_node);
+	for (i = 0; i < MACHINE_TH_TABLE_SIZE; i++) {
+		machine_th = &machine->threads[i];
+		pthread_rwlock_wrlock(&machine_th->threads_lock);
+		nd = rb_first(&machine_th->threads);
+		while (nd) {
+			struct thread *t = rb_entry(nd, struct thread, rb_node);
 
-		nd = rb_next(nd);
-		__machine__remove_thread(machine, t, false);
+			nd = rb_next(nd);
+			__machine__remove_thread(machine, t, false);
+		}
+		pthread_rwlock_unlock(&machine_th->threads_lock);
 	}
-	pthread_rwlock_unlock(&machine->threads_lock);
 }
 
 void machine__exit(struct machine *machine)
 {
+	struct machine_th *machine_th;
+	int i;
+
 	machine__destroy_kernel_maps(machine);
 	map_groups__exit(&machine->kmaps);
 	dsos__exit(&machine->dsos);
 	machine__exit_vdso(machine);
 	zfree(&machine->root_dir);
 	zfree(&machine->current_tid);
-	pthread_rwlock_destroy(&machine->threads_lock);
+	for (i = 0; i < MACHINE_TH_TABLE_SIZE; i++) {
+		machine_th = &machine->threads[i];
+		pthread_rwlock_destroy(&machine_th->threads_lock);
+	}
 }
 
 void machine__delete(struct machine *machine)
@@ -382,7 +404,8 @@ static struct thread *____machine__findnew_thread(struct machine *machine,
 						  pid_t pid, pid_t tid,
 						  bool create)
 {
-	struct rb_node **p = &machine->threads.rb_node;
+	struct machine_th *machine_th = machine_thread(machine, tid);
+	struct rb_node **p = &machine_th->threads.rb_node;
 	struct rb_node *parent = NULL;
 	struct thread *th;
 
@@ -391,14 +414,14 @@ static struct thread *____machine__findnew_thread(struct machine *machine,
 	 * so most of the time we dont have to look up
 	 * the full rbtree:
 	 */
-	th = machine->last_match;
+	th = machine_th->last_match;
 	if (th != NULL) {
 		if (th->tid == tid) {
 			machine__update_thread_pid(machine, th, pid);
 			return thread__get(th);
 		}
 
-		machine->last_match = NULL;
+		machine_th->last_match = NULL;
 	}
 
 	while (*p != NULL) {
@@ -406,7 +429,7 @@ static struct thread *____machine__findnew_thread(struct machine *machine,
 		th = rb_entry(parent, struct thread, rb_node);
 
 		if (th->tid == tid) {
-			machine->last_match = th;
+			machine_th->last_match = th;
 			machine__update_thread_pid(machine, th, pid);
 			return thread__get(th);
 		}
@@ -423,7 +446,7 @@ static struct thread *____machine__findnew_thread(struct machine *machine,
 	th = thread__new(pid, tid);
 	if (th != NULL) {
 		rb_link_node(&th->rb_node, parent, p);
-		rb_insert_color(&th->rb_node, &machine->threads);
+		rb_insert_color(&th->rb_node, &machine_th->threads);
 
 		/*
 		 * We have to initialize map_groups separately
@@ -434,7 +457,7 @@ static struct thread *____machine__findnew_thread(struct machine *machine,
 		 * leader and that would screwed the rb tree.
 		 */
 		if (thread__init_map_groups(th, machine)) {
-			rb_erase_init(&th->rb_node, &machine->threads);
+			rb_erase_init(&th->rb_node, &machine_th->threads);
 			RB_CLEAR_NODE(&th->rb_node);
 			thread__put(th);
 			return NULL;
@@ -443,8 +466,8 @@ static struct thread *____machine__findnew_thread(struct machine *machine,
 		 * It is now in the rbtree, get a ref
 		 */
 		thread__get(th);
-		machine->last_match = th;
-		++machine->nr_threads;
+		machine_th->last_match = th;
+		++machine_th->nr_threads;
 	}
 
 	return th;
@@ -458,21 +481,24 @@ struct thread *__machine__findnew_thread(struct machine *machine, pid_t pid, pid
 struct thread *machine__findnew_thread(struct machine *machine, pid_t pid,
 				       pid_t tid)
 {
+	struct machine_th *machine_th = machine_thread(machine, tid);
 	struct thread *th;
 
-	pthread_rwlock_wrlock(&machine->threads_lock);
+	pthread_rwlock_wrlock(&machine_th->threads_lock);
 	th = __machine__findnew_thread(machine, pid, tid);
-	pthread_rwlock_unlock(&machine->threads_lock);
+	pthread_rwlock_unlock(&machine_th->threads_lock);
 	return th;
 }
 
 struct thread *machine__find_thread(struct machine *machine, pid_t pid,
 				    pid_t tid)
 {
+	struct machine_th *machine_th = machine_thread(machine, tid);
 	struct thread *th;
-	pthread_rwlock_rdlock(&machine->threads_lock);
+
+	pthread_rwlock_rdlock(&machine_th->threads_lock);
 	th =  ____machine__findnew_thread(machine, pid, tid, false);
-	pthread_rwlock_unlock(&machine->threads_lock);
+	pthread_rwlock_unlock(&machine_th->threads_lock);
 	return th;
 }
 
@@ -719,21 +745,25 @@ size_t machine__fprintf_vmlinux_path(struct machine *machine, FILE *fp)
 
 size_t machine__fprintf(struct machine *machine, FILE *fp)
 {
-	size_t ret;
+	struct machine_th *machine_th;
 	struct rb_node *nd;
+	size_t ret;
+	int i;
 
-	pthread_rwlock_rdlock(&machine->threads_lock);
-
-	ret = fprintf(fp, "Threads: %u\n", machine->nr_threads);
+	for (i = 0; i < MACHINE_TH_TABLE_SIZE; i++) {
+		machine_th = &machine->threads[i];
+		pthread_rwlock_rdlock(&machine_th->threads_lock);
 
-	for (nd = rb_first(&machine->threads); nd; nd = rb_next(nd)) {
-		struct thread *pos = rb_entry(nd, struct thread, rb_node);
+		ret = fprintf(fp, "Threads: %u\n", machine_th->nr_threads);
 
-		ret += thread__fprintf(pos, fp);
-	}
+		for (nd = rb_first(&machine_th->threads); nd; nd = rb_next(nd)) {
+			struct thread *pos = rb_entry(nd, struct thread, rb_node);
 
-	pthread_rwlock_unlock(&machine->threads_lock);
+			ret += thread__fprintf(pos, fp);
+		}
 
+		pthread_rwlock_unlock(&machine_th->threads_lock);
+	}
 	return ret;
 }
 
@@ -1479,23 +1509,25 @@ int machine__process_mmap_event(struct machine *machine, union perf_event *event
 
 static void __machine__remove_thread(struct machine *machine, struct thread *th, bool lock)
 {
-	if (machine->last_match == th)
-		machine->last_match = NULL;
+	struct machine_th *machine_th = machine_thread(machine, th->tid);
+
+	if (machine_th->last_match == th)
+		machine_th->last_match = NULL;
 
 	BUG_ON(refcount_read(&th->refcnt) == 0);
 	if (lock)
-		pthread_rwlock_wrlock(&machine->threads_lock);
-	rb_erase_init(&th->rb_node, &machine->threads);
+		pthread_rwlock_wrlock(&machine_th->threads_lock);
+	rb_erase_init(&th->rb_node, &machine_th->threads);
 	RB_CLEAR_NODE(&th->rb_node);
-	--machine->nr_threads;
+	--machine_th->nr_threads;
 	/*
 	 * Move it first to the dead_threads list, then drop the reference,
 	 * if this is the last reference, then the thread__delete destructor
 	 * will be called and we will remove it from the dead_threads list.
 	 */
-	list_add_tail(&th->node, &machine->dead_threads);
+	list_add_tail(&th->node, &machine_th->dead_threads);
 	if (lock)
-		pthread_rwlock_unlock(&machine->threads_lock);
+		pthread_rwlock_unlock(&machine_th->threads_lock);
 	thread__put(th);
 }
 
@@ -2140,21 +2172,26 @@ int machine__for_each_thread(struct machine *machine,
 			     int (*fn)(struct thread *thread, void *p),
 			     void *priv)
 {
+	struct machine_th *machine_th;
 	struct rb_node *nd;
 	struct thread *thread;
 	int rc = 0;
+	int i;
 
-	for (nd = rb_first(&machine->threads); nd; nd = rb_next(nd)) {
-		thread = rb_entry(nd, struct thread, rb_node);
-		rc = fn(thread, priv);
-		if (rc != 0)
-			return rc;
-	}
+	for (i = 0; i < MACHINE_TH_TABLE_SIZE; i++) {
+		machine_th = &machine->threads[i];
+		for (nd = rb_first(&machine_th->threads); nd; nd = rb_next(nd)) {
+			thread = rb_entry(nd, struct thread, rb_node);
+			rc = fn(thread, priv);
+			if (rc != 0)
+				return rc;
+		}
 
-	list_for_each_entry(thread, &machine->dead_threads, node) {
-		rc = fn(thread, priv);
-		if (rc != 0)
-			return rc;
+		list_for_each_entry(thread, &machine_th->dead_threads, node) {
+			rc = fn(thread, priv);
+			if (rc != 0)
+				return rc;
+		}
 	}
 	return rc;
 }
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index 3cdb134..745a0ca 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -23,6 +23,17 @@ extern const char *ref_reloc_sym_names[];
 
 struct vdso_info;
 
+#define MACHINE_TH_TABLE_BITS	8
+#define MACHINE_TH_TABLE_SIZE	(1 << MACHINE_TH_TABLE_BITS)
+
+struct machine_th {
+	struct rb_root	  threads;
+	pthread_rwlock_t  threads_lock;
+	unsigned int	  nr_threads;
+	struct list_head  dead_threads;
+	struct thread	  *last_match;
+};
+
 struct machine {
 	struct rb_node	  rb_node;
 	pid_t		  pid;
@@ -30,11 +41,7 @@ struct machine {
 	bool		  comm_exec;
 	bool		  kptr_restrict_warned;
 	char		  *root_dir;
-	struct rb_root	  threads;
-	pthread_rwlock_t  threads_lock;
-	unsigned int	  nr_threads;
-	struct list_head  dead_threads;
-	struct thread	  *last_match;
+	struct machine_th threads[MACHINE_TH_TABLE_SIZE];
 	struct vdso_info  *vdso_info;
 	struct perf_env   *env;
 	struct dsos	  dsos;
@@ -49,6 +56,12 @@ struct machine {
 };
 
 static inline
+struct machine_th *machine_thread(struct machine *machine, pid_t tid)
+{
+	return &machine->threads[tid % MACHINE_TH_TABLE_SIZE];
+}
+
+static inline
 struct map *__machine__kernel_map(struct machine *machine, enum map_type type)
 {
 	return machine->vmlinux_maps[type];
diff --git a/tools/perf/util/rb_resort.h b/tools/perf/util/rb_resort.h
index 808cc45..bbd78fa 100644
--- a/tools/perf/util/rb_resort.h
+++ b/tools/perf/util/rb_resort.h
@@ -143,7 +143,8 @@ struct __name##_sorted *__name = __name##_sorted__new
 				  __ilist->rblist.nr_entries)
 
 /* For 'struct machine->threads' */
-#define DECLARE_RESORT_RB_MACHINE_THREADS(__name, __machine)			\
-	DECLARE_RESORT_RB(__name)(&__machine->threads, __machine->nr_threads)
+#define DECLARE_RESORT_RB_MACHINE_THREADS(__name, __machine, tid)		\
+	DECLARE_RESORT_RB(__name)(&__machine->threads[tid].threads,		\
+				  __machine->threads[tid].nr_threads)
 
 #endif /* _PERF_RESORT_RB_H_ */
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RFC 02/10] perf tools: using scandir to replace readdir
  2017-09-07 17:55 [PATCH RFC 00/10] perf top optimization kan.liang
  2017-09-07 17:55 ` [PATCH RFC 01/10] perf tools: hashtable for machine threads kan.liang
@ 2017-09-07 17:55 ` kan.liang
  2017-09-08 14:11   ` Arnaldo Carvalho de Melo
  2017-09-22 16:35   ` [tip:perf/core] perf tools: Use scandir() to replace readdir() tip-bot for Kan Liang
  2017-09-07 17:55 ` [PATCH RFC 03/10] petf tools: using comm_str to replace comm in hist_entry kan.liang
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 15+ messages in thread
From: kan.liang @ 2017-09-07 17:55 UTC (permalink / raw)
  To: acme, peterz, mingo, linux-kernel
  Cc: jolsa, namhyung, adrian.hunter, lukasz.odzioba, ak, Kan Liang

From: Kan Liang <kan.liang@intel.com>

For perf_event__synthesize_threads, perf goes through all proc files
serially by readdir.
scandir did a snapshoot of proc, which is multithreading friendly.

It's possible that some threads which are added during event synthesize.
But the number of lost threads should be small.
They should not impact the final analysis.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 tools/perf/util/event.c | 45 +++++++++++++++++++++++++--------------------
 1 file changed, 25 insertions(+), 20 deletions(-)

diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 1c905ba..17c21ea 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -683,12 +683,14 @@ int perf_event__synthesize_threads(struct perf_tool *tool,
 				   bool mmap_data,
 				   unsigned int proc_map_timeout)
 {
-	DIR *proc;
-	char proc_path[PATH_MAX];
-	struct dirent *dirent;
 	union perf_event *comm_event, *mmap_event, *fork_event;
 	union perf_event *namespaces_event;
+	char proc_path[PATH_MAX];
+	struct dirent **dirent;
 	int err = -1;
+	char *end;
+	pid_t pid;
+	int n, i;
 
 	if (machine__is_default_guest(machine))
 		return 0;
@@ -712,29 +714,32 @@ int perf_event__synthesize_threads(struct perf_tool *tool,
 		goto out_free_fork;
 
 	snprintf(proc_path, sizeof(proc_path), "%s/proc", machine->root_dir);
-	proc = opendir(proc_path);
+	n = scandir(proc_path, &dirent, 0, alphasort);
 
-	if (proc == NULL)
+	if (n < 0)
 		goto out_free_namespaces;
 
-	while ((dirent = readdir(proc)) != NULL) {
-		char *end;
-		pid_t pid = strtol(dirent->d_name, &end, 10);
-
-		if (*end) /* only interested in proper numerical dirents */
+	for (i = 0; i < n; i++) {
+		if (!isdigit(dirent[i]->d_name[0]))
 			continue;
-		/*
- 		 * We may race with exiting thread, so don't stop just because
- 		 * one thread couldn't be synthesized.
- 		 */
-		__event__synthesize_thread(comm_event, mmap_event, fork_event,
-					   namespaces_event, pid, 1, process,
-					   tool, machine, mmap_data,
-					   proc_map_timeout);
-	}
 
+		pid = (pid_t)strtol(dirent[i]->d_name, &end, 10);
+		/* only interested in proper numerical dirents */
+		if (!*end) {
+			/*
+			 * We may race with exiting thread, so don't stop just because
+			 * one thread couldn't be synthesized.
+			 */
+			__event__synthesize_thread(comm_event, mmap_event, fork_event,
+						   namespaces_event, pid, 1, process,
+						   tool, machine, mmap_data,
+						   proc_map_timeout);
+		}
+		free(dirent[i]);
+	}
+	free(dirent);
 	err = 0;
-	closedir(proc);
+
 out_free_namespaces:
 	free(namespaces_event);
 out_free_fork:
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RFC 03/10] petf tools: using comm_str to replace comm in hist_entry
  2017-09-07 17:55 [PATCH RFC 00/10] perf top optimization kan.liang
  2017-09-07 17:55 ` [PATCH RFC 01/10] perf tools: hashtable for machine threads kan.liang
  2017-09-07 17:55 ` [PATCH RFC 02/10] perf tools: using scandir to replace readdir kan.liang
@ 2017-09-07 17:55 ` kan.liang
  2017-09-07 17:55 ` [PATCH RFC 04/10] petf tools: introduce a new function to set namespaces id kan.liang
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: kan.liang @ 2017-09-07 17:55 UTC (permalink / raw)
  To: acme, peterz, mingo, linux-kernel
  Cc: jolsa, namhyung, adrian.hunter, lukasz.odzioba, ak, Kan Liang

From: Kan Liang <kan.liang@intel.com>

For hist_entry, it only needs comm_str for cmp.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 tools/perf/util/hist.c | 4 ++--
 tools/perf/util/sort.c | 8 ++++----
 tools/perf/util/sort.h | 2 +-
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index e60d8d8..0f00dd9 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -587,7 +587,7 @@ __hists__add_entry(struct hists *hists,
 	struct namespaces *ns = thread__namespaces(al->thread);
 	struct hist_entry entry = {
 		.thread	= al->thread,
-		.comm = thread__comm(al->thread),
+		.comm_str = thread__comm_str(al->thread),
 		.cgroup_id = {
 			.dev = ns ? ns->link_info[CGROUP_NS_INDEX].dev : 0,
 			.ino = ns ? ns->link_info[CGROUP_NS_INDEX].ino : 0,
@@ -944,7 +944,7 @@ iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
 		.hists = evsel__hists(evsel),
 		.cpu = al->cpu,
 		.thread = al->thread,
-		.comm = thread__comm(al->thread),
+		.comm_str = thread__comm_str(al->thread),
 		.ip = al->addr,
 		.ms = {
 			.map = al->map,
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index eb3ab90..99ab411 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -114,26 +114,26 @@ static int64_t
 sort__comm_cmp(struct hist_entry *left, struct hist_entry *right)
 {
 	/* Compare the addr that should be unique among comm */
-	return strcmp(comm__str(right->comm), comm__str(left->comm));
+	return strcmp(right->comm_str, left->comm_str);
 }
 
 static int64_t
 sort__comm_collapse(struct hist_entry *left, struct hist_entry *right)
 {
 	/* Compare the addr that should be unique among comm */
-	return strcmp(comm__str(right->comm), comm__str(left->comm));
+	return strcmp(right->comm_str, left->comm_str);
 }
 
 static int64_t
 sort__comm_sort(struct hist_entry *left, struct hist_entry *right)
 {
-	return strcmp(comm__str(right->comm), comm__str(left->comm));
+	return strcmp(right->comm_str, left->comm_str);
 }
 
 static int hist_entry__comm_snprintf(struct hist_entry *he, char *bf,
 				     size_t size, unsigned int width)
 {
-	return repsep_snprintf(bf, size, "%-*.*s", width, width, comm__str(he->comm));
+	return repsep_snprintf(bf, size, "%-*.*s", width, width, he->comm_str);
 }
 
 struct sort_entry sort_comm = {
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index f36dc49..861cbe7 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -96,7 +96,7 @@ struct hist_entry {
 	struct he_stat		*stat_acc;
 	struct map_symbol	ms;
 	struct thread		*thread;
-	struct comm		*comm;
+	const char		*comm_str;
 	struct namespace_id	cgroup_id;
 	u64			ip;
 	u64			transaction;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RFC 04/10] petf tools: introduce a new function to set namespaces id
  2017-09-07 17:55 [PATCH RFC 00/10] perf top optimization kan.liang
                   ` (2 preceding siblings ...)
  2017-09-07 17:55 ` [PATCH RFC 03/10] petf tools: using comm_str to replace comm in hist_entry kan.liang
@ 2017-09-07 17:55 ` kan.liang
  2017-09-07 17:55 ` [PATCH RFC 05/10] perf tools: lock to protect thread list kan.liang
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: kan.liang @ 2017-09-07 17:55 UTC (permalink / raw)
  To: acme, peterz, mingo, linux-kernel
  Cc: jolsa, namhyung, adrian.hunter, lukasz.odzioba, ak, Kan Liang

From: Kan Liang <kan.liang@intel.com>

Finish the namespaces id setting job in thread.c.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 tools/perf/util/hist.c   |  6 +-----
 tools/perf/util/thread.c | 10 ++++++++++
 tools/perf/util/thread.h |  2 ++
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 0f00dd9..1f467cc 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -584,14 +584,9 @@ __hists__add_entry(struct hists *hists,
 		   bool sample_self,
 		   struct hist_entry_ops *ops)
 {
-	struct namespaces *ns = thread__namespaces(al->thread);
 	struct hist_entry entry = {
 		.thread	= al->thread,
 		.comm_str = thread__comm_str(al->thread),
-		.cgroup_id = {
-			.dev = ns ? ns->link_info[CGROUP_NS_INDEX].dev : 0,
-			.ino = ns ? ns->link_info[CGROUP_NS_INDEX].ino : 0,
-		},
 		.ms = {
 			.map	= al->map,
 			.sym	= al->sym,
@@ -617,6 +612,7 @@ __hists__add_entry(struct hists *hists,
 		.ops = ops,
 	};
 
+	thread__namespaces_id(al->thread, &entry.cgroup_id.dev, &entry.cgroup_id.ino);
 	return hists__findnew_entry(hists, &entry, al, sample_self);
 }
 
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index aee9a42..b7957da 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -149,6 +149,16 @@ int thread__set_namespaces(struct thread *thread, u64 timestamp,
 	return 0;
 }
 
+void thread__namespaces_id(const struct thread *thread,
+			   u64 *dev, u64 *ino)
+{
+	struct namespaces *ns;
+
+	ns = thread__namespaces(thread);
+	*dev = ns ? ns->link_info[CGROUP_NS_INDEX].dev : 0;
+	*ino = ns ? ns->link_info[CGROUP_NS_INDEX].ino : 0;
+}
+
 struct comm *thread__comm(const struct thread *thread)
 {
 	if (list_empty(&thread->comm_list))
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index cb1a5dd..fd7bd41 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -68,6 +68,8 @@ static inline void thread__exited(struct thread *thread)
 struct namespaces *thread__namespaces(const struct thread *thread);
 int thread__set_namespaces(struct thread *thread, u64 timestamp,
 			   struct namespaces_event *event);
+void thread__namespaces_id(const struct thread *thread,
+			   u64 *dev, u64 *ino);
 
 int __thread__set_comm(struct thread *thread, const char *comm, u64 timestamp,
 		       bool exec);
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RFC 05/10] perf tools: lock to protect thread list
  2017-09-07 17:55 [PATCH RFC 00/10] perf top optimization kan.liang
                   ` (3 preceding siblings ...)
  2017-09-07 17:55 ` [PATCH RFC 04/10] petf tools: introduce a new function to set namespaces id kan.liang
@ 2017-09-07 17:55 ` kan.liang
  2017-09-07 17:55 ` [PATCH RFC 06/10] perf tools: lock to protect comm_str rb tree kan.liang
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: kan.liang @ 2017-09-07 17:55 UTC (permalink / raw)
  To: acme, peterz, mingo, linux-kernel
  Cc: jolsa, namhyung, adrian.hunter, lukasz.odzioba, ak, Kan Liang

From: Kan Liang <kan.liang@intel.com>

Add two locks to protect namespaces_list and comm_list.

The comm which is used in db-export are not protected. Because the
multithread code will not touch it. It can be added later if required.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 tools/perf/ui/browsers/hists.c |  2 +-
 tools/perf/util/thread.c       | 60 +++++++++++++++++++++++++++++++++---------
 tools/perf/util/thread.h       |  6 +++--
 3 files changed, 52 insertions(+), 16 deletions(-)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 13dfb0a..429e1dd 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -2370,7 +2370,7 @@ static int perf_evsel_browser_title(struct hist_browser *browser,
 	char unit;
 	int printed;
 	const struct dso *dso = hists->dso_filter;
-	const struct thread *thread = hists->thread_filter;
+	struct thread *thread = hists->thread_filter;
 	int socket_id = hists->socket_filter;
 	unsigned long nr_samples = hists->stats.nr_events[PERF_RECORD_SAMPLE];
 	u64 nr_events = hists->stats.total_period;
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index b7957da..25830b2 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -45,6 +45,8 @@ struct thread *thread__new(pid_t pid, pid_t tid)
 		thread->cpu = -1;
 		INIT_LIST_HEAD(&thread->namespaces_list);
 		INIT_LIST_HEAD(&thread->comm_list);
+		pthread_mutex_init(&thread->namespaces_lock, NULL);
+		pthread_mutex_init(&thread->comm_lock, NULL);
 
 		comm_str = malloc(32);
 		if (!comm_str)
@@ -83,15 +85,21 @@ void thread__delete(struct thread *thread)
 		map_groups__put(thread->mg);
 		thread->mg = NULL;
 	}
+	pthread_mutex_lock(&thread->namespaces_lock);
 	list_for_each_entry_safe(namespaces, tmp_namespaces,
 				 &thread->namespaces_list, list) {
 		list_del(&namespaces->list);
 		namespaces__free(namespaces);
 	}
+	pthread_mutex_unlock(&thread->namespaces_lock);
+
+	pthread_mutex_lock(&thread->comm_lock);
 	list_for_each_entry_safe(comm, tmp_comm, &thread->comm_list, list) {
 		list_del(&comm->list);
 		comm__free(comm);
 	}
+	pthread_mutex_unlock(&thread->comm_lock);
+
 	unwind__finish_access(thread);
 	nsinfo__zput(thread->nsinfo);
 
@@ -128,11 +136,17 @@ struct namespaces *thread__namespaces(const struct thread *thread)
 int thread__set_namespaces(struct thread *thread, u64 timestamp,
 			   struct namespaces_event *event)
 {
-	struct namespaces *new, *curr = thread__namespaces(thread);
+	struct namespaces *new, *curr;
+
+	pthread_mutex_lock(&thread->namespaces_lock);
+
+	curr = thread__namespaces(thread);
 
 	new = namespaces__new(event);
-	if (!new)
+	if (!new) {
+		pthread_mutex_unlock(&thread->namespaces_lock);
 		return -ENOMEM;
+	}
 
 	list_add(&new->list, &thread->namespaces_list);
 
@@ -146,17 +160,21 @@ int thread__set_namespaces(struct thread *thread, u64 timestamp,
 		curr->end_time = timestamp;
 	}
 
+	pthread_mutex_unlock(&thread->namespaces_lock);
+
 	return 0;
 }
 
-void thread__namespaces_id(const struct thread *thread,
+void thread__namespaces_id(struct thread *thread,
 			   u64 *dev, u64 *ino)
 {
 	struct namespaces *ns;
 
+	pthread_mutex_lock(&thread->namespaces_lock);
 	ns = thread__namespaces(thread);
 	*dev = ns ? ns->link_info[CGROUP_NS_INDEX].dev : 0;
 	*ino = ns ? ns->link_info[CGROUP_NS_INDEX].ino : 0;
+	pthread_mutex_unlock(&thread->namespaces_lock);
 }
 
 struct comm *thread__comm(const struct thread *thread)
@@ -183,17 +201,23 @@ struct comm *thread__exec_comm(const struct thread *thread)
 int __thread__set_comm(struct thread *thread, const char *str, u64 timestamp,
 		       bool exec)
 {
-	struct comm *new, *curr = thread__comm(thread);
+	struct comm *new, *curr;
+	int err = 0;
+
+	pthread_mutex_lock(&thread->comm_lock);
+	curr = thread__comm(thread);
 
 	/* Override the default :tid entry */
 	if (!thread->comm_set) {
-		int err = comm__override(curr, str, timestamp, exec);
+		err = comm__override(curr, str, timestamp, exec);
 		if (err)
-			return err;
+			goto unlock;
 	} else {
 		new = comm__new(str, timestamp, exec);
-		if (!new)
-			return -ENOMEM;
+		if (!new) {
+			err = -ENOMEM;
+			goto unlock;
+		}
 		list_add(&new->list, &thread->comm_list);
 
 		if (exec)
@@ -202,7 +226,9 @@ int __thread__set_comm(struct thread *thread, const char *str, u64 timestamp,
 
 	thread->comm_set = true;
 
-	return 0;
+unlock:
+	pthread_mutex_unlock(&thread->comm_lock);
+	return err;
 }
 
 int thread__set_comm_from_proc(struct thread *thread)
@@ -222,14 +248,22 @@ int thread__set_comm_from_proc(struct thread *thread)
 	return err;
 }
 
-const char *thread__comm_str(const struct thread *thread)
+const char *thread__comm_str(struct thread *thread)
 {
-	const struct comm *comm = thread__comm(thread);
+	const struct comm *comm;
+	const char *str;
+
+	pthread_mutex_lock(&thread->comm_lock);
+	comm = thread__comm(thread);
 
-	if (!comm)
+	if (!comm) {
+		pthread_mutex_unlock(&thread->comm_lock);
 		return NULL;
+	}
+	str = comm__str(comm);
 
-	return comm__str(comm);
+	pthread_mutex_unlock(&thread->comm_lock);
+	return str;
 }
 
 /* CHECKME: it should probably better return the max comm len from its comm list */
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index fd7bd41..1d3062a 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -29,7 +29,9 @@ struct thread {
 	int			comm_len;
 	bool			dead; /* if set thread has exited */
 	struct list_head	namespaces_list;
+	pthread_mutex_t		namespaces_lock;
 	struct list_head	comm_list;
+	pthread_mutex_t		comm_lock;
 	u64			db_id;
 
 	void			*priv;
@@ -68,7 +70,7 @@ static inline void thread__exited(struct thread *thread)
 struct namespaces *thread__namespaces(const struct thread *thread);
 int thread__set_namespaces(struct thread *thread, u64 timestamp,
 			   struct namespaces_event *event);
-void thread__namespaces_id(const struct thread *thread,
+void thread__namespaces_id(struct thread *thread,
 			   u64 *dev, u64 *ino);
 
 int __thread__set_comm(struct thread *thread, const char *comm, u64 timestamp,
@@ -84,7 +86,7 @@ int thread__set_comm_from_proc(struct thread *thread);
 int thread__comm_len(struct thread *thread);
 struct comm *thread__comm(const struct thread *thread);
 struct comm *thread__exec_comm(const struct thread *thread);
-const char *thread__comm_str(const struct thread *thread);
+const char *thread__comm_str(struct thread *thread);
 int thread__insert_map(struct thread *thread, struct map *map);
 int thread__fork(struct thread *thread, struct thread *parent, u64 timestamp);
 size_t thread__fprintf(struct thread *thread, FILE *fp);
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RFC 06/10] perf tools: lock to protect comm_str rb tree
  2017-09-07 17:55 [PATCH RFC 00/10] perf top optimization kan.liang
                   ` (4 preceding siblings ...)
  2017-09-07 17:55 ` [PATCH RFC 05/10] perf tools: lock to protect thread list kan.liang
@ 2017-09-07 17:55 ` kan.liang
  2017-09-08 14:27   ` Arnaldo Carvalho de Melo
  2017-09-07 17:55 ` [PATCH RFC 07/10] perf tools: change machine comm_exec type to atomic kan.liang
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 15+ messages in thread
From: kan.liang @ 2017-09-07 17:55 UTC (permalink / raw)
  To: acme, peterz, mingo, linux-kernel
  Cc: jolsa, namhyung, adrian.hunter, lukasz.odzioba, ak, Kan Liang

From: Kan Liang <kan.liang@intel.com>

Add comm_str_lock to protect comm_str rb tree.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 tools/perf/util/comm.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/comm.c b/tools/perf/util/comm.c
index 7bc981b..1bdfef1 100644
--- a/tools/perf/util/comm.c
+++ b/tools/perf/util/comm.c
@@ -5,6 +5,7 @@
 #include <stdio.h>
 #include <string.h>
 #include <linux/refcount.h>
+#include <pthread.h>
 
 struct comm_str {
 	char *str;
@@ -14,6 +15,7 @@ struct comm_str {
 
 /* Should perhaps be moved to struct machine */
 static struct rb_root comm_str_root;
+static pthread_mutex_t comm_str_lock = PTHREAD_MUTEX_INITIALIZER;
 
 static struct comm_str *comm_str__get(struct comm_str *cs)
 {
@@ -24,11 +26,13 @@ static struct comm_str *comm_str__get(struct comm_str *cs)
 
 static void comm_str__put(struct comm_str *cs)
 {
+	pthread_mutex_lock(&comm_str_lock);
 	if (cs && refcount_dec_and_test(&cs->refcnt)) {
 		rb_erase(&cs->rb_node, &comm_str_root);
 		zfree(&cs->str);
 		free(cs);
 	}
+	pthread_mutex_unlock(&comm_str_lock);
 }
 
 static struct comm_str *comm_str__alloc(const char *str)
@@ -52,18 +56,22 @@ static struct comm_str *comm_str__alloc(const char *str)
 
 static struct comm_str *comm_str__findnew(const char *str, struct rb_root *root)
 {
-	struct rb_node **p = &root->rb_node;
 	struct rb_node *parent = NULL;
 	struct comm_str *iter, *new;
+	struct rb_node **p;
 	int cmp;
 
+	pthread_mutex_lock(&comm_str_lock);
+	p = &root->rb_node;
 	while (*p != NULL) {
 		parent = *p;
 		iter = rb_entry(parent, struct comm_str, rb_node);
 
 		cmp = strcmp(str, iter->str);
-		if (!cmp)
-			return comm_str__get(iter);
+		if (!cmp) {
+			new = comm_str__get(iter);
+			goto unlock;
+		}
 
 		if (cmp < 0)
 			p = &(*p)->rb_left;
@@ -73,11 +81,13 @@ static struct comm_str *comm_str__findnew(const char *str, struct rb_root *root)
 
 	new = comm_str__alloc(str);
 	if (!new)
-		return NULL;
+		goto unlock;
 
 	rb_link_node(&new->rb_node, parent, p);
 	rb_insert_color(&new->rb_node, root);
 
+unlock:
+	pthread_mutex_unlock(&comm_str_lock);
 	return new;
 }
 
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RFC 07/10] perf tools: change machine comm_exec type to atomic
  2017-09-07 17:55 [PATCH RFC 00/10] perf top optimization kan.liang
                   ` (5 preceding siblings ...)
  2017-09-07 17:55 ` [PATCH RFC 06/10] perf tools: lock to protect comm_str rb tree kan.liang
@ 2017-09-07 17:55 ` kan.liang
  2017-09-07 17:55 ` [PATCH RFC 08/10] perf top: implement multithreading for perf_event__synthesize_threads kan.liang
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: kan.liang @ 2017-09-07 17:55 UTC (permalink / raw)
  To: acme, peterz, mingo, linux-kernel
  Cc: jolsa, namhyung, adrian.hunter, lukasz.odzioba, ak, Kan Liang

From: Kan Liang <kan.liang@intel.com>

In case there are two or more threads want to change it.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 tools/perf/util/machine.c | 11 ++++++-----
 tools/perf/util/machine.h |  2 +-
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 5e451f9..f568c22 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -64,8 +64,8 @@ int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
 
 	machine->id_hdr_size = 0;
 	machine->kptr_restrict_warned = false;
-	machine->comm_exec = false;
 	machine->kernel_start = 0;
+	atomic_set(&machine->comm_exec, 0);
 
 	memset(machine->vmlinux_maps, 0, sizeof(machine->vmlinux_maps));
 
@@ -238,14 +238,15 @@ struct machine *machines__add(struct machines *machines, pid_t pid,
 
 void machines__set_comm_exec(struct machines *machines, bool comm_exec)
 {
+	int exec = comm_exec ? 1 : 0;
 	struct rb_node *nd;
 
-	machines->host.comm_exec = comm_exec;
+	atomic_set(&machines->host.comm_exec, exec);
 
 	for (nd = rb_first(&machines->guests); nd; nd = rb_next(nd)) {
 		struct machine *machine = rb_entry(nd, struct machine, rb_node);
 
-		machine->comm_exec = comm_exec;
+		atomic_set(&machine->comm_exec, exec);
 	}
 }
 
@@ -505,7 +506,7 @@ struct thread *machine__find_thread(struct machine *machine, pid_t pid,
 struct comm *machine__thread_exec_comm(struct machine *machine,
 				       struct thread *thread)
 {
-	if (machine->comm_exec)
+	if (atomic_read(&machine->comm_exec))
 		return thread__exec_comm(thread);
 	else
 		return thread__comm(thread);
@@ -521,7 +522,7 @@ int machine__process_comm_event(struct machine *machine, union perf_event *event
 	int err = 0;
 
 	if (exec)
-		machine->comm_exec = true;
+		atomic_set(&machine->comm_exec, 1);
 
 	if (dump_trace)
 		perf_event__fprintf_comm(event, stdout);
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index 745a0ca..ec9a47a 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -38,7 +38,7 @@ struct machine {
 	struct rb_node	  rb_node;
 	pid_t		  pid;
 	u16		  id_hdr_size;
-	bool		  comm_exec;
+	atomic_t	  comm_exec;
 	bool		  kptr_restrict_warned;
 	char		  *root_dir;
 	struct machine_th threads[MACHINE_TH_TABLE_SIZE];
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RFC 08/10] perf top: implement multithreading for perf_event__synthesize_threads
  2017-09-07 17:55 [PATCH RFC 00/10] perf top optimization kan.liang
                   ` (6 preceding siblings ...)
  2017-09-07 17:55 ` [PATCH RFC 07/10] perf tools: change machine comm_exec type to atomic kan.liang
@ 2017-09-07 17:55 ` kan.liang
  2017-09-07 17:55 ` [PATCH RFC 09/10] perf top: add option to set the number of thread for event synthesize kan.liang
  2017-09-07 17:55 ` [PATCH RFC 10/10] perf top: switch back to overwrite mode kan.liang
  9 siblings, 0 replies; 15+ messages in thread
From: kan.liang @ 2017-09-07 17:55 UTC (permalink / raw)
  To: acme, peterz, mingo, linux-kernel
  Cc: jolsa, namhyung, adrian.hunter, lukasz.odzioba, ak, Kan Liang

From: Kan Liang <kan.liang@intel.com>

The proc files which is sorted with alphabetical order are evenly
assigned to several synthesize threads to be processed in parallel.

For perf top, the threads number hard code to online CPU number. The
following patch will introduce an option to set it.
For other perf tools, the thread number is 1. Because the process
function is not ready for multithreading, e.g.
process_synthesized_event.
This patch series only support event synthesize multithreading for perf
top. For other tools, it can be done separately later.

With multithread applied, the total processing time can get up to 1.56x
speedup on Knights Mill for perf top.

For specific single event processing, the processing time could increase
because of the lock contention. So proc_map_timeout may need to be
increased. Otherwise some proc maps will be truncated.
Based on my test, increasing the proc_map_timeout has small impact
on the total processing time. The total processing time still get 1.49x
speedup on Knights Mill after increasing the proc_map_timeout.
The patch itself doesn't increase the proc_map_timeout.

Doesn't need to implement multithreading for
perf_event__synthesize_thread_map. It doesn't have performance issue.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 tools/perf/builtin-kvm.c              |   3 +-
 tools/perf/builtin-record.c           |   2 +-
 tools/perf/builtin-top.c              |   4 +-
 tools/perf/builtin-trace.c            |   2 +-
 tools/perf/tests/mmap-thread-lookup.c |   2 +-
 tools/perf/util/event.c               | 144 ++++++++++++++++++++++++++--------
 tools/perf/util/event.h               |  14 +++-
 tools/perf/util/machine.c             |   8 +-
 tools/perf/util/machine.h             |   9 ++-
 9 files changed, 145 insertions(+), 43 deletions(-)

diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index f309c37..edd14cb 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -1442,7 +1442,8 @@ static int kvm_events_live(struct perf_kvm_stat *kvm,
 	perf_session__set_id_hdr_size(kvm->session);
 	ordered_events__set_copy_on_queue(&kvm->session->ordered_events, true);
 	machine__synthesize_threads(&kvm->session->machines.host, &kvm->opts.target,
-				    kvm->evlist->threads, false, kvm->opts.proc_map_timeout);
+				    kvm->evlist->threads, false,
+				    kvm->opts.proc_map_timeout, 1);
 	err = kvm_live_open_events(kvm);
 	if (err)
 		goto out;
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 56f8142..f8a6c89 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -863,7 +863,7 @@ static int record__synthesize(struct record *rec, bool tail)
 
 	err = __machine__synthesize_threads(machine, tool, &opts->target, rec->evlist->threads,
 					    process_synthesized_event, opts->sample_address,
-					    opts->proc_map_timeout);
+					    opts->proc_map_timeout, 1);
 out:
 	return err;
 }
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index ee954bd..4b8fdc1 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -959,7 +959,9 @@ static int __cmd_top(struct perf_top *top)
 		goto out_delete;
 
 	machine__synthesize_threads(&top->session->machines.host, &opts->target,
-				    top->evlist->threads, false, opts->proc_map_timeout);
+				    top->evlist->threads, false,
+				    opts->proc_map_timeout,
+				    (unsigned int)sysconf(_SC_NPROCESSORS_ONLN));
 
 	if (perf_hpp_list.socket) {
 		ret = perf_env__read_cpu_topology_map(&perf_env);
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index af4e4e4..1186783 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -1131,7 +1131,7 @@ static int trace__symbols_init(struct trace *trace, struct perf_evlist *evlist)
 
 	err = __machine__synthesize_threads(trace->host, &trace->tool, &trace->opts.target,
 					    evlist->threads, trace__tool_process, false,
-					    trace->opts.proc_map_timeout);
+					    trace->opts.proc_map_timeout, 1);
 	if (err)
 		symbol__exit();
 
diff --git a/tools/perf/tests/mmap-thread-lookup.c b/tools/perf/tests/mmap-thread-lookup.c
index f94a419..2a0068a 100644
--- a/tools/perf/tests/mmap-thread-lookup.c
+++ b/tools/perf/tests/mmap-thread-lookup.c
@@ -131,7 +131,7 @@ static int synth_all(struct machine *machine)
 {
 	return perf_event__synthesize_threads(NULL,
 					      perf_event__process,
-					      machine, 0, 500);
+					      machine, 0, 500, 1);
 }
 
 static int synth_process(struct machine *machine)
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 17c21ea..4f565ff 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -677,23 +677,21 @@ int perf_event__synthesize_thread_map(struct perf_tool *tool,
 	return err;
 }
 
-int perf_event__synthesize_threads(struct perf_tool *tool,
-				   perf_event__handler_t process,
-				   struct machine *machine,
-				   bool mmap_data,
-				   unsigned int proc_map_timeout)
+static int __perf_event__synthesize_threads(struct perf_tool *tool,
+					    perf_event__handler_t process,
+					    struct machine *machine,
+					    bool mmap_data,
+					    unsigned int proc_map_timeout,
+					    struct dirent **dirent,
+					    int start,
+					    int num)
 {
 	union perf_event *comm_event, *mmap_event, *fork_event;
 	union perf_event *namespaces_event;
-	char proc_path[PATH_MAX];
-	struct dirent **dirent;
 	int err = -1;
 	char *end;
 	pid_t pid;
-	int n, i;
-
-	if (machine__is_default_guest(machine))
-		return 0;
+	int i;
 
 	comm_event = malloc(sizeof(comm_event->comm) + machine->id_hdr_size);
 	if (comm_event == NULL)
@@ -713,34 +711,24 @@ int perf_event__synthesize_threads(struct perf_tool *tool,
 	if (namespaces_event == NULL)
 		goto out_free_fork;
 
-	snprintf(proc_path, sizeof(proc_path), "%s/proc", machine->root_dir);
-	n = scandir(proc_path, &dirent, 0, alphasort);
-
-	if (n < 0)
-		goto out_free_namespaces;
-
-	for (i = 0; i < n; i++) {
+	for (i = start; i < start + num; i++) {
 		if (!isdigit(dirent[i]->d_name[0]))
 			continue;
 
 		pid = (pid_t)strtol(dirent[i]->d_name, &end, 10);
 		/* only interested in proper numerical dirents */
-		if (!*end) {
-			/*
-			 * We may race with exiting thread, so don't stop just because
-			 * one thread couldn't be synthesized.
-			 */
-			__event__synthesize_thread(comm_event, mmap_event, fork_event,
-						   namespaces_event, pid, 1, process,
-						   tool, machine, mmap_data,
-						   proc_map_timeout);
-		}
-		free(dirent[i]);
+		if (*end)
+			continue;
+		/*
+		 * We may race with exiting thread, so don't stop just because
+		 * one thread couldn't be synthesized.
+		 */
+		__event__synthesize_thread(comm_event, mmap_event, fork_event,
+					   namespaces_event, pid, 1, process,
+					   tool, machine, mmap_data,
+					   proc_map_timeout);
 	}
-	free(dirent);
 	err = 0;
-
-out_free_namespaces:
 	free(namespaces_event);
 out_free_fork:
 	free(fork_event);
@@ -752,6 +740,98 @@ int perf_event__synthesize_threads(struct perf_tool *tool,
 	return err;
 }
 
+static void *synthesize_threads_worker(void *arg)
+{
+	struct synthesize_threads_arg *args = arg;
+
+	__perf_event__synthesize_threads(args->tool, args->process,
+					 args->machine, args->mmap_data,
+					 args->proc_map_timeout, args->dirent,
+					 args->start, args->num);
+	return NULL;
+}
+
+int perf_event__synthesize_threads(struct perf_tool *tool,
+				   perf_event__handler_t process,
+				   struct machine *machine,
+				   bool mmap_data,
+				   unsigned int proc_map_timeout,
+				   unsigned int nr_threads_synthesize)
+{
+	struct synthesize_threads_arg *args = NULL;
+	pthread_t *synthesize_threads = NULL;
+	char proc_path[PATH_MAX];
+	struct dirent **dirent;
+	int num_per_thread;
+	int m, n, i, j;
+	int thread_nr;
+	int base = 0;
+	int err = -1;
+
+
+	if (machine__is_default_guest(machine))
+		return 0;
+
+	snprintf(proc_path, sizeof(proc_path), "%s/proc", machine->root_dir);
+	n = scandir(proc_path, &dirent, 0, alphasort);
+	if (n < 0)
+		return err;
+
+	thread_nr = nr_threads_synthesize;
+	if (thread_nr <= 0)
+		thread_nr = 1;
+	if (thread_nr > n)
+		thread_nr = n;
+
+	synthesize_threads = calloc(sizeof(pthread_t), thread_nr);
+	if (synthesize_threads == NULL)
+		goto free_dirent;
+
+	args = calloc(sizeof(*args), thread_nr);
+	if (args == NULL)
+		goto free_threads;
+
+	num_per_thread = n / thread_nr;
+	m = n % thread_nr;
+	for (i = 0; i < thread_nr; i++) {
+		args[i].tool = tool;
+		args[i].process = process;
+		args[i].machine = machine;
+		args[i].mmap_data = mmap_data;
+		args[i].proc_map_timeout = proc_map_timeout;
+		args[i].dirent = dirent;
+	}
+	for (i = 0; i < m; i++) {
+		args[i].num = num_per_thread + 1;
+		args[i].start = i * args[i].num;
+	}
+	if (i != 0)
+		base = args[i-1].start + args[i-1].num;
+	for (j = i; j < thread_nr; j++) {
+		args[j].num = num_per_thread;
+		args[j].start = base + (j - i) * args[i].num;
+	}
+
+	for (i = 0; i < thread_nr; i++) {
+		if (pthread_create(&synthesize_threads[i], NULL,
+				   synthesize_threads_worker, &args[i]))
+			goto out_join;
+	}
+	err = 0;
+out_join:
+	for (i = 0; i < thread_nr; i++)
+		pthread_join(synthesize_threads[i], NULL);
+	free(args);
+free_threads:
+	free(synthesize_threads);
+free_dirent:
+	for (i = 0; i < n; i++)
+		free(dirent[i]);
+	free(dirent);
+
+	return err;
+}
+
 struct process_symbol_args {
 	const char *name;
 	u64	   start;
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index ee7bcc8..7b987c8 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -664,6 +664,17 @@ typedef int (*perf_event__handler_t)(struct perf_tool *tool,
 				     struct perf_sample *sample,
 				     struct machine *machine);
 
+struct synthesize_threads_arg {
+	struct perf_tool *tool;
+	perf_event__handler_t process;
+	struct machine *machine;
+	bool mmap_data;
+	unsigned int proc_map_timeout;
+	struct dirent **dirent;
+	int num;
+	int start;
+};
+
 int perf_event__synthesize_thread_map(struct perf_tool *tool,
 				      struct thread_map *threads,
 				      perf_event__handler_t process,
@@ -680,7 +691,8 @@ int perf_event__synthesize_cpu_map(struct perf_tool *tool,
 int perf_event__synthesize_threads(struct perf_tool *tool,
 				   perf_event__handler_t process,
 				   struct machine *machine, bool mmap_data,
-				   unsigned int proc_map_timeout);
+				   unsigned int proc_map_timeout,
+				   unsigned int nr_threads_synthesize);
 int perf_event__synthesize_kernel_mmap(struct perf_tool *tool,
 				       perf_event__handler_t process,
 				       struct machine *machine);
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index f568c22..33728d0 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2221,12 +2221,16 @@ int machines__for_each_thread(struct machines *machines,
 int __machine__synthesize_threads(struct machine *machine, struct perf_tool *tool,
 				  struct target *target, struct thread_map *threads,
 				  perf_event__handler_t process, bool data_mmap,
-				  unsigned int proc_map_timeout)
+				  unsigned int proc_map_timeout,
+				  unsigned int nr_threads_synthesize)
 {
 	if (target__has_task(target))
 		return perf_event__synthesize_thread_map(tool, threads, process, machine, data_mmap, proc_map_timeout);
 	else if (target__has_cpu(target))
-		return perf_event__synthesize_threads(tool, process, machine, data_mmap, proc_map_timeout);
+		return perf_event__synthesize_threads(tool, process,
+						      machine, data_mmap,
+						      proc_map_timeout,
+						      nr_threads_synthesize);
 	/* command specified */
 	return 0;
 }
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index ec9a47a..db264b8 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -256,15 +256,18 @@ int machines__for_each_thread(struct machines *machines,
 int __machine__synthesize_threads(struct machine *machine, struct perf_tool *tool,
 				  struct target *target, struct thread_map *threads,
 				  perf_event__handler_t process, bool data_mmap,
-				  unsigned int proc_map_timeout);
+				  unsigned int proc_map_timeout,
+				  unsigned int nr_threads_synthesize);
 static inline
 int machine__synthesize_threads(struct machine *machine, struct target *target,
 				struct thread_map *threads, bool data_mmap,
-				unsigned int proc_map_timeout)
+				unsigned int proc_map_timeout,
+				unsigned int nr_threads_synthesize)
 {
 	return __machine__synthesize_threads(machine, NULL, target, threads,
 					     perf_event__process, data_mmap,
-					     proc_map_timeout);
+					     proc_map_timeout,
+					     nr_threads_synthesize);
 }
 
 pid_t machine__get_current_tid(struct machine *machine, int cpu);
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RFC 09/10] perf top: add option to set the number of thread for event synthesize
  2017-09-07 17:55 [PATCH RFC 00/10] perf top optimization kan.liang
                   ` (7 preceding siblings ...)
  2017-09-07 17:55 ` [PATCH RFC 08/10] perf top: implement multithreading for perf_event__synthesize_threads kan.liang
@ 2017-09-07 17:55 ` kan.liang
  2017-09-07 17:55 ` [PATCH RFC 10/10] perf top: switch back to overwrite mode kan.liang
  9 siblings, 0 replies; 15+ messages in thread
From: kan.liang @ 2017-09-07 17:55 UTC (permalink / raw)
  To: acme, peterz, mingo, linux-kernel
  Cc: jolsa, namhyung, adrian.hunter, lukasz.odzioba, ak, Kan Liang

From: Kan Liang <kan.liang@intel.com>

Using UINT_MAX to indicate the default thread#, which is the max number
of online CPU.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 tools/perf/builtin-top.c | 5 ++++-
 tools/perf/util/event.c  | 5 ++++-
 tools/perf/util/top.h    | 1 +
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 4b8fdc1..5f59aa7 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -961,7 +961,7 @@ static int __cmd_top(struct perf_top *top)
 	machine__synthesize_threads(&top->session->machines.host, &opts->target,
 				    top->evlist->threads, false,
 				    opts->proc_map_timeout,
-				    (unsigned int)sysconf(_SC_NPROCESSORS_ONLN));
+				    top->nr_threads_synthesize);
 
 	if (perf_hpp_list.socket) {
 		ret = perf_env__read_cpu_topology_map(&perf_env);
@@ -1114,6 +1114,7 @@ int cmd_top(int argc, const char **argv)
 		},
 		.max_stack	     = sysctl_perf_event_max_stack,
 		.sym_pcnt_filter     = 5,
+		.nr_threads_synthesize = UINT_MAX,
 	};
 	struct record_opts *opts = &top.record_opts;
 	struct target *target = &opts->target;
@@ -1223,6 +1224,8 @@ int cmd_top(int argc, const char **argv)
 	OPT_BOOLEAN(0, "hierarchy", &symbol_conf.report_hierarchy,
 		    "Show entries in a hierarchy"),
 	OPT_BOOLEAN(0, "force", &symbol_conf.force, "don't complain, do it"),
+	OPT_UINTEGER(0, "num-thread-synthesize", &top.nr_threads_synthesize,
+			"number of thread to run event synthesize"),
 	OPT_END()
 	};
 	const char * const top_usage[] = {
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 4f565ff..2104603 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -777,7 +777,10 @@ int perf_event__synthesize_threads(struct perf_tool *tool,
 	if (n < 0)
 		return err;
 
-	thread_nr = nr_threads_synthesize;
+	if (nr_threads_synthesize == UINT_MAX)
+		thread_nr = sysconf(_SC_NPROCESSORS_ONLN);
+	else
+		thread_nr = nr_threads_synthesize;
 	if (thread_nr <= 0)
 		thread_nr = 1;
 	if (thread_nr > n)
diff --git a/tools/perf/util/top.h b/tools/perf/util/top.h
index 9bdfb78..f4296e1 100644
--- a/tools/perf/util/top.h
+++ b/tools/perf/util/top.h
@@ -37,6 +37,7 @@ struct perf_top {
 	int		   sym_pcnt_filter;
 	const char	   *sym_filter;
 	float		   min_percent;
+	unsigned int	   nr_threads_synthesize;
 };
 
 #define CONSOLE_CLEAR "^[[H^[[2J"
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RFC 10/10] perf top: switch back to overwrite mode
  2017-09-07 17:55 [PATCH RFC 00/10] perf top optimization kan.liang
                   ` (8 preceding siblings ...)
  2017-09-07 17:55 ` [PATCH RFC 09/10] perf top: add option to set the number of thread for event synthesize kan.liang
@ 2017-09-07 17:55 ` kan.liang
  9 siblings, 0 replies; 15+ messages in thread
From: kan.liang @ 2017-09-07 17:55 UTC (permalink / raw)
  To: acme, peterz, mingo, linux-kernel
  Cc: jolsa, namhyung, adrian.hunter, lukasz.odzioba, ak, Kan Liang

From: Kan Liang <kan.liang@intel.com>

perf_top__mmap_read has severe performance issue in
Knights Landing/Mill, when monitoring in heavy load system. It costs
several minutes to finish, which is unacceptable.

perf top was overwrite mode. But it is changed to non overwrite mode
since commit 93fc64f14472 ("perf top: Switch to non overwrite mode").
For non overwrite mode, it tries to read everything in the ring buffer
and does not check the messup. Once there are lots of samples delivered
shortly, the processing time could be very long.
Knights Landing/Mill as a manycore processor contains a large number of
small cores. Because of the huge core number, it will generated lots of
samples in a heavy load system. Also, since the huge sample#, the mmap
writer probably bite the tail and mess up the samples.

Switching to overwrite mode, which dropping the unsure mmap entries,
significantly speeds up the whole progress.
Considering the real time requirement for perf top, it should switch
back to overwrite mode.

Only warning once if the messup is detected.
Providing some hints to users.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 tools/perf/builtin-top.c | 2 +-
 tools/perf/util/evlist.c | 5 ++++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 5f59aa7..8124b8f 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -902,7 +902,7 @@ static int perf_top__start_counters(struct perf_top *top)
 		}
 	}
 
-	if (perf_evlist__mmap(evlist, opts->mmap_pages, false) < 0) {
+	if (perf_evlist__mmap(evlist, opts->mmap_pages, true) < 0) {
 		ui__error("Failed to mmap with %d (%s)\n",
 			    errno, str_error_r(errno, msg, sizeof(msg)));
 		goto out_err;
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 6a0d7ff..ef0b6b1 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -723,7 +723,10 @@ perf_mmap__read(struct perf_mmap *md, bool check_messup, u64 start,
 		 * In either case, truncate and restart at 'end'.
 		 */
 		if (diff > md->mask / 2 || diff < 0) {
-			fprintf(stderr, "WARNING: failed to keep up with mmap data.\n");
+			WARN_ONCE(1, "WARNING: failed to keep up with mmap data.\n"
+				     "Please try increasing the period (-c) or\n"
+				     "decreasing the freq (-F) or\n"
+				     "limiting the number of CPUs");
 
 			/*
 			 * 'end' points to a known good entry, start there.
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 02/10] perf tools: using scandir to replace readdir
  2017-09-07 17:55 ` [PATCH RFC 02/10] perf tools: using scandir to replace readdir kan.liang
@ 2017-09-08 14:11   ` Arnaldo Carvalho de Melo
  2017-09-22 16:35   ` [tip:perf/core] perf tools: Use scandir() to replace readdir() tip-bot for Kan Liang
  1 sibling, 0 replies; 15+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-09-08 14:11 UTC (permalink / raw)
  To: kan.liang
  Cc: peterz, mingo, linux-kernel, jolsa, namhyung, adrian.hunter,
	lukasz.odzioba, ak

Em Thu, Sep 07, 2017 at 10:55:46AM -0700, kan.liang@intel.com escreveu:
> From: Kan Liang <kan.liang@intel.com>
> 
> For perf_event__synthesize_threads, perf goes through all proc files
> serially by readdir.
> scandir did a snapshoot of proc, which is multithreading friendly.
> 
> It's possible that some threads which are added during event synthesize.
> But the number of lost threads should be small.
> They should not impact the final analysis.

Looks ok, applied locally, will see how it builds in my containers
tests.

- Arnaldo
 
> Signed-off-by: Kan Liang <kan.liang@intel.com>
> ---
>  tools/perf/util/event.c | 45 +++++++++++++++++++++++++--------------------
>  1 file changed, 25 insertions(+), 20 deletions(-)
> 
> diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
> index 1c905ba..17c21ea 100644
> --- a/tools/perf/util/event.c
> +++ b/tools/perf/util/event.c
> @@ -683,12 +683,14 @@ int perf_event__synthesize_threads(struct perf_tool *tool,
>  				   bool mmap_data,
>  				   unsigned int proc_map_timeout)
>  {
> -	DIR *proc;
> -	char proc_path[PATH_MAX];
> -	struct dirent *dirent;
>  	union perf_event *comm_event, *mmap_event, *fork_event;
>  	union perf_event *namespaces_event;
> +	char proc_path[PATH_MAX];
> +	struct dirent **dirent;
>  	int err = -1;
> +	char *end;
> +	pid_t pid;
> +	int n, i;
>  
>  	if (machine__is_default_guest(machine))
>  		return 0;
> @@ -712,29 +714,32 @@ int perf_event__synthesize_threads(struct perf_tool *tool,
>  		goto out_free_fork;
>  
>  	snprintf(proc_path, sizeof(proc_path), "%s/proc", machine->root_dir);
> -	proc = opendir(proc_path);
> +	n = scandir(proc_path, &dirent, 0, alphasort);
>  
> -	if (proc == NULL)
> +	if (n < 0)
>  		goto out_free_namespaces;
>  
> -	while ((dirent = readdir(proc)) != NULL) {
> -		char *end;
> -		pid_t pid = strtol(dirent->d_name, &end, 10);
> -
> -		if (*end) /* only interested in proper numerical dirents */
> +	for (i = 0; i < n; i++) {
> +		if (!isdigit(dirent[i]->d_name[0]))
>  			continue;
> -		/*
> - 		 * We may race with exiting thread, so don't stop just because
> - 		 * one thread couldn't be synthesized.
> - 		 */
> -		__event__synthesize_thread(comm_event, mmap_event, fork_event,
> -					   namespaces_event, pid, 1, process,
> -					   tool, machine, mmap_data,
> -					   proc_map_timeout);
> -	}
>  
> +		pid = (pid_t)strtol(dirent[i]->d_name, &end, 10);
> +		/* only interested in proper numerical dirents */
> +		if (!*end) {
> +			/*
> +			 * We may race with exiting thread, so don't stop just because
> +			 * one thread couldn't be synthesized.
> +			 */
> +			__event__synthesize_thread(comm_event, mmap_event, fork_event,
> +						   namespaces_event, pid, 1, process,
> +						   tool, machine, mmap_data,
> +						   proc_map_timeout);
> +		}
> +		free(dirent[i]);
> +	}
> +	free(dirent);
>  	err = 0;
> -	closedir(proc);
> +
>  out_free_namespaces:
>  	free(namespaces_event);
>  out_free_fork:
> -- 
> 2.5.5

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 01/10] perf tools: hashtable for machine threads
  2017-09-07 17:55 ` [PATCH RFC 01/10] perf tools: hashtable for machine threads kan.liang
@ 2017-09-08 14:18   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 15+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-09-08 14:18 UTC (permalink / raw)
  To: kan.liang
  Cc: peterz, mingo, linux-kernel, jolsa, namhyung, adrian.hunter,
	lukasz.odzioba, ak

Em Thu, Sep 07, 2017 at 10:55:45AM -0700, kan.liang@intel.com escreveu:
> From: Kan Liang <kan.liang@intel.com>
> 
> To process any events, it needs to find the thread in the machine first.
> The machine maintains a rb tree to store all threads. The rb tree is
> protected by a rw lock.
> It is not a problem for current perf which serially processing events.
> However, it will have scalability performance issue to process events in
> parallel, especially on a heave load system which have many threads.
> 
> Introduce a hashtable to divide the big rb tree into many samll rb tree
> for threads. The index is thread id % hashtable size. It can reduce the
> lock contention.

<SNIP>
 
> +++ b/tools/perf/util/machine.h
> @@ -23,6 +23,17 @@ extern const char *ref_reloc_sym_names[];
>  
>  struct vdso_info;
>  
> +#define MACHINE_TH_TABLE_BITS	8
> +#define MACHINE_TH_TABLE_SIZE	(1 << MACHINE_TH_TABLE_BITS)
> +
> +struct machine_th {
> +	struct rb_root	  threads;
> +	pthread_rwlock_t  threads_lock;
> +	unsigned int	  nr_threads;
> +	struct list_head  dead_threads;
> +	struct thread	  *last_match;
> +};
> +

Call it just "threads", no need to call it then threads->threads, but
threads->entries, also no threads->threads_lock, but threads->lock,
threads->deads, threads->nr.

MACHINE_TH_TABLE_SIZE -> THREADS__TABLE_SIZE, etc.

>  struct machine {
>  	struct rb_node	  rb_node;
>  	pid_t		  pid;
> @@ -30,11 +41,7 @@ struct machine {
>  	bool		  comm_exec;
>  	bool		  kptr_restrict_warned;
>  	char		  *root_dir;
> -	struct rb_root	  threads;
> -	pthread_rwlock_t  threads_lock;
> -	unsigned int	  nr_threads;
> -	struct list_head  dead_threads;
> -	struct thread	  *last_match;
> +	struct machine_th threads[MACHINE_TH_TABLE_SIZE];
>  	struct vdso_info  *vdso_info;
>  	struct perf_env   *env;
>  	struct dsos	  dsos;
> @@ -49,6 +56,12 @@ struct machine {
>  };
>  
>  static inline
> +struct machine_th *machine_thread(struct machine *machine, pid_t tid)

We separate the class name (machine) from the method name (thread) using
double underscores, i.e. the above becomes:

static inline threads *machine__threads(struct machine *machine, pid_t tid)

> +{
> +	return &machine->threads[tid % MACHINE_TH_TABLE_SIZE];
> +}
> +
> +static inline
>  struct map *__machine__kernel_map(struct machine *machine, enum map_type type)
>  {
>  	return machine->vmlinux_maps[type];
> diff --git a/tools/perf/util/rb_resort.h b/tools/perf/util/rb_resort.h
> index 808cc45..bbd78fa 100644
> --- a/tools/perf/util/rb_resort.h
> +++ b/tools/perf/util/rb_resort.h
> @@ -143,7 +143,8 @@ struct __name##_sorted *__name = __name##_sorted__new
>  				  __ilist->rblist.nr_entries)
>  
>  /* For 'struct machine->threads' */
> -#define DECLARE_RESORT_RB_MACHINE_THREADS(__name, __machine)			\
> -	DECLARE_RESORT_RB(__name)(&__machine->threads, __machine->nr_threads)
> +#define DECLARE_RESORT_RB_MACHINE_THREADS(__name, __machine, tid)		\
> +	DECLARE_RESORT_RB(__name)(&__machine->threads[tid].threads,		\
> +				  __machine->threads[tid].nr_threads)
>  
>  #endif /* _PERF_RESORT_RB_H_ */
> -- 
> 2.5.5

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 06/10] perf tools: lock to protect comm_str rb tree
  2017-09-07 17:55 ` [PATCH RFC 06/10] perf tools: lock to protect comm_str rb tree kan.liang
@ 2017-09-08 14:27   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 15+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-09-08 14:27 UTC (permalink / raw)
  To: kan.liang
  Cc: peterz, mingo, linux-kernel, jolsa, namhyung, adrian.hunter,
	lukasz.odzioba, ak

Em Thu, Sep 07, 2017 at 10:55:50AM -0700, kan.liang@intel.com escreveu:
> From: Kan Liang <kan.liang@intel.com>
> 
> Add comm_str_lock to protect comm_str rb tree.
> 
> Signed-off-by: Kan Liang <kan.liang@intel.com>
> ---
>  tools/perf/util/comm.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/perf/util/comm.c b/tools/perf/util/comm.c
> index 7bc981b..1bdfef1 100644
> --- a/tools/perf/util/comm.c
> +++ b/tools/perf/util/comm.c
> @@ -5,6 +5,7 @@
>  #include <stdio.h>
>  #include <string.h>
>  #include <linux/refcount.h>
> +#include <pthread.h>
>  
>  struct comm_str {
>  	char *str;
> @@ -14,6 +15,7 @@ struct comm_str {
>  
>  /* Should perhaps be moved to struct machine */
>  static struct rb_root comm_str_root;
> +static pthread_mutex_t comm_str_lock = PTHREAD_MUTEX_INITIALIZER;
>  
>  static struct comm_str *comm_str__get(struct comm_str *cs)
>  {
> @@ -24,11 +26,13 @@ static struct comm_str *comm_str__get(struct comm_str *cs)
>  
>  static void comm_str__put(struct comm_str *cs)
>  {
> +	pthread_mutex_lock(&comm_str_lock);
>  	if (cs && refcount_dec_and_test(&cs->refcnt)) {
>  		rb_erase(&cs->rb_node, &comm_str_root);
>  		zfree(&cs->str);
>  		free(cs);
>  	}
> +	pthread_mutex_unlock(&comm_str_lock);
>  }

The above should use a smaller locked section, i.e.:

 static void comm_str__put(struct comm_str *cs)
 {
   	if (cs && refcount_dec_and_test(&cs->refcnt)) {
+		pthread_mutex_lock(&comm_str_lock);
   		rb_erase(&cs->rb_node, &comm_str_root);
+		pthread_mutex_unlock(&comm_str_lock);
   		zfree(&cs->str);
   		free(cs);
   	}
 }
  
>  static struct comm_str *comm_str__alloc(const char *str)
> @@ -52,18 +56,22 @@ static struct comm_str *comm_str__alloc(const char *str)
>  
>  static struct comm_str *comm_str__findnew(const char *str, struct rb_root *root)

The usual way is to just rename the above to __comm_str__findnew(),
leaving it unlocked, and then add a locked wrapper:

static struct comm_str *comm_str__findnew(const char *str, struct rb_root *root)
{
	struct comm_str *cs;
	pthread_mutex_lock(&comm_str_lock);
	cs = __comm_str__findnew(str, root);
	pthread_mutex_unlock(&comm_str_lock);
	return cs;
}

>  {
> -	struct rb_node **p = &root->rb_node;
>  	struct rb_node *parent = NULL;
>  	struct comm_str *iter, *new;
> +	struct rb_node **p;
>  	int cmp;
>  
> +	pthread_mutex_lock(&comm_str_lock);
> +	p = &root->rb_node;
>  	while (*p != NULL) {
>  		parent = *p;
>  		iter = rb_entry(parent, struct comm_str, rb_node);
>  
>  		cmp = strcmp(str, iter->str);
> -		if (!cmp)
> -			return comm_str__get(iter);
> +		if (!cmp) {
> +			new = comm_str__get(iter);
> +			goto unlock;
> +		}
>  
>  		if (cmp < 0)
>  			p = &(*p)->rb_left;
> @@ -73,11 +81,13 @@ static struct comm_str *comm_str__findnew(const char *str, struct rb_root *root)
>  
>  	new = comm_str__alloc(str);
>  	if (!new)
> -		return NULL;
> +		goto unlock;
>  
>  	rb_link_node(&new->rb_node, parent, p);
>  	rb_insert_color(&new->rb_node, root);
>  
> +unlock:
> +	pthread_mutex_unlock(&comm_str_lock);
>  	return new;
>  }
>  
> -- 
> 2.5.5

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [tip:perf/core] perf tools: Use scandir() to replace readdir()
  2017-09-07 17:55 ` [PATCH RFC 02/10] perf tools: using scandir to replace readdir kan.liang
  2017-09-08 14:11   ` Arnaldo Carvalho de Melo
@ 2017-09-22 16:35   ` tip-bot for Kan Liang
  1 sibling, 0 replies; 15+ messages in thread
From: tip-bot for Kan Liang @ 2017-09-22 16:35 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: kan.liang, namhyung, tglx, ak, lukasz.odzioba, linux-kernel,
	peterz, hpa, acme, jolsa, adrian.hunter, mingo

Commit-ID:  ecdad24d7a4480c9af0ff6dbe00ac8bbae720d19
Gitweb:     http://git.kernel.org/tip/ecdad24d7a4480c9af0ff6dbe00ac8bbae720d19
Author:     Kan Liang <kan.liang@intel.com>
AuthorDate: Thu, 7 Sep 2017 10:55:46 -0700
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 13 Sep 2017 09:49:15 -0300

perf tools: Use scandir() to replace readdir()

In perf_event__synthesize_threads() perf goes through all proc files
serially by readdir.

scandir() does a snapshoot of /proc, which is multithreading friendly.

It's possible that some threads which are added during event synthesize.
But the number of lost threads should be small.  They should not impact
the final analysis.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1504806954-150842-3-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/event.c | 45 +++++++++++++++++++++++++--------------------
 1 file changed, 25 insertions(+), 20 deletions(-)

diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 1c905ba..17c21ea 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -683,12 +683,14 @@ int perf_event__synthesize_threads(struct perf_tool *tool,
 				   bool mmap_data,
 				   unsigned int proc_map_timeout)
 {
-	DIR *proc;
-	char proc_path[PATH_MAX];
-	struct dirent *dirent;
 	union perf_event *comm_event, *mmap_event, *fork_event;
 	union perf_event *namespaces_event;
+	char proc_path[PATH_MAX];
+	struct dirent **dirent;
 	int err = -1;
+	char *end;
+	pid_t pid;
+	int n, i;
 
 	if (machine__is_default_guest(machine))
 		return 0;
@@ -712,29 +714,32 @@ int perf_event__synthesize_threads(struct perf_tool *tool,
 		goto out_free_fork;
 
 	snprintf(proc_path, sizeof(proc_path), "%s/proc", machine->root_dir);
-	proc = opendir(proc_path);
+	n = scandir(proc_path, &dirent, 0, alphasort);
 
-	if (proc == NULL)
+	if (n < 0)
 		goto out_free_namespaces;
 
-	while ((dirent = readdir(proc)) != NULL) {
-		char *end;
-		pid_t pid = strtol(dirent->d_name, &end, 10);
-
-		if (*end) /* only interested in proper numerical dirents */
+	for (i = 0; i < n; i++) {
+		if (!isdigit(dirent[i]->d_name[0]))
 			continue;
-		/*
- 		 * We may race with exiting thread, so don't stop just because
- 		 * one thread couldn't be synthesized.
- 		 */
-		__event__synthesize_thread(comm_event, mmap_event, fork_event,
-					   namespaces_event, pid, 1, process,
-					   tool, machine, mmap_data,
-					   proc_map_timeout);
-	}
 
+		pid = (pid_t)strtol(dirent[i]->d_name, &end, 10);
+		/* only interested in proper numerical dirents */
+		if (!*end) {
+			/*
+			 * We may race with exiting thread, so don't stop just because
+			 * one thread couldn't be synthesized.
+			 */
+			__event__synthesize_thread(comm_event, mmap_event, fork_event,
+						   namespaces_event, pid, 1, process,
+						   tool, machine, mmap_data,
+						   proc_map_timeout);
+		}
+		free(dirent[i]);
+	}
+	free(dirent);
 	err = 0;
-	closedir(proc);
+
 out_free_namespaces:
 	free(namespaces_event);
 out_free_fork:

^ permalink raw reply related	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-09-22 16:38 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-07 17:55 [PATCH RFC 00/10] perf top optimization kan.liang
2017-09-07 17:55 ` [PATCH RFC 01/10] perf tools: hashtable for machine threads kan.liang
2017-09-08 14:18   ` Arnaldo Carvalho de Melo
2017-09-07 17:55 ` [PATCH RFC 02/10] perf tools: using scandir to replace readdir kan.liang
2017-09-08 14:11   ` Arnaldo Carvalho de Melo
2017-09-22 16:35   ` [tip:perf/core] perf tools: Use scandir() to replace readdir() tip-bot for Kan Liang
2017-09-07 17:55 ` [PATCH RFC 03/10] petf tools: using comm_str to replace comm in hist_entry kan.liang
2017-09-07 17:55 ` [PATCH RFC 04/10] petf tools: introduce a new function to set namespaces id kan.liang
2017-09-07 17:55 ` [PATCH RFC 05/10] perf tools: lock to protect thread list kan.liang
2017-09-07 17:55 ` [PATCH RFC 06/10] perf tools: lock to protect comm_str rb tree kan.liang
2017-09-08 14:27   ` Arnaldo Carvalho de Melo
2017-09-07 17:55 ` [PATCH RFC 07/10] perf tools: change machine comm_exec type to atomic kan.liang
2017-09-07 17:55 ` [PATCH RFC 08/10] perf top: implement multithreading for perf_event__synthesize_threads kan.liang
2017-09-07 17:55 ` [PATCH RFC 09/10] perf top: add option to set the number of thread for event synthesize kan.liang
2017-09-07 17:55 ` [PATCH RFC 10/10] perf top: switch back to overwrite mode kan.liang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).