linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: kan.liang@intel.com
To: acme@kernel.org, peterz@infradead.org, mingo@redhat.com,
	linux-kernel@vger.kernel.org
Cc: jolsa@kernel.org, namhyung@kernel.org, adrian.hunter@intel.com,
	lukasz.odzioba@intel.com, ak@linux.intel.com,
	Kan Liang <kan.liang@intel.com>
Subject: [PATCH RFC 08/10] perf top: implement multithreading for perf_event__synthesize_threads
Date: Thu,  7 Sep 2017 10:55:52 -0700	[thread overview]
Message-ID: <1504806954-150842-9-git-send-email-kan.liang@intel.com> (raw)
In-Reply-To: <1504806954-150842-1-git-send-email-kan.liang@intel.com>

From: Kan Liang <kan.liang@intel.com>

The proc files which is sorted with alphabetical order are evenly
assigned to several synthesize threads to be processed in parallel.

For perf top, the threads number hard code to online CPU number. The
following patch will introduce an option to set it.
For other perf tools, the thread number is 1. Because the process
function is not ready for multithreading, e.g.
process_synthesized_event.
This patch series only support event synthesize multithreading for perf
top. For other tools, it can be done separately later.

With multithread applied, the total processing time can get up to 1.56x
speedup on Knights Mill for perf top.

For specific single event processing, the processing time could increase
because of the lock contention. So proc_map_timeout may need to be
increased. Otherwise some proc maps will be truncated.
Based on my test, increasing the proc_map_timeout has small impact
on the total processing time. The total processing time still get 1.49x
speedup on Knights Mill after increasing the proc_map_timeout.
The patch itself doesn't increase the proc_map_timeout.

Doesn't need to implement multithreading for
perf_event__synthesize_thread_map. It doesn't have performance issue.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 tools/perf/builtin-kvm.c              |   3 +-
 tools/perf/builtin-record.c           |   2 +-
 tools/perf/builtin-top.c              |   4 +-
 tools/perf/builtin-trace.c            |   2 +-
 tools/perf/tests/mmap-thread-lookup.c |   2 +-
 tools/perf/util/event.c               | 144 ++++++++++++++++++++++++++--------
 tools/perf/util/event.h               |  14 +++-
 tools/perf/util/machine.c             |   8 +-
 tools/perf/util/machine.h             |   9 ++-
 9 files changed, 145 insertions(+), 43 deletions(-)

diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index f309c37..edd14cb 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -1442,7 +1442,8 @@ static int kvm_events_live(struct perf_kvm_stat *kvm,
 	perf_session__set_id_hdr_size(kvm->session);
 	ordered_events__set_copy_on_queue(&kvm->session->ordered_events, true);
 	machine__synthesize_threads(&kvm->session->machines.host, &kvm->opts.target,
-				    kvm->evlist->threads, false, kvm->opts.proc_map_timeout);
+				    kvm->evlist->threads, false,
+				    kvm->opts.proc_map_timeout, 1);
 	err = kvm_live_open_events(kvm);
 	if (err)
 		goto out;
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 56f8142..f8a6c89 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -863,7 +863,7 @@ static int record__synthesize(struct record *rec, bool tail)
 
 	err = __machine__synthesize_threads(machine, tool, &opts->target, rec->evlist->threads,
 					    process_synthesized_event, opts->sample_address,
-					    opts->proc_map_timeout);
+					    opts->proc_map_timeout, 1);
 out:
 	return err;
 }
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index ee954bd..4b8fdc1 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -959,7 +959,9 @@ static int __cmd_top(struct perf_top *top)
 		goto out_delete;
 
 	machine__synthesize_threads(&top->session->machines.host, &opts->target,
-				    top->evlist->threads, false, opts->proc_map_timeout);
+				    top->evlist->threads, false,
+				    opts->proc_map_timeout,
+				    (unsigned int)sysconf(_SC_NPROCESSORS_ONLN));
 
 	if (perf_hpp_list.socket) {
 		ret = perf_env__read_cpu_topology_map(&perf_env);
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index af4e4e4..1186783 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -1131,7 +1131,7 @@ static int trace__symbols_init(struct trace *trace, struct perf_evlist *evlist)
 
 	err = __machine__synthesize_threads(trace->host, &trace->tool, &trace->opts.target,
 					    evlist->threads, trace__tool_process, false,
-					    trace->opts.proc_map_timeout);
+					    trace->opts.proc_map_timeout, 1);
 	if (err)
 		symbol__exit();
 
diff --git a/tools/perf/tests/mmap-thread-lookup.c b/tools/perf/tests/mmap-thread-lookup.c
index f94a419..2a0068a 100644
--- a/tools/perf/tests/mmap-thread-lookup.c
+++ b/tools/perf/tests/mmap-thread-lookup.c
@@ -131,7 +131,7 @@ static int synth_all(struct machine *machine)
 {
 	return perf_event__synthesize_threads(NULL,
 					      perf_event__process,
-					      machine, 0, 500);
+					      machine, 0, 500, 1);
 }
 
 static int synth_process(struct machine *machine)
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 17c21ea..4f565ff 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -677,23 +677,21 @@ int perf_event__synthesize_thread_map(struct perf_tool *tool,
 	return err;
 }
 
-int perf_event__synthesize_threads(struct perf_tool *tool,
-				   perf_event__handler_t process,
-				   struct machine *machine,
-				   bool mmap_data,
-				   unsigned int proc_map_timeout)
+static int __perf_event__synthesize_threads(struct perf_tool *tool,
+					    perf_event__handler_t process,
+					    struct machine *machine,
+					    bool mmap_data,
+					    unsigned int proc_map_timeout,
+					    struct dirent **dirent,
+					    int start,
+					    int num)
 {
 	union perf_event *comm_event, *mmap_event, *fork_event;
 	union perf_event *namespaces_event;
-	char proc_path[PATH_MAX];
-	struct dirent **dirent;
 	int err = -1;
 	char *end;
 	pid_t pid;
-	int n, i;
-
-	if (machine__is_default_guest(machine))
-		return 0;
+	int i;
 
 	comm_event = malloc(sizeof(comm_event->comm) + machine->id_hdr_size);
 	if (comm_event == NULL)
@@ -713,34 +711,24 @@ int perf_event__synthesize_threads(struct perf_tool *tool,
 	if (namespaces_event == NULL)
 		goto out_free_fork;
 
-	snprintf(proc_path, sizeof(proc_path), "%s/proc", machine->root_dir);
-	n = scandir(proc_path, &dirent, 0, alphasort);
-
-	if (n < 0)
-		goto out_free_namespaces;
-
-	for (i = 0; i < n; i++) {
+	for (i = start; i < start + num; i++) {
 		if (!isdigit(dirent[i]->d_name[0]))
 			continue;
 
 		pid = (pid_t)strtol(dirent[i]->d_name, &end, 10);
 		/* only interested in proper numerical dirents */
-		if (!*end) {
-			/*
-			 * We may race with exiting thread, so don't stop just because
-			 * one thread couldn't be synthesized.
-			 */
-			__event__synthesize_thread(comm_event, mmap_event, fork_event,
-						   namespaces_event, pid, 1, process,
-						   tool, machine, mmap_data,
-						   proc_map_timeout);
-		}
-		free(dirent[i]);
+		if (*end)
+			continue;
+		/*
+		 * We may race with exiting thread, so don't stop just because
+		 * one thread couldn't be synthesized.
+		 */
+		__event__synthesize_thread(comm_event, mmap_event, fork_event,
+					   namespaces_event, pid, 1, process,
+					   tool, machine, mmap_data,
+					   proc_map_timeout);
 	}
-	free(dirent);
 	err = 0;
-
-out_free_namespaces:
 	free(namespaces_event);
 out_free_fork:
 	free(fork_event);
@@ -752,6 +740,98 @@ int perf_event__synthesize_threads(struct perf_tool *tool,
 	return err;
 }
 
+static void *synthesize_threads_worker(void *arg)
+{
+	struct synthesize_threads_arg *args = arg;
+
+	__perf_event__synthesize_threads(args->tool, args->process,
+					 args->machine, args->mmap_data,
+					 args->proc_map_timeout, args->dirent,
+					 args->start, args->num);
+	return NULL;
+}
+
+int perf_event__synthesize_threads(struct perf_tool *tool,
+				   perf_event__handler_t process,
+				   struct machine *machine,
+				   bool mmap_data,
+				   unsigned int proc_map_timeout,
+				   unsigned int nr_threads_synthesize)
+{
+	struct synthesize_threads_arg *args = NULL;
+	pthread_t *synthesize_threads = NULL;
+	char proc_path[PATH_MAX];
+	struct dirent **dirent;
+	int num_per_thread;
+	int m, n, i, j;
+	int thread_nr;
+	int base = 0;
+	int err = -1;
+
+
+	if (machine__is_default_guest(machine))
+		return 0;
+
+	snprintf(proc_path, sizeof(proc_path), "%s/proc", machine->root_dir);
+	n = scandir(proc_path, &dirent, 0, alphasort);
+	if (n < 0)
+		return err;
+
+	thread_nr = nr_threads_synthesize;
+	if (thread_nr <= 0)
+		thread_nr = 1;
+	if (thread_nr > n)
+		thread_nr = n;
+
+	synthesize_threads = calloc(sizeof(pthread_t), thread_nr);
+	if (synthesize_threads == NULL)
+		goto free_dirent;
+
+	args = calloc(sizeof(*args), thread_nr);
+	if (args == NULL)
+		goto free_threads;
+
+	num_per_thread = n / thread_nr;
+	m = n % thread_nr;
+	for (i = 0; i < thread_nr; i++) {
+		args[i].tool = tool;
+		args[i].process = process;
+		args[i].machine = machine;
+		args[i].mmap_data = mmap_data;
+		args[i].proc_map_timeout = proc_map_timeout;
+		args[i].dirent = dirent;
+	}
+	for (i = 0; i < m; i++) {
+		args[i].num = num_per_thread + 1;
+		args[i].start = i * args[i].num;
+	}
+	if (i != 0)
+		base = args[i-1].start + args[i-1].num;
+	for (j = i; j < thread_nr; j++) {
+		args[j].num = num_per_thread;
+		args[j].start = base + (j - i) * args[i].num;
+	}
+
+	for (i = 0; i < thread_nr; i++) {
+		if (pthread_create(&synthesize_threads[i], NULL,
+				   synthesize_threads_worker, &args[i]))
+			goto out_join;
+	}
+	err = 0;
+out_join:
+	for (i = 0; i < thread_nr; i++)
+		pthread_join(synthesize_threads[i], NULL);
+	free(args);
+free_threads:
+	free(synthesize_threads);
+free_dirent:
+	for (i = 0; i < n; i++)
+		free(dirent[i]);
+	free(dirent);
+
+	return err;
+}
+
 struct process_symbol_args {
 	const char *name;
 	u64	   start;
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index ee7bcc8..7b987c8 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -664,6 +664,17 @@ typedef int (*perf_event__handler_t)(struct perf_tool *tool,
 				     struct perf_sample *sample,
 				     struct machine *machine);
 
+struct synthesize_threads_arg {
+	struct perf_tool *tool;
+	perf_event__handler_t process;
+	struct machine *machine;
+	bool mmap_data;
+	unsigned int proc_map_timeout;
+	struct dirent **dirent;
+	int num;
+	int start;
+};
+
 int perf_event__synthesize_thread_map(struct perf_tool *tool,
 				      struct thread_map *threads,
 				      perf_event__handler_t process,
@@ -680,7 +691,8 @@ int perf_event__synthesize_cpu_map(struct perf_tool *tool,
 int perf_event__synthesize_threads(struct perf_tool *tool,
 				   perf_event__handler_t process,
 				   struct machine *machine, bool mmap_data,
-				   unsigned int proc_map_timeout);
+				   unsigned int proc_map_timeout,
+				   unsigned int nr_threads_synthesize);
 int perf_event__synthesize_kernel_mmap(struct perf_tool *tool,
 				       perf_event__handler_t process,
 				       struct machine *machine);
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index f568c22..33728d0 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2221,12 +2221,16 @@ int machines__for_each_thread(struct machines *machines,
 int __machine__synthesize_threads(struct machine *machine, struct perf_tool *tool,
 				  struct target *target, struct thread_map *threads,
 				  perf_event__handler_t process, bool data_mmap,
-				  unsigned int proc_map_timeout)
+				  unsigned int proc_map_timeout,
+				  unsigned int nr_threads_synthesize)
 {
 	if (target__has_task(target))
 		return perf_event__synthesize_thread_map(tool, threads, process, machine, data_mmap, proc_map_timeout);
 	else if (target__has_cpu(target))
-		return perf_event__synthesize_threads(tool, process, machine, data_mmap, proc_map_timeout);
+		return perf_event__synthesize_threads(tool, process,
+						      machine, data_mmap,
+						      proc_map_timeout,
+						      nr_threads_synthesize);
 	/* command specified */
 	return 0;
 }
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index ec9a47a..db264b8 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -256,15 +256,18 @@ int machines__for_each_thread(struct machines *machines,
 int __machine__synthesize_threads(struct machine *machine, struct perf_tool *tool,
 				  struct target *target, struct thread_map *threads,
 				  perf_event__handler_t process, bool data_mmap,
-				  unsigned int proc_map_timeout);
+				  unsigned int proc_map_timeout,
+				  unsigned int nr_threads_synthesize);
 static inline
 int machine__synthesize_threads(struct machine *machine, struct target *target,
 				struct thread_map *threads, bool data_mmap,
-				unsigned int proc_map_timeout)
+				unsigned int proc_map_timeout,
+				unsigned int nr_threads_synthesize)
 {
 	return __machine__synthesize_threads(machine, NULL, target, threads,
 					     perf_event__process, data_mmap,
-					     proc_map_timeout);
+					     proc_map_timeout,
+					     nr_threads_synthesize);
 }
 
 pid_t machine__get_current_tid(struct machine *machine, int cpu);
-- 
2.5.5

  parent reply	other threads:[~2017-09-07 17:57 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-07 17:55 [PATCH RFC 00/10] perf top optimization kan.liang
2017-09-07 17:55 ` [PATCH RFC 01/10] perf tools: hashtable for machine threads kan.liang
2017-09-08 14:18   ` Arnaldo Carvalho de Melo
2017-09-07 17:55 ` [PATCH RFC 02/10] perf tools: using scandir to replace readdir kan.liang
2017-09-08 14:11   ` Arnaldo Carvalho de Melo
2017-09-22 16:35   ` [tip:perf/core] perf tools: Use scandir() to replace readdir() tip-bot for Kan Liang
2017-09-07 17:55 ` [PATCH RFC 03/10] petf tools: using comm_str to replace comm in hist_entry kan.liang
2017-09-07 17:55 ` [PATCH RFC 04/10] petf tools: introduce a new function to set namespaces id kan.liang
2017-09-07 17:55 ` [PATCH RFC 05/10] perf tools: lock to protect thread list kan.liang
2017-09-07 17:55 ` [PATCH RFC 06/10] perf tools: lock to protect comm_str rb tree kan.liang
2017-09-08 14:27   ` Arnaldo Carvalho de Melo
2017-09-07 17:55 ` [PATCH RFC 07/10] perf tools: change machine comm_exec type to atomic kan.liang
2017-09-07 17:55 ` kan.liang [this message]
2017-09-07 17:55 ` [PATCH RFC 09/10] perf top: add option to set the number of thread for event synthesize kan.liang
2017-09-07 17:55 ` [PATCH RFC 10/10] perf top: switch back to overwrite mode kan.liang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1504806954-150842-9-git-send-email-kan.liang@intel.com \
    --to=kan.liang@intel.com \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=ak@linux.intel.com \
    --cc=jolsa@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lukasz.odzioba@intel.com \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).