From: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com>
To: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>, David Ahern <dsahern@gmail.com>,
	Jiri Olsa <jolsa@kernel.org>,
	Stephane Eranian <eranian@google.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [BUG] segfault in perf-top -- thread refcnt
Date: Mon, 30 Mar 2015 11:58:05 -0300
Message-ID: <20150330145805.GC32560@kernel.org>
In-Reply-To: <20150330124852.GA4507@danjae.kornet>

On Mon, Mar 30, 2015 at 09:48:52PM +0900, Namhyung Kim wrote:
> Hi Jiri,
> 
> On Mon, Mar 30, 2015 at 01:49:07PM +0200, Jiri Olsa wrote:
> > On Mon, Mar 30, 2015 at 01:21:08PM +0200, Jiri Olsa wrote:
> > > On Mon, Mar 30, 2015 at 12:22:20PM +0200, Jiri Olsa wrote:
> > > > On Mon, Mar 30, 2015 at 10:07:37AM +0200, Jiri Olsa wrote:
> > > > 
> > > > SNIP
> > > > 
> > > > > > 
> > > > > > 2 things:
> > > > > > 1. Let it run for a long time. Go about using the server, do lots of builds,
> > > > > > etc. It takes time.
> > > > > > 
> > > > > > 2. use a box with a LOT of cpus (1024 in my case)
> > > > > > 
> > > > > > Make sure ulimit is set to get the core.
> > > > > 
> > > > > reproduced on a 24-CPU box with a kernel build (make -j25)
> > > > > running in the background; will try to look closer
> > > > > 
> > > > > perf: Segmentation fault
> > > > > -------- backtrace --------
> > > > > ./perf[0x4fd79b]
> > > > > /lib64/libc.so.6(+0x358f0)[0x7f9cbff528f0]
> > > > > ./perf(thread__put+0x5b)[0x4b1a7b]
> > > > > ./perf(hists__delete_entries+0x70)[0x4c8670]
> > > > > ./perf[0x436a88]
> > > > > ./perf[0x4fa73d]
> > > > > ./perf(perf_evlist__tui_browse_hists+0x97)[0x4fc437]
> > > > > ./perf[0x4381d0]
> > > > > /lib64/libpthread.so.0(+0x7ee5)[0x7f9cc1ff2ee5]
> > > > > /lib64/libc.so.6(clone+0x6d)[0x7f9cc0011b8d]
> > > > > [0x0]
> > > > 
> > > > looks like a race between __machine__findnew_thread and thread__put
> > > > over the machine->threads rb_tree insert/removal
> > > > 
> > > > is there a reason why thread__put does not erase itself from machine->threads?
> > 
> > that was the reason: we do this separately, not in thread__put.
> > Is there a reason for this? ;-)
> > 
> > testing attached patch..
> > 
> > jirka
> > 
> > 
> > ---
> > diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
> > index f7fb258..966564a 100644
> > --- a/tools/perf/util/build-id.c
> > +++ b/tools/perf/util/build-id.c
> > @@ -60,7 +60,6 @@ static int perf_event__exit_del_thread(struct perf_tool *tool __maybe_unused,
> >  		    event->fork.ppid, event->fork.ptid);
> >  
> >  	if (thread) {
> > -		rb_erase(&thread->rb_node, &machine->threads);
> >  		if (machine->last_match == thread)
> >  			thread__zput(machine->last_match);
> >  		thread__put(thread);
> > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > index e335330..a8443ef 100644
> > --- a/tools/perf/util/machine.c
> > +++ b/tools/perf/util/machine.c
> > @@ -30,6 +30,7 @@ int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
> >  	dsos__init(&machine->kernel_dsos);
> >  
> >  	machine->threads = RB_ROOT;
> > +	pthread_mutex_init(&machine->threads_lock, NULL);
> >  	INIT_LIST_HEAD(&machine->dead_threads);
> >  	machine->last_match = NULL;
> >  
> > @@ -380,10 +381,13 @@ static struct thread *__machine__findnew_thread(struct machine *machine,
> >  	if (!create)
> >  		return NULL;
> >  
> > -	th = thread__new(pid, tid);
> > +	th = thread__new(machine, pid, tid);
> >  	if (th != NULL) {
> > +
> > +		pthread_mutex_lock(&machine->threads_lock);
> >  		rb_link_node(&th->rb_node, parent, p);
> >  		rb_insert_color(&th->rb_node, &machine->threads);
> > +		pthread_mutex_unlock(&machine->threads_lock);
> 
> I think you also need to protect the rb tree traversal above.
> 
> But this makes every sample processing grab and release the lock, so it
> might cause high overhead.  It can be a problem if such processing is
> done in parallel, like my multi-thread work. :-/
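Namhyung's point about the traversal can be illustrated with a minimal sketch. It assumes a plain unbalanced binary search tree keyed by tid as a stand-in for the perf rbtree, and the names `struct tree`, `struct node`, `tree_init()` and `tree_findnew()` are hypothetical, not perf APIs. The walk and the link-in share one critical section, so find-or-create is atomic:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical stand-in for struct thread: just the key and child links. */
struct node {
	pid_t tid;
	struct node *left, *right;
};

/* Hypothetical stand-in for machine->threads plus its lock. */
struct tree {
	struct node *root;
	pthread_mutex_t lock;
};

void tree_init(struct tree *t)
{
	t->root = NULL;
	pthread_mutex_init(&t->lock, NULL);
}

/*
 * Find-or-create.  Both the walk and the insertion run under the lock:
 * if only the insertion were locked, two callers could both miss the
 * same tid during an unlocked walk and insert duplicates, or a walker
 * could follow links that a concurrent erase is rewriting.
 */
struct node *tree_findnew(struct tree *t, pid_t tid)
{
	struct node **p, *n;

	assert(t != NULL);
	pthread_mutex_lock(&t->lock);
	p = &t->root;
	while (*p != NULL && (*p)->tid != tid)
		p = tid < (*p)->tid ? &(*p)->left : &(*p)->right;
	if (*p == NULL && (n = calloc(1, sizeof(*n))) != NULL) {
		n->tid = tid;
		*p = n;		/* link at the empty slot the walk stopped on */
	}
	n = *p;			/* existing node, new node, or NULL on ENOMEM */
	pthread_mutex_unlock(&t->lock);
	return n;
}
```

Every call takes the lock, hit or miss, which is the per-sample cost Namhyung is worried about.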

Still untested, this one uses an rw lock. Next steps are auditing the
machine__findnew_thread users that really should be using
machine__find_thread, i.e. grabbing just the reader lock, and measuring
the overhead of a pthread rwlock versus the pthread_mutex_t that
Jiri is using.
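One way to act on the reader-lock idea is sketched below, again on a hypothetical unbalanced tree rather than perf's rbtree (`struct tree`, `tree_find()`, `tree_findnew()` and `tree_walk()` are illustrative names). Unlike the patch, which holds the write lock for all of machine__findnew_thread(), this variant (an assumption, not what the patch does) tries a shared-lock lookup first and only takes the write lock on a miss. Because POSIX rwlocks cannot be upgraded in place, it has to walk the tree again once it holds the write lock:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>

struct node {			/* hypothetical stand-in for struct thread */
	pid_t tid;
	struct node *left, *right;
};

struct tree {			/* hypothetical stand-in for machine->threads */
	struct node *root;
	pthread_rwlock_t lock;
};

void tree_init(struct tree *t)
{
	t->root = NULL;
	pthread_rwlock_init(&t->lock, NULL);
}

/* Return the slot holding tid, or the empty slot where it would go.
 * Caller must hold t->lock, for reading or for writing. */
static struct node **tree_walk(struct tree *t, pid_t tid)
{
	struct node **p = &t->root;

	while (*p != NULL && (*p)->tid != tid)
		p = tid < (*p)->tid ? &(*p)->left : &(*p)->right;
	return p;
}

/* Lookup only: any number of readers may hold the lock at once. */
struct node *tree_find(struct tree *t, pid_t tid)
{
	struct node *n;

	pthread_rwlock_rdlock(&t->lock);
	n = *tree_walk(t, tid);
	pthread_rwlock_unlock(&t->lock);
	return n;
}

struct node *tree_findnew(struct tree *t, pid_t tid)
{
	struct node **p, *n = tree_find(t, tid);	/* fast path, shared */

	if (n != NULL)
		return n;

	/* Slow path: retake as writer and walk again, since another
	 * writer may have inserted tid after we dropped the read lock. */
	pthread_rwlock_wrlock(&t->lock);
	p = tree_walk(t, tid);
	if (*p == NULL && (n = calloc(1, sizeof(*n))) != NULL) {
		n->tid = tid;
		*p = n;
	}
	n = *p;
	pthread_rwlock_unlock(&t->lock);
	assert(n == NULL || n->tid == tid);
	return n;
}
```

The returned pointer outlives the lock, so a real implementation still needs to take a reference (thread__get() in perf) before unlocking once other threads can erase and free nodes; this sketch never frees, so it sidesteps that.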

- Arnaldo

diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
index f7fb2587df69..e3c80bab47a3 100644
--- a/tools/perf/util/build-id.c
+++ b/tools/perf/util/build-id.c
@@ -60,7 +60,9 @@ static int perf_event__exit_del_thread(struct perf_tool *tool __maybe_unused,
 		    event->fork.ppid, event->fork.ptid);
 
 	if (thread) {
+		pthread_rwlock_wrlock(&machine->threads_lock);
 		rb_erase(&thread->rb_node, &machine->threads);
+		pthread_rwlock_unlock(&machine->threads_lock);
 		if (machine->last_match == thread)
 			thread__zput(machine->last_match);
 		thread__put(thread);
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index e45c8f33a8fd..b901ed27a793 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -30,6 +30,7 @@ int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
 	dsos__init(&machine->kernel_dsos);
 
 	machine->threads = RB_ROOT;
+	pthread_rwlock_init(&machine->threads_lock, NULL);
 	INIT_LIST_HEAD(&machine->dead_threads);
 	machine->last_match = NULL;
 
@@ -111,6 +112,7 @@ void machine__exit(struct machine *machine)
 	vdso__exit(machine);
 	zfree(&machine->root_dir);
 	zfree(&machine->current_tid);
+	pthread_rwlock_destroy(&machine->threads_lock);
 }
 
 void machine__delete(struct machine *machine)
@@ -411,13 +413,22 @@ static struct thread *__machine__findnew_thread(struct machine *machine,
 struct thread *machine__findnew_thread(struct machine *machine, pid_t pid,
 				       pid_t tid)
 {
-	return __machine__findnew_thread(machine, pid, tid, true);
+	struct thread *th;
+
+	pthread_rwlock_wrlock(&machine->threads_lock);
+	th = __machine__findnew_thread(machine, pid, tid, true);
+	pthread_rwlock_unlock(&machine->threads_lock);
+	return th;
 }
 
 struct thread *machine__find_thread(struct machine *machine, pid_t pid,
 				    pid_t tid)
 {
-	return __machine__findnew_thread(machine, pid, tid, false);
+	struct thread *th;
+	pthread_rwlock_rdlock(&machine->threads_lock);
+	th = __machine__findnew_thread(machine, pid, tid, false);
+	pthread_rwlock_unlock(&machine->threads_lock);
+	return th;
 }
 
 struct comm *machine__thread_exec_comm(struct machine *machine,
@@ -1258,7 +1269,9 @@ static void machine__remove_thread(struct machine *machine, struct thread *th)
 	if (machine->last_match == th)
 		thread__zput(machine->last_match);
 
+	pthread_rwlock_wrlock(&machine->threads_lock);
 	rb_erase(&th->rb_node, &machine->threads);
+	pthread_rwlock_unlock(&machine->threads_lock);
 	/*
 	 * Move it first to the dead_threads list, then drop the reference,
 	 * if this is the last reference, then the thread__delete destructor
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index e2faf3b47e7b..c2b9402921fc 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -30,6 +30,7 @@ struct machine {
 	bool		  comm_exec;
 	char		  *root_dir;
 	struct rb_root	  threads;
+	pthread_rwlock_t  threads_lock;
 	struct list_head  dead_threads;
 	struct thread	  *last_match;
 	struct vdso_info  *vdso_info;
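The thread__put()/thread__zput() calls in the patch follow a get/put reference-counting pattern. Below is a minimal sketch of that pattern, assuming C11 atomics (perf code of this era may use a plain int) and using hypothetical names `struct obj`, `obj__new()`, `obj__get()`, `obj__put()` and `obj__zput()`; the `obj_deleted` counter exists only to make the destructor observable:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted object, modelled loosely on struct thread. */
struct obj {
	atomic_int refcnt;
};

static int obj_deleted;		/* counts destructor runs, for the demo */

struct obj *obj__new(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (o != NULL)
		atomic_init(&o->refcnt, 1);	/* creator holds one reference */
	return o;
}

struct obj *obj__get(struct obj *o)
{
	if (o != NULL)
		atomic_fetch_add(&o->refcnt, 1);
	return o;
}

void obj__put(struct obj *o)
{
	if (o == NULL)
		return;
	int old = atomic_fetch_sub(&o->refcnt, 1);
	assert(old > 0);		/* catches a put without a matching get */
	if (old == 1) {			/* we dropped the last reference */
		obj_deleted++;
		free(o);
	}
}

/* Drop the reference held by ptr and clear it, in the spirit of thread__zput(). */
#define obj__zput(ptr) do { obj__put(ptr); (ptr) = NULL; } while (0)
```

A cache such as machine->last_match holds its own reference, which appears to be why the patch drops it with thread__zput() when the thread is removed: clearing the pointer keeps the cache from handing out a thread that may already have been freed.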

Thread overview: 24+ messages
2015-03-27 17:31 [BUG] segfault in perf-top -- thread refcnt David Ahern
2015-03-27 19:51 ` Arnaldo Carvalho de Melo
2015-03-27 20:11 ` Arnaldo Carvalho de Melo
2015-03-27 20:13   ` David Ahern
2015-03-30  8:07     ` Jiri Olsa
2015-03-30 10:22       ` Jiri Olsa
2015-03-30 11:21         ` Jiri Olsa
2015-03-30 11:49           ` Jiri Olsa
2015-03-30 12:48             ` Namhyung Kim
2015-03-30 12:56               ` Jiri Olsa
2015-03-30 13:06                 ` Namhyung Kim
2015-03-30 14:02                   ` Arnaldo Carvalho de Melo
2015-03-31  0:15                     ` Namhyung Kim
2015-03-30 13:07                 ` Arnaldo Carvalho de Melo
2015-03-30 13:20                   ` Jiri Olsa
2015-03-30 13:59                     ` Arnaldo Carvalho de Melo
2015-03-30 14:58               ` Arnaldo Carvalho de Melo [this message]
2015-03-30 15:13                 ` Arnaldo Carvalho de Melo
2015-03-31  0:27                   ` Namhyung Kim
2015-03-31  0:46                     ` Arnaldo Carvalho de Melo
2015-03-31  7:21                       ` Namhyung Kim
2015-03-30 13:22             ` Arnaldo Carvalho de Melo
2015-03-30 13:09         ` Arnaldo Carvalho de Melo
2015-03-30 13:17         ` Arnaldo Carvalho de Melo