linux-kernel.vger.kernel.org archive mirror
* [BUG] segfault in perf-top -- thread refcnt
@ 2015-03-27 17:31 David Ahern
  2015-03-27 19:51 ` Arnaldo Carvalho de Melo
  2015-03-27 20:11 ` Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 24+ messages in thread
From: David Ahern @ 2015-03-27 17:31 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Stephane Eranian; +Cc: LKML

Hi Arnaldo:

If I leave 'perf top -z' running it eventually crashes with the 
backtrace in the new thread refcnt code:

(gdb) bt
#0  __list_del_entry (thread=0x670bd30)
    at /home/dahern/kernels/linux.git/tools/perf/util/include/../../../../include/linux/list.h:102
#1  list_del_init (thread=0x670bd30)
    at /home/dahern/kernels/linux.git/tools/perf/util/include/../../../../include/linux/list.h:145
#2  thread__put (thread=0x670bd30) at util/thread.c:94
#3  0x00000000001bfa3c in __thread__zput (he=0x820adc0) at util/thread.h:48
#4  hist_entry__delete (he=0x820adc0) at util/hist.c:939
#5  0x00000000001c18a4 in hists__delete_entry (hists=0x4fe0f8) at util/hist.c:255
#6  hists__delete_entries (hists=0x4fe0f8) at util/hist.c:283
#7  0x00000000001250e8 in perf_top__sort_new_samples (arg=0x7feffffbca8) at builtin-top.c:549
#8  0x00000000001e9108 in hist_browser__run (evsel=0x4fdfc0, nr_events=<value optimized out>,
    helpline=<value optimized out>, left_exits=<value optimized out>, hbt=0xfff800012cd7cce0, min_pcnt=0, env=0x4fe4c0)
    at ui/browsers/hists.c:431
#9  perf_evsel__hists_browse (evsel=0x4fdfc0, nr_events=<value optimized out>, helpline=<value optimized out>,
    left_exits=<value optimized out>, hbt=0xfff800012cd7cce0, min_pcnt=0, env=0x4fe4c0) at ui/browsers/hists.c:1501
#10 0x00000000001eb390 in perf_evlist__tui_browse_hists (evlist=0x4c9ae0,
    help=0x28dc00 "For a higher level overview, try: perf top --sort comm,dso", hbt=0xfff800012cd7cce0,
    min_pcnt=<value optimized out>, env=0x4fe4c0) at ui/browsers/hists.c:2022
#11 0x0000000000126188 in display_thread_tui (arg=<value optimized out>) at builtin-top.c:582
#12 0xfff8000100130a68 in start_thread () from /lib64/libpthread.so.0
#13 0xfff800010139c598 in __thread_start () from /lib64/libc.so.6
#14 0xfff800010139c598 in __thread_start () from /lib64/libc.so.6


(gdb) fra 2
#2  thread__put (thread=0x670bd30) at util/thread.c:94
94	util/thread.c: No such file or directory.
	in util/thread.c
(gdb) p *thread
$2 = {{rb_node = {__rb_parent_color = 106933920, rb_right = 0x0, rb_left = 0x0}, node = {next = 0x65faea0, prev = 0x0}},
  mg = 0x670be20, pid_ = 153034, tid = 153034, ppid = -1, cpu = -1, refcnt = 0, shortname = "\000\000", comm_set = false,
  dead = false, comm_list = {next = 0x670bdc0, prev = 0x670bdc0}, comm_len = 7, db_id = 0, priv = 0x0, ts = 0x0}


In the node element, prev is NULL.
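
For context, a rough paraphrase of the list helpers involved (from the tools/
copy of include/linux/list.h; not the exact lines at 102/145) shows why a NULL
prev turns into a NULL pointer dereference in frame #0:

	/* Paraphrased sketch of the kernel list helpers */
	static inline void __list_del(struct list_head *prev, struct list_head *next)
	{
		next->prev = prev;
		prev->next = next;	/* faults here when prev == NULL */
	}

	static inline void __list_del_entry(struct list_head *entry)
	{
		__list_del(entry->prev, entry->next);
	}

	static inline void list_del_init(struct list_head *entry)
	{
		__list_del_entry(entry);
		INIT_LIST_HEAD(entry);
	}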

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [BUG] segfault in perf-top -- thread refcnt
  2015-03-27 17:31 [BUG] segfault in perf-top -- thread refcnt David Ahern
@ 2015-03-27 19:51 ` Arnaldo Carvalho de Melo
  2015-03-27 20:11 ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 24+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-03-27 19:51 UTC (permalink / raw)
  To: David Ahern; +Cc: Jiri Olsa, Namhyung Kim, Stephane Eranian, LKML

Em Fri, Mar 27, 2015 at 11:31:22AM -0600, David Ahern escreveu:
> Hi Arnaldo:
> 
> If I leave 'perf top -z' running it eventually crashes with the backtrace in
> the new thread refcnt code:

Debugging, thanks for the report!

- Arnaldo
 
> (gdb) bt
> #0  __list_del_entry (thread=0x670bd30)
>     at /home/dahern/kernels/linux.git/tools/perf/util/include/../../../../include/linux/list.h:102
> #1  list_del_init (thread=0x670bd30)
>     at /home/dahern/kernels/linux.git/tools/perf/util/include/../../../../include/linux/list.h:145
> #2  thread__put (thread=0x670bd30) at util/thread.c:94
> #3  0x00000000001bfa3c in __thread__zput (he=0x820adc0) at util/thread.h:48
> #4  hist_entry__delete (he=0x820adc0) at util/hist.c:939
> #5  0x00000000001c18a4 in hists__delete_entry (hists=0x4fe0f8) at
> util/hist.c:255
> #6  hists__delete_entries (hists=0x4fe0f8) at util/hist.c:283
> #7  0x00000000001250e8 in perf_top__sort_new_samples (arg=0x7feffffbca8) at
> builtin-top.c:549
> #8  0x00000000001e9108 in hist_browser__run (evsel=0x4fdfc0,
> nr_events=<value optimized out>,
>     helpline=<value optimized out>, left_exits=<value optimized out>,
> hbt=0xfff800012cd7cce0, min_pcnt=0, env=0x4fe4c0)
>     at ui/browsers/hists.c:431
> #9  perf_evsel__hists_browse (evsel=0x4fdfc0, nr_events=<value optimized
> out>, helpline=<value optimized out>,
>     left_exits=<value optimized out>, hbt=0xfff800012cd7cce0, min_pcnt=0,
> env=0x4fe4c0) at ui/browsers/hists.c:1501
> #10 0x00000000001eb390 in perf_evlist__tui_browse_hists (evlist=0x4c9ae0,
>     help=0x28dc00 "For a higher level overview, try: perf top --sort
> comm,dso", hbt=0xfff800012cd7cce0,
>     min_pcnt=<value optimized out>, env=0x4fe4c0) at
> ui/browsers/hists.c:2022
> #11 0x0000000000126188 in display_thread_tui (arg=<value optimized out>) at
> builtin-top.c:582
> #12 0xfff8000100130a68 in start_thread () from /lib64/libpthread.so.0
> #13 0xfff800010139c598 in __thread_start () from /lib64/libc.so.6
> #14 0xfff800010139c598 in __thread_start () from /lib64/libc.so.6
> 
> 
> (gdb) fra 2
> #2  thread__put (thread=0x670bd30) at util/thread.c:94
> 94	util/thread.c: No such file or directory.
> 	in util/thread.c
> (gdb) p *thread
> $2 = {{rb_node = {__rb_parent_color = 106933920, rb_right = 0x0, rb_left =
> 0x0}, node = {next = 0x65faea0, prev = 0x0}},
>   mg = 0x670be20, pid_ = 153034, tid = 153034, ppid = -1, cpu = -1, refcnt =
> 0, shortname = "\000\000", comm_set = false,
>   dead = false, comm_list = {next = 0x670bdc0, prev = 0x670bdc0}, comm_len =
> 7, db_id = 0, priv = 0x0, ts = 0x0}
> 
> 
> In node element, prev is NULL

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [BUG] segfault in perf-top -- thread refcnt
  2015-03-27 17:31 [BUG] segfault in perf-top -- thread refcnt David Ahern
  2015-03-27 19:51 ` Arnaldo Carvalho de Melo
@ 2015-03-27 20:11 ` Arnaldo Carvalho de Melo
  2015-03-27 20:13   ` David Ahern
  1 sibling, 1 reply; 24+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-03-27 20:11 UTC (permalink / raw)
  To: David Ahern; +Cc: Jiri Olsa, Namhyung Kim, Stephane Eranian, LKML

Em Fri, Mar 27, 2015 at 11:31:22AM -0600, David Ahern escreveu:
> Hi Arnaldo:
> 
> If I leave 'perf top -z' running it eventually crashes with the backtrace in
> the new thread refcnt code:

Humm, tried here with:

perf top -d 1 -z

I.e. as you did, with the --tui, and also with --stdio; so far I couldn't
reproduce it.
 
> (gdb) bt
> #0  __list_del_entry (thread=0x670bd30)
>     at /home/dahern/kernels/linux.git/tools/perf/util/include/../../../../include/linux/list.h:102
> #1  list_del_init (thread=0x670bd30)
>     at /home/dahern/kernels/linux.git/tools/perf/util/include/../../../../include/linux/list.h:145
> #2  thread__put (thread=0x670bd30) at util/thread.c:94
> #3  0x00000000001bfa3c in __thread__zput (he=0x820adc0) at util/thread.h:48
> #4  hist_entry__delete (he=0x820adc0) at util/hist.c:939
> #5  0x00000000001c18a4 in hists__delete_entry (hists=0x4fe0f8) at
> util/hist.c:255
> #6  hists__delete_entries (hists=0x4fe0f8) at util/hist.c:283
> #7  0x00000000001250e8 in perf_top__sort_new_samples (arg=0x7feffffbca8) at
> builtin-top.c:549
> #8  0x00000000001e9108 in hist_browser__run (evsel=0x4fdfc0,
> nr_events=<value optimized out>,
>     helpline=<value optimized out>, left_exits=<value optimized out>,
> hbt=0xfff800012cd7cce0, min_pcnt=0, env=0x4fe4c0)
>     at ui/browsers/hists.c:431
> #9  perf_evsel__hists_browse (evsel=0x4fdfc0, nr_events=<value optimized
> out>, helpline=<value optimized out>,
>     left_exits=<value optimized out>, hbt=0xfff800012cd7cce0, min_pcnt=0,
> env=0x4fe4c0) at ui/browsers/hists.c:1501
> #10 0x00000000001eb390 in perf_evlist__tui_browse_hists (evlist=0x4c9ae0,
>     help=0x28dc00 "For a higher level overview, try: perf top --sort
> comm,dso", hbt=0xfff800012cd7cce0,
>     min_pcnt=<value optimized out>, env=0x4fe4c0) at
> ui/browsers/hists.c:2022
> #11 0x0000000000126188 in display_thread_tui (arg=<value optimized out>) at
> builtin-top.c:582
> #12 0xfff8000100130a68 in start_thread () from /lib64/libpthread.so.0
> #13 0xfff800010139c598 in __thread_start () from /lib64/libc.so.6
> #14 0xfff800010139c598 in __thread_start () from /lib64/libc.so.6
> 
> 
> (gdb) fra 2
> #2  thread__put (thread=0x670bd30) at util/thread.c:94
> 94	util/thread.c: No such file or directory.
> 	in util/thread.c
> (gdb) p *thread
> $2 = {{rb_node = {__rb_parent_color = 106933920, rb_right = 0x0, rb_left =
> 0x0}, node = {next = 0x65faea0, prev = 0x0}},
>   mg = 0x670be20, pid_ = 153034, tid = 153034, ppid = -1, cpu = -1, refcnt =
> 0, shortname = "\000\000", comm_set = false,
>   dead = false, comm_list = {next = 0x670bdc0, prev = 0x670bdc0}, comm_len =
> 7, db_id = 0, priv = 0x0, ts = 0x0}
> 
> 
> In node element, prev is NULL

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [BUG] segfault in perf-top -- thread refcnt
  2015-03-27 20:11 ` Arnaldo Carvalho de Melo
@ 2015-03-27 20:13   ` David Ahern
  2015-03-30  8:07     ` Jiri Olsa
  0 siblings, 1 reply; 24+ messages in thread
From: David Ahern @ 2015-03-27 20:13 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, Namhyung Kim, Stephane Eranian, LKML

On 3/27/15 2:11 PM, Arnaldo Carvalho de Melo wrote:
> Em Fri, Mar 27, 2015 at 11:31:22AM -0600, David Ahern escreveu:
>> >Hi Arnaldo:
>> >
>> >If I leave 'perf top -z' running it eventually crashes with the backtrace in
>> >the new thread refcnt code:
> Humm, tried here with:
>
> perf top -d 1 -z
>
> I.e. as you, with the --tui, and also with --stdio, so far couldn't
> reproduce.
>

2 things:
1. Let it run for a long time; go about using the server, do lots of
builds, etc. It takes time.

2. use a box with a LOT of cpus (1024 in my case)

Make sure ulimit is set to get the core.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [BUG] segfault in perf-top -- thread refcnt
  2015-03-27 20:13   ` David Ahern
@ 2015-03-30  8:07     ` Jiri Olsa
  2015-03-30 10:22       ` Jiri Olsa
  0 siblings, 1 reply; 24+ messages in thread
From: Jiri Olsa @ 2015-03-30  8:07 UTC (permalink / raw)
  To: David Ahern
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim,
	Stephane Eranian, LKML

On Fri, Mar 27, 2015 at 02:13:26PM -0600, David Ahern wrote:
> On 3/27/15 2:11 PM, Arnaldo Carvalho de Melo wrote:
> >Em Fri, Mar 27, 2015 at 11:31:22AM -0600, David Ahern escreveu:
> >>>Hi Arnaldo:
> >>>
> >>>If I leave 'perf top -z' running it eventually crashes with the backtrace in
> >>>the new thread refcnt code:
> >Humm, tried here with:
> >
> >perf top -d 1 -z
> >
> >I.e. as you, with the --tui, and also with --stdio, so far couldn't
> >reproduce.
> >
> 
> 2 things:
> 1. let run for a long time. go about using the server. do lots of builds,
> etc. it takes time
> 
> 2. use a box with a LOT of cpus (1024 in my case)
> 
> Make sure ulimit is set to get the core.

reproduced on a 24 cpu box with a kernel build (make -j25)
running in the background.. will try to look closer

perf: Segmentation fault
-------- backtrace --------
./perf[0x4fd79b]
/lib64/libc.so.6(+0x358f0)[0x7f9cbff528f0]
./perf(thread__put+0x5b)[0x4b1a7b]
./perf(hists__delete_entries+0x70)[0x4c8670]
./perf[0x436a88]
./perf[0x4fa73d]
./perf(perf_evlist__tui_browse_hists+0x97)[0x4fc437]
./perf[0x4381d0]
/lib64/libpthread.so.0(+0x7ee5)[0x7f9cc1ff2ee5]
/lib64/libc.so.6(clone+0x6d)[0x7f9cc0011b8d]
[0x0]


jirka

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [BUG] segfault in perf-top -- thread refcnt
  2015-03-30  8:07     ` Jiri Olsa
@ 2015-03-30 10:22       ` Jiri Olsa
  2015-03-30 11:21         ` Jiri Olsa
                           ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Jiri Olsa @ 2015-03-30 10:22 UTC (permalink / raw)
  To: David Ahern
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim,
	Stephane Eranian, LKML

On Mon, Mar 30, 2015 at 10:07:37AM +0200, Jiri Olsa wrote:

SNIP

> > 
> > 2 things:
> > 1. let run for a long time. go about using the server. do lots of builds,
> > etc. it takes time
> > 
> > 2. use a box with a LOT of cpus (1024 in my case)
> > 
> > Make sure ulimit is set to get the core.
> 
> reproduced under 24 cpu box with kernel build (make -j25)
> running on background.. will try to look closer
> 
> perf: Segmentation fault
> -------- backtrace --------
> ./perf[0x4fd79b]
> /lib64/libc.so.6(+0x358f0)[0x7f9cbff528f0]
> ./perf(thread__put+0x5b)[0x4b1a7b]
> ./perf(hists__delete_entries+0x70)[0x4c8670]
> ./perf[0x436a88]
> ./perf[0x4fa73d]
> ./perf(perf_evlist__tui_browse_hists+0x97)[0x4fc437]
> ./perf[0x4381d0]
> /lib64/libpthread.so.0(+0x7ee5)[0x7f9cc1ff2ee5]
> /lib64/libc.so.6(clone+0x6d)[0x7f9cc0011b8d]
> [0x0]

looks like a race between __machine__findnew_thread and thread__put
over the machine->threads rb_tree insert/removal

is there a reason why thread__put does not erase itself from machine->threads?

I'm trying the attached patch.. so far so gut ;-)

jirka


---
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index e335330..7e6abc7 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -30,6 +30,7 @@ int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
 	dsos__init(&machine->kernel_dsos);
 
 	machine->threads = RB_ROOT;
+	pthread_mutex_init(&machine->threads_lock, NULL);
 	INIT_LIST_HEAD(&machine->dead_threads);
 	machine->last_match = NULL;
 
@@ -380,10 +381,13 @@ static struct thread *__machine__findnew_thread(struct machine *machine,
 	if (!create)
 		return NULL;
 
-	th = thread__new(pid, tid);
+	th = thread__new(machine, pid, tid);
 	if (th != NULL) {
+
+		pthread_mutex_lock(&machine->threads_lock);
 		rb_link_node(&th->rb_node, parent, p);
 		rb_insert_color(&th->rb_node, &machine->threads);
+		pthread_mutex_unlock(&machine->threads_lock);
 
 		/*
 		 * We have to initialize map_groups separately
@@ -394,8 +398,10 @@ static struct thread *__machine__findnew_thread(struct machine *machine,
 		 * leader and that would screwed the rb tree.
 		 */
 		if (thread__init_map_groups(th, machine)) {
+			pthread_mutex_lock(&machine->threads_lock);
 			rb_erase(&th->rb_node, &machine->threads);
 			thread__delete(th);
+			pthread_mutex_unlock(&machine->threads_lock);
 			return NULL;
 		}
 		/*
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index e2faf3b..e3468d6 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -30,6 +30,7 @@ struct machine {
 	bool		  comm_exec;
 	char		  *root_dir;
 	struct rb_root	  threads;
+	pthread_mutex_t	  threads_lock;
 	struct list_head  dead_threads;
 	struct thread	  *last_match;
 	struct vdso_info  *vdso_info;
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index 1c8fbc9..4592fc4 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -26,7 +26,7 @@ int thread__init_map_groups(struct thread *thread, struct machine *machine)
 	return thread->mg ? 0 : -1;
 }
 
-struct thread *thread__new(pid_t pid, pid_t tid)
+struct thread *thread__new(struct machine *machine, pid_t pid, pid_t tid)
 {
 	char *comm_str;
 	struct comm *comm;
@@ -38,6 +38,7 @@ struct thread *thread__new(pid_t pid, pid_t tid)
 		thread->ppid = -1;
 		thread->cpu = -1;
 		INIT_LIST_HEAD(&thread->comm_list);
+		thread->machine = machine;
 
 		if (unwind__prepare_access(thread) < 0)
 			goto err_thread;
@@ -91,7 +92,14 @@ struct thread *thread__get(struct thread *thread)
 void thread__put(struct thread *thread)
 {
 	if (thread && --thread->refcnt == 0) {
+		struct machine *machine = thread->machine;
+
 		list_del_init(&thread->node);
+
+		pthread_mutex_lock(&machine->threads_lock);
+		rb_erase(&thread->rb_node, &machine->threads);
+		pthread_mutex_unlock(&machine->threads_lock);
+
 		thread__delete(thread);
 	}
 }
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index 9b8a54d..df6fb69 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -31,12 +31,13 @@ struct thread {
 
 	void			*priv;
 	struct thread_stack	*ts;
+	struct machine		*machine;
 };
 
 struct machine;
 struct comm;
 
-struct thread *thread__new(pid_t pid, pid_t tid);
+struct thread *thread__new(struct machine *machine, pid_t pid, pid_t tid);
 int thread__init_map_groups(struct thread *thread, struct machine *machine);
 void thread__delete(struct thread *thread);
 

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [BUG] segfault in perf-top -- thread refcnt
  2015-03-30 10:22       ` Jiri Olsa
@ 2015-03-30 11:21         ` Jiri Olsa
  2015-03-30 11:49           ` Jiri Olsa
  2015-03-30 13:09         ` Arnaldo Carvalho de Melo
  2015-03-30 13:17         ` Arnaldo Carvalho de Melo
  2 siblings, 1 reply; 24+ messages in thread
From: Jiri Olsa @ 2015-03-30 11:21 UTC (permalink / raw)
  To: David Ahern
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim,
	Stephane Eranian, LKML

On Mon, Mar 30, 2015 at 12:22:20PM +0200, Jiri Olsa wrote:
> On Mon, Mar 30, 2015 at 10:07:37AM +0200, Jiri Olsa wrote:
> 
> SNIP
> 
> > > 
> > > 2 things:
> > > 1. let run for a long time. go about using the server. do lots of builds,
> > > etc. it takes time
> > > 
> > > 2. use a box with a LOT of cpus (1024 in my case)
> > > 
> > > Make sure ulimit is set to get the core.
> > 
> > reproduced under 24 cpu box with kernel build (make -j25)
> > running on background.. will try to look closer
> > 
> > perf: Segmentation fault
> > -------- backtrace --------
> > ./perf[0x4fd79b]
> > /lib64/libc.so.6(+0x358f0)[0x7f9cbff528f0]
> > ./perf(thread__put+0x5b)[0x4b1a7b]
> > ./perf(hists__delete_entries+0x70)[0x4c8670]
> > ./perf[0x436a88]
> > ./perf[0x4fa73d]
> > ./perf(perf_evlist__tui_browse_hists+0x97)[0x4fc437]
> > ./perf[0x4381d0]
> > /lib64/libpthread.so.0(+0x7ee5)[0x7f9cc1ff2ee5]
> > /lib64/libc.so.6(clone+0x6d)[0x7f9cc0011b8d]
> > [0x0]
> 
> looks like race among __machine__findnew_thread and thread__put
> over the machine->threads rb_tree insert/removal
> 
> is there a reason why thread__put does not erase itself from machine->threads?
> 
> I'm trying attached patch.. so far so gut ;-)

arghh.. it blew up during lunch :-\

jirka

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [BUG] segfault in perf-top -- thread refcnt
  2015-03-30 11:21         ` Jiri Olsa
@ 2015-03-30 11:49           ` Jiri Olsa
  2015-03-30 12:48             ` Namhyung Kim
  2015-03-30 13:22             ` Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 24+ messages in thread
From: Jiri Olsa @ 2015-03-30 11:49 UTC (permalink / raw)
  To: David Ahern
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim,
	Stephane Eranian, LKML

On Mon, Mar 30, 2015 at 01:21:08PM +0200, Jiri Olsa wrote:
> On Mon, Mar 30, 2015 at 12:22:20PM +0200, Jiri Olsa wrote:
> > On Mon, Mar 30, 2015 at 10:07:37AM +0200, Jiri Olsa wrote:
> > 
> > SNIP
> > 
> > > > 
> > > > 2 things:
> > > > 1. let run for a long time. go about using the server. do lots of builds,
> > > > etc. it takes time
> > > > 
> > > > 2. use a box with a LOT of cpus (1024 in my case)
> > > > 
> > > > Make sure ulimit is set to get the core.
> > > 
> > > reproduced under 24 cpu box with kernel build (make -j25)
> > > running on background.. will try to look closer
> > > 
> > > perf: Segmentation fault
> > > -------- backtrace --------
> > > ./perf[0x4fd79b]
> > > /lib64/libc.so.6(+0x358f0)[0x7f9cbff528f0]
> > > ./perf(thread__put+0x5b)[0x4b1a7b]
> > > ./perf(hists__delete_entries+0x70)[0x4c8670]
> > > ./perf[0x436a88]
> > > ./perf[0x4fa73d]
> > > ./perf(perf_evlist__tui_browse_hists+0x97)[0x4fc437]
> > > ./perf[0x4381d0]
> > > /lib64/libpthread.so.0(+0x7ee5)[0x7f9cc1ff2ee5]
> > > /lib64/libc.so.6(clone+0x6d)[0x7f9cc0011b8d]
> > > [0x0]
> > 
> > looks like race among __machine__findnew_thread and thread__put
> > over the machine->threads rb_tree insert/removal
> > 
> > is there a reason why thread__put does not erase itself from machine->threads?

that was the reason.. we do this separately.. not in thread__put..
is there a reason for this? ;-)

testing attached patch..

jirka


---
diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
index f7fb258..966564a 100644
--- a/tools/perf/util/build-id.c
+++ b/tools/perf/util/build-id.c
@@ -60,7 +60,6 @@ static int perf_event__exit_del_thread(struct perf_tool *tool __maybe_unused,
 		    event->fork.ppid, event->fork.ptid);
 
 	if (thread) {
-		rb_erase(&thread->rb_node, &machine->threads);
 		if (machine->last_match == thread)
 			thread__zput(machine->last_match);
 		thread__put(thread);
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index e335330..a8443ef 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -30,6 +30,7 @@ int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
 	dsos__init(&machine->kernel_dsos);
 
 	machine->threads = RB_ROOT;
+	pthread_mutex_init(&machine->threads_lock, NULL);
 	INIT_LIST_HEAD(&machine->dead_threads);
 	machine->last_match = NULL;
 
@@ -380,10 +381,13 @@ static struct thread *__machine__findnew_thread(struct machine *machine,
 	if (!create)
 		return NULL;
 
-	th = thread__new(pid, tid);
+	th = thread__new(machine, pid, tid);
 	if (th != NULL) {
+
+		pthread_mutex_lock(&machine->threads_lock);
 		rb_link_node(&th->rb_node, parent, p);
 		rb_insert_color(&th->rb_node, &machine->threads);
+		pthread_mutex_unlock(&machine->threads_lock);
 
 		/*
 		 * We have to initialize map_groups separately
@@ -394,7 +398,9 @@ static struct thread *__machine__findnew_thread(struct machine *machine,
 		 * leader and that would screwed the rb tree.
 		 */
 		if (thread__init_map_groups(th, machine)) {
+			pthread_mutex_lock(&machine->threads_lock);
 			rb_erase(&th->rb_node, &machine->threads);
+			pthread_mutex_unlock(&machine->threads_lock);
 			thread__delete(th);
 			return NULL;
 		}
@@ -1258,7 +1264,6 @@ static void machine__remove_thread(struct machine *machine, struct thread *th)
 	if (machine->last_match == th)
 		thread__zput(machine->last_match);
 
-	rb_erase(&th->rb_node, &machine->threads);
 	/*
 	 * Move it first to the dead_threads list, then drop the reference,
 	 * if this is the last reference, then the thread__delete destructor
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index e2faf3b..e3468d6 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -30,6 +30,7 @@ struct machine {
 	bool		  comm_exec;
 	char		  *root_dir;
 	struct rb_root	  threads;
+	pthread_mutex_t	  threads_lock;
 	struct list_head  dead_threads;
 	struct thread	  *last_match;
 	struct vdso_info  *vdso_info;
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index 1c8fbc9..4592fc4 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -26,7 +26,7 @@ int thread__init_map_groups(struct thread *thread, struct machine *machine)
 	return thread->mg ? 0 : -1;
 }
 
-struct thread *thread__new(pid_t pid, pid_t tid)
+struct thread *thread__new(struct machine *machine, pid_t pid, pid_t tid)
 {
 	char *comm_str;
 	struct comm *comm;
@@ -38,6 +38,7 @@ struct thread *thread__new(pid_t pid, pid_t tid)
 		thread->ppid = -1;
 		thread->cpu = -1;
 		INIT_LIST_HEAD(&thread->comm_list);
+		thread->machine = machine;
 
 		if (unwind__prepare_access(thread) < 0)
 			goto err_thread;
@@ -91,7 +92,14 @@ struct thread *thread__get(struct thread *thread)
 void thread__put(struct thread *thread)
 {
 	if (thread && --thread->refcnt == 0) {
+		struct machine *machine = thread->machine;
+
 		list_del_init(&thread->node);
+
+		pthread_mutex_lock(&machine->threads_lock);
+		rb_erase(&thread->rb_node, &machine->threads);
+		pthread_mutex_unlock(&machine->threads_lock);
+
 		thread__delete(thread);
 	}
 }
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index 9b8a54d..df6fb69 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -31,12 +31,13 @@ struct thread {
 
 	void			*priv;
 	struct thread_stack	*ts;
+	struct machine		*machine;
 };
 
 struct machine;
 struct comm;
 
-struct thread *thread__new(pid_t pid, pid_t tid);
+struct thread *thread__new(struct machine *machine, pid_t pid, pid_t tid);
 int thread__init_map_groups(struct thread *thread, struct machine *machine);
 void thread__delete(struct thread *thread);
 

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [BUG] segfault in perf-top -- thread refcnt
  2015-03-30 11:49           ` Jiri Olsa
@ 2015-03-30 12:48             ` Namhyung Kim
  2015-03-30 12:56               ` Jiri Olsa
  2015-03-30 14:58               ` Arnaldo Carvalho de Melo
  2015-03-30 13:22             ` Arnaldo Carvalho de Melo
  1 sibling, 2 replies; 24+ messages in thread
From: Namhyung Kim @ 2015-03-30 12:48 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: David Ahern, Arnaldo Carvalho de Melo, Jiri Olsa, Stephane Eranian, LKML

Hi Jiri,

On Mon, Mar 30, 2015 at 01:49:07PM +0200, Jiri Olsa wrote:
> On Mon, Mar 30, 2015 at 01:21:08PM +0200, Jiri Olsa wrote:
> > On Mon, Mar 30, 2015 at 12:22:20PM +0200, Jiri Olsa wrote:
> > > On Mon, Mar 30, 2015 at 10:07:37AM +0200, Jiri Olsa wrote:
> > > 
> > > SNIP
> > > 
> > > > > 
> > > > > 2 things:
> > > > > 1. let run for a long time. go about using the server. do lots of builds,
> > > > > etc. it takes time
> > > > > 
> > > > > 2. use a box with a LOT of cpus (1024 in my case)
> > > > > 
> > > > > Make sure ulimit is set to get the core.
> > > > 
> > > > reproduced under 24 cpu box with kernel build (make -j25)
> > > > running on background.. will try to look closer
> > > > 
> > > > perf: Segmentation fault
> > > > -------- backtrace --------
> > > > ./perf[0x4fd79b]
> > > > /lib64/libc.so.6(+0x358f0)[0x7f9cbff528f0]
> > > > ./perf(thread__put+0x5b)[0x4b1a7b]
> > > > ./perf(hists__delete_entries+0x70)[0x4c8670]
> > > > ./perf[0x436a88]
> > > > ./perf[0x4fa73d]
> > > > ./perf(perf_evlist__tui_browse_hists+0x97)[0x4fc437]
> > > > ./perf[0x4381d0]
> > > > /lib64/libpthread.so.0(+0x7ee5)[0x7f9cc1ff2ee5]
> > > > /lib64/libc.so.6(clone+0x6d)[0x7f9cc0011b8d]
> > > > [0x0]
> > > 
> > > looks like race among __machine__findnew_thread and thread__put
> > > over the machine->threads rb_tree insert/removal
> > > 
> > > is there a reason why thread__put does not erase itself from machine->threads?
> 
> that was the reason.. we do this separately.. not in thread__put..
> is there a reason for this? ;-)
> 
> testing attached patch..
> 
> jirka
> 
> 
> ---
> diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
> index f7fb258..966564a 100644
> --- a/tools/perf/util/build-id.c
> +++ b/tools/perf/util/build-id.c
> @@ -60,7 +60,6 @@ static int perf_event__exit_del_thread(struct perf_tool *tool __maybe_unused,
>  		    event->fork.ppid, event->fork.ptid);
>  
>  	if (thread) {
> -		rb_erase(&thread->rb_node, &machine->threads);
>  		if (machine->last_match == thread)
>  			thread__zput(machine->last_match);
>  		thread__put(thread);
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index e335330..a8443ef 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -30,6 +30,7 @@ int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
>  	dsos__init(&machine->kernel_dsos);
>  
>  	machine->threads = RB_ROOT;
> +	pthread_mutex_init(&machine->threads_lock, NULL);
>  	INIT_LIST_HEAD(&machine->dead_threads);
>  	machine->last_match = NULL;
>  
> @@ -380,10 +381,13 @@ static struct thread *__machine__findnew_thread(struct machine *machine,
>  	if (!create)
>  		return NULL;
>  
> -	th = thread__new(pid, tid);
> +	th = thread__new(machine, pid, tid);
>  	if (th != NULL) {
> +
> +		pthread_mutex_lock(&machine->threads_lock);
>  		rb_link_node(&th->rb_node, parent, p);
>  		rb_insert_color(&th->rb_node, &machine->threads);
> +		pthread_mutex_unlock(&machine->threads_lock);

I think you also need to protect the rb tree traversal above.

But this makes every sample processing grab and release the lock, so it
might cause high overhead.  It can be a problem if such processing is
done in parallel, like in my multi-thread work. :-/
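
As a purely hypothetical sketch of "protect the traversal too", reusing the
threads_lock and the thread__new(machine, ...) signature from the patch above
(this is not code from perf, just an illustration of holding the lock across
both the rb-tree walk and the insert, on top of perf's machine/thread types):

	static struct thread *findnew_thread_locked(struct machine *machine,
						    pid_t pid, pid_t tid, bool create)
	{
		struct rb_node **p, *parent = NULL;
		struct thread *th = NULL;

		pthread_mutex_lock(&machine->threads_lock);
		p = &machine->threads.rb_node;
		while (*p != NULL) {
			struct thread *t = rb_entry(*p, struct thread, rb_node);

			if (t->tid == tid) {
				th = t;		/* found, still under the lock */
				goto out;
			}
			parent = *p;
			p = tid < t->tid ? &(*p)->rb_left : &(*p)->rb_right;
		}
		if (create) {
			th = thread__new(machine, pid, tid);
			if (th != NULL) {
				rb_link_node(&th->rb_node, parent, p);
				rb_insert_color(&th->rb_node, &machine->threads);
			}
		}
	out:
		pthread_mutex_unlock(&machine->threads_lock);
		return th;
	}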

Thanks,
Namhyung


>  
>  		/*
>  		 * We have to initialize map_groups separately
> @@ -394,7 +398,9 @@ static struct thread *__machine__findnew_thread(struct machine *machine,
>  		 * leader and that would screwed the rb tree.
>  		 */
>  		if (thread__init_map_groups(th, machine)) {
> +			pthread_mutex_lock(&machine->threads_lock);
>  			rb_erase(&th->rb_node, &machine->threads);
> +			pthread_mutex_unlock(&machine->threads_lock);
>  			thread__delete(th);
>  			return NULL;
>  		}
> @@ -1258,7 +1264,6 @@ static void machine__remove_thread(struct machine *machine, struct thread *th)
>  	if (machine->last_match == th)
>  		thread__zput(machine->last_match);
>  
> -	rb_erase(&th->rb_node, &machine->threads);
>  	/*
>  	 * Move it first to the dead_threads list, then drop the reference,
>  	 * if this is the last reference, then the thread__delete destructor
> diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
> index e2faf3b..e3468d6 100644
> --- a/tools/perf/util/machine.h
> +++ b/tools/perf/util/machine.h
> @@ -30,6 +30,7 @@ struct machine {
>  	bool		  comm_exec;
>  	char		  *root_dir;
>  	struct rb_root	  threads;
> +	pthread_mutex_t	  threads_lock;
>  	struct list_head  dead_threads;
>  	struct thread	  *last_match;
>  	struct vdso_info  *vdso_info;
> diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
> index 1c8fbc9..4592fc4 100644
> --- a/tools/perf/util/thread.c
> +++ b/tools/perf/util/thread.c
> @@ -26,7 +26,7 @@ int thread__init_map_groups(struct thread *thread, struct machine *machine)
>  	return thread->mg ? 0 : -1;
>  }
>  
> -struct thread *thread__new(pid_t pid, pid_t tid)
> +struct thread *thread__new(struct machine *machine, pid_t pid, pid_t tid)
>  {
>  	char *comm_str;
>  	struct comm *comm;
> @@ -38,6 +38,7 @@ struct thread *thread__new(pid_t pid, pid_t tid)
>  		thread->ppid = -1;
>  		thread->cpu = -1;
>  		INIT_LIST_HEAD(&thread->comm_list);
> +		thread->machine = machine;
>  
>  		if (unwind__prepare_access(thread) < 0)
>  			goto err_thread;
> @@ -91,7 +92,14 @@ struct thread *thread__get(struct thread *thread)
>  void thread__put(struct thread *thread)
>  {
>  	if (thread && --thread->refcnt == 0) {
> +		struct machine *machine = thread->machine;
> +
>  		list_del_init(&thread->node);
> +
> +		pthread_mutex_lock(&machine->threads_lock);
> +		rb_erase(&thread->rb_node, &machine->threads);
> +		pthread_mutex_unlock(&machine->threads_lock);
> +
>  		thread__delete(thread);
>  	}
>  }
> diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
> index 9b8a54d..df6fb69 100644
> --- a/tools/perf/util/thread.h
> +++ b/tools/perf/util/thread.h
> @@ -31,12 +31,13 @@ struct thread {
>  
>  	void			*priv;
>  	struct thread_stack	*ts;
> +	struct machine		*machine;
>  };
>  
>  struct machine;
>  struct comm;
>  
> -struct thread *thread__new(pid_t pid, pid_t tid);
> +struct thread *thread__new(struct machine *machine, pid_t pid, pid_t tid);
>  int thread__init_map_groups(struct thread *thread, struct machine *machine);
>  void thread__delete(struct thread *thread);
>  

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [BUG] segfault in perf-top -- thread refcnt
  2015-03-30 12:48             ` Namhyung Kim
@ 2015-03-30 12:56               ` Jiri Olsa
  2015-03-30 13:06                 ` Namhyung Kim
  2015-03-30 13:07                 ` Arnaldo Carvalho de Melo
  2015-03-30 14:58               ` Arnaldo Carvalho de Melo
  1 sibling, 2 replies; 24+ messages in thread
From: Jiri Olsa @ 2015-03-30 12:56 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: David Ahern, Arnaldo Carvalho de Melo, Jiri Olsa, Stephane Eranian, LKML

On Mon, Mar 30, 2015 at 09:48:52PM +0900, Namhyung Kim wrote:
> Hi Jiri,
> 
> On Mon, Mar 30, 2015 at 01:49:07PM +0200, Jiri Olsa wrote:
> > On Mon, Mar 30, 2015 at 01:21:08PM +0200, Jiri Olsa wrote:
> > > On Mon, Mar 30, 2015 at 12:22:20PM +0200, Jiri Olsa wrote:
> > > > On Mon, Mar 30, 2015 at 10:07:37AM +0200, Jiri Olsa wrote:
> > > > 
> > > > SNIP
> > > > 
> > > > > > 
> > > > > > 2 things:
> > > > > > 1. let run for a long time. go about using the server. do lots of builds,
> > > > > > etc. it takes time
> > > > > > 
> > > > > > 2. use a box with a LOT of cpus (1024 in my case)
> > > > > > 
> > > > > > Make sure ulimit is set to get the core.
> > > > > 
> > > > > reproduced under 24 cpu box with kernel build (make -j25)
> > > > > running on background.. will try to look closer
> > > > > 
> > > > > perf: Segmentation fault
> > > > > -------- backtrace --------
> > > > > ./perf[0x4fd79b]
> > > > > /lib64/libc.so.6(+0x358f0)[0x7f9cbff528f0]
> > > > > ./perf(thread__put+0x5b)[0x4b1a7b]
> > > > > ./perf(hists__delete_entries+0x70)[0x4c8670]
> > > > > ./perf[0x436a88]
> > > > > ./perf[0x4fa73d]
> > > > > ./perf(perf_evlist__tui_browse_hists+0x97)[0x4fc437]
> > > > > ./perf[0x4381d0]
> > > > > /lib64/libpthread.so.0(+0x7ee5)[0x7f9cc1ff2ee5]
> > > > > /lib64/libc.so.6(clone+0x6d)[0x7f9cc0011b8d]
> > > > > [0x0]
> > > > 
> > > > looks like race among __machine__findnew_thread and thread__put
> > > > over the machine->threads rb_tree insert/removal
> > > > 
> > > > is there a reason why thread__put does not erase itself from machine->threads?
> > 
> > that was the reason.. we do this separately.. not in thread__put..
> > is there a reason for this? ;-)
> > 
> > testing attached patch..
> > 
> > jirka
> > 
> > 
> > ---
> > diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
> > index f7fb258..966564a 100644
> > --- a/tools/perf/util/build-id.c
> > +++ b/tools/perf/util/build-id.c
> > @@ -60,7 +60,6 @@ static int perf_event__exit_del_thread(struct perf_tool *tool __maybe_unused,
> >  		    event->fork.ppid, event->fork.ptid);
> >  
> >  	if (thread) {
> > -		rb_erase(&thread->rb_node, &machine->threads);
> >  		if (machine->last_match == thread)
> >  			thread__zput(machine->last_match);
> >  		thread__put(thread);
> > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > index e335330..a8443ef 100644
> > --- a/tools/perf/util/machine.c
> > +++ b/tools/perf/util/machine.c
> > @@ -30,6 +30,7 @@ int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
> >  	dsos__init(&machine->kernel_dsos);
> >  
> >  	machine->threads = RB_ROOT;
> > +	pthread_mutex_init(&machine->threads_lock, NULL);
> >  	INIT_LIST_HEAD(&machine->dead_threads);
> >  	machine->last_match = NULL;
> >  
> > @@ -380,10 +381,13 @@ static struct thread *__machine__findnew_thread(struct machine *machine,
> >  	if (!create)
> >  		return NULL;
> >  
> > -	th = thread__new(pid, tid);
> > +	th = thread__new(machine, pid, tid);
> >  	if (th != NULL) {
> > +
> > +		pthread_mutex_lock(&machine->threads_lock);
> >  		rb_link_node(&th->rb_node, parent, p);
> >  		rb_insert_color(&th->rb_node, &machine->threads);
> > +		pthread_mutex_unlock(&machine->threads_lock);
> 
> I think you also need to protect the rb tree traversal above.

yep, I already have another version.. but it blows up in another place ;-)

> 
> But this makes every sample processing grabs and releases the lock so
> might cause high overhead.  It can be a problem if such processing is
> done parallelly like my multi-thread work. :-/

yep.. perhaps instead of more locking we need to find a way where
only a single thread does the update on hists/threads

jirka

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [BUG] segfault in perf-top -- thread refcnt
  2015-03-30 12:56               ` Jiri Olsa
@ 2015-03-30 13:06                 ` Namhyung Kim
  2015-03-30 14:02                   ` Arnaldo Carvalho de Melo
  2015-03-30 13:07                 ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 24+ messages in thread
From: Namhyung Kim @ 2015-03-30 13:06 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: David Ahern, Arnaldo Carvalho de Melo, Jiri Olsa, Stephane Eranian, LKML

On Mon, Mar 30, 2015 at 9:56 PM, Jiri Olsa <jolsa@redhat.com> wrote:
> On Mon, Mar 30, 2015 at 09:48:52PM +0900, Namhyung Kim wrote:
>> Hi Jiri,
>>
>> On Mon, Mar 30, 2015 at 01:49:07PM +0200, Jiri Olsa wrote:
>> > On Mon, Mar 30, 2015 at 01:21:08PM +0200, Jiri Olsa wrote:
>> > > On Mon, Mar 30, 2015 at 12:22:20PM +0200, Jiri Olsa wrote:
>> > > > On Mon, Mar 30, 2015 at 10:07:37AM +0200, Jiri Olsa wrote:
>> > > >
>> > > > SNIP
>> > > >
>> > > > > >
>> > > > > > 2 things:
>> > > > > > 1. let run for a long time. go about using the server. do lots of builds,
>> > > > > > etc. it takes time
>> > > > > >
>> > > > > > 2. use a box with a LOT of cpus (1024 in my case)
>> > > > > >
>> > > > > > Make sure ulimit is set to get the core.
>> > > > >
>> > > > > reproduced under 24 cpu box with kernel build (make -j25)
>> > > > > running on background.. will try to look closer
>> > > > >
>> > > > > perf: Segmentation fault
>> > > > > -------- backtrace --------
>> > > > > ./perf[0x4fd79b]
>> > > > > /lib64/libc.so.6(+0x358f0)[0x7f9cbff528f0]
>> > > > > ./perf(thread__put+0x5b)[0x4b1a7b]
>> > > > > ./perf(hists__delete_entries+0x70)[0x4c8670]
>> > > > > ./perf[0x436a88]
>> > > > > ./perf[0x4fa73d]
>> > > > > ./perf(perf_evlist__tui_browse_hists+0x97)[0x4fc437]
>> > > > > ./perf[0x4381d0]
>> > > > > /lib64/libpthread.so.0(+0x7ee5)[0x7f9cc1ff2ee5]
>> > > > > /lib64/libc.so.6(clone+0x6d)[0x7f9cc0011b8d]
>> > > > > [0x0]
>> > > >
>> > > > looks like race among __machine__findnew_thread and thread__put
>> > > > over the machine->threads rb_tree insert/removal
>> > > >
>> > > > is there a reason why thread__put does not erase itself from machine->threads?
>> >
>> > that was the reason.. we do this separately.. not in thread__put..
>> > is there a reason for this? ;-)
>> >
>> > testing attached patch..
>> >
>> > jirka
>> >
>> >
>> > ---
>> > diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
>> > index f7fb258..966564a 100644
>> > --- a/tools/perf/util/build-id.c
>> > +++ b/tools/perf/util/build-id.c
>> > @@ -60,7 +60,6 @@ static int perf_event__exit_del_thread(struct perf_tool *tool __maybe_unused,
>> >                 event->fork.ppid, event->fork.ptid);
>> >
>> >     if (thread) {
>> > -           rb_erase(&thread->rb_node, &machine->threads);
>> >             if (machine->last_match == thread)
>> >                     thread__zput(machine->last_match);
>> >             thread__put(thread);
>> > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
>> > index e335330..a8443ef 100644
>> > --- a/tools/perf/util/machine.c
>> > +++ b/tools/perf/util/machine.c
>> > @@ -30,6 +30,7 @@ int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
>> >     dsos__init(&machine->kernel_dsos);
>> >
>> >     machine->threads = RB_ROOT;
>> > +   pthread_mutex_init(&machine->threads_lock, NULL);
>> >     INIT_LIST_HEAD(&machine->dead_threads);
>> >     machine->last_match = NULL;
>> >
>> > @@ -380,10 +381,13 @@ static struct thread *__machine__findnew_thread(struct machine *machine,
>> >     if (!create)
>> >             return NULL;
>> >
>> > -   th = thread__new(pid, tid);
>> > +   th = thread__new(machine, pid, tid);
>> >     if (th != NULL) {
>> > +
>> > +           pthread_mutex_lock(&machine->threads_lock);
>> >             rb_link_node(&th->rb_node, parent, p);
>> >             rb_insert_color(&th->rb_node, &machine->threads);
>> > +           pthread_mutex_unlock(&machine->threads_lock);
>>
>> I think you also need to protect the rb tree traversal above.
>
> yep, I already have another version.. but it blows on another place ;-)
>
>>
>> But this makes every sample processing grabs and releases the lock so
>> might cause high overhead.  It can be a problem if such processing is
>> done parallelly like my multi-thread work. :-/
>
> yep.. perhaps instead of more locking we need to find a way where
> only single thread do the update on hists/threads

Agreed.

AFAIK the reason we do ref-counting is to clean up dead/exited threads
for live sessions like perf top.  In that case we can somehow mark
to-be-deleted threads and kill them at a safe time/place..

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [BUG] segfault in perf-top -- thread refcnt
  2015-03-30 12:56               ` Jiri Olsa
  2015-03-30 13:06                 ` Namhyung Kim
@ 2015-03-30 13:07                 ` Arnaldo Carvalho de Melo
  2015-03-30 13:20                   ` Jiri Olsa
  1 sibling, 1 reply; 24+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-03-30 13:07 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: Namhyung Kim, David Ahern, Jiri Olsa, Stephane Eranian, LKML

Em Mon, Mar 30, 2015 at 02:56:31PM +0200, Jiri Olsa escreveu:
> On Mon, Mar 30, 2015 at 09:48:52PM +0900, Namhyung Kim wrote:
> > Hi Jiri,
> > 
> > On Mon, Mar 30, 2015 at 01:49:07PM +0200, Jiri Olsa wrote:
> > > On Mon, Mar 30, 2015 at 01:21:08PM +0200, Jiri Olsa wrote:
> > > > On Mon, Mar 30, 2015 at 12:22:20PM +0200, Jiri Olsa wrote:
> > > > > On Mon, Mar 30, 2015 at 10:07:37AM +0200, Jiri Olsa wrote:
> > > -	th = thread__new(pid, tid);
> > > +	th = thread__new(machine, pid, tid);
> > >  	if (th != NULL) {
> > > +
> > > +		pthread_mutex_lock(&machine->threads_lock);
> > >  		rb_link_node(&th->rb_node, parent, p);
> > >  		rb_insert_color(&th->rb_node, &machine->threads);
> > > +		pthread_mutex_unlock(&machine->threads_lock);
> > 
> > I think you also need to protect the rb tree traversal above.
> 
> yep, I already have another version.. but it blows on another place ;-)

Well, why? The point of refcounting is that the structure will not go
away while we have it in the rbtree.

Or are you talking about two threads trying to insert entries in the
rbtree? Can you point to where this can happen?
 
> > But this makes every sample processing grabs and releases the lock so
> > might cause high overhead.  It can be a problem if such processing is
> > done parallelly like my multi-thread work. :-/
 
> yep.. perhaps instead of more locking we need to find a way where
> only single thread do the update on hists/threads

But that should be really rare, no? The problem is searching in one
thread (the fast path) and inserting entries (the slow path), no?
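
For reference, the refcounting pair being discussed looks roughly like this
(thread__put is taken from the hunk context quoted earlier in the thread;
thread__get is paraphrased and may differ in detail):

	struct thread *thread__get(struct thread *thread)
	{
		if (thread)
			++thread->refcnt;	/* e.g. when a hist_entry stores the thread */
		return thread;
	}

	void thread__put(struct thread *thread)
	{
		if (thread && --thread->refcnt == 0) {
			list_del_init(&thread->node);	/* frame #1 of the reported crash */
			thread__delete(thread);
		}
	}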

- Arnaldo

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [BUG] segfault in perf-top -- thread refcnt
  2015-03-30 10:22       ` Jiri Olsa
  2015-03-30 11:21         ` Jiri Olsa
@ 2015-03-30 13:09         ` Arnaldo Carvalho de Melo
  2015-03-30 13:17         ` Arnaldo Carvalho de Melo
  2 siblings, 0 replies; 24+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-03-30 13:09 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: David Ahern, Jiri Olsa, Namhyung Kim, Stephane Eranian, LKML

Em Mon, Mar 30, 2015 at 12:22:20PM +0200, Jiri Olsa escreveu:
> On Mon, Mar 30, 2015 at 10:07:37AM +0200, Jiri Olsa wrote:
> 
> SNIP
> 
> > > 
> > > 2 things:
> > > 1. let run for a long time. go about using the server. do lots of builds,
> > > etc. it takes time
> > > 
> > > 2. use a box with a LOT of cpus (1024 in my case)
> > > 
> > > Make sure ulimit is set to get the core.
> > 
> > reproduced under 24 cpu box with kernel build (make -j25)
> > running on background.. will try to look closer
> > 
> > perf: Segmentation fault
> > -------- backtrace --------
> > ./perf[0x4fd79b]
> > /lib64/libc.so.6(+0x358f0)[0x7f9cbff528f0]
> > ./perf(thread__put+0x5b)[0x4b1a7b]
> > ./perf(hists__delete_entries+0x70)[0x4c8670]
> > ./perf[0x436a88]
> > ./perf[0x4fa73d]
> > ./perf(perf_evlist__tui_browse_hists+0x97)[0x4fc437]
> > ./perf[0x4381d0]
> > /lib64/libpthread.so.0(+0x7ee5)[0x7f9cc1ff2ee5]
> > /lib64/libc.so.6(clone+0x6d)[0x7f9cc0011b8d]
> > [0x0]
> 
> looks like race among __machine__findnew_thread and thread__put
> over the machine->threads rb_tree insert/removal
> 
> is there a reason why thread__put does not erase itself from machine->threads?

thread__put should only remove stuff from the rbtree when the refcount
hits zero, and then it should grab a lock (that is the buggy part, as
this lock isn't there) and remove the entry from the rbtree. Now I see
what Namhyung was talking about...

- Arnaldo
 
> I'm trying attached patch.. so far so gut ;-)
> 
> jirka
> 
> 
> ---
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index e335330..7e6abc7 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -30,6 +30,7 @@ int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
>  	dsos__init(&machine->kernel_dsos);
>  
>  	machine->threads = RB_ROOT;
> +	pthread_mutex_init(&machine->threads_lock, NULL);
>  	INIT_LIST_HEAD(&machine->dead_threads);
>  	machine->last_match = NULL;
>  
> @@ -380,10 +381,13 @@ static struct thread *__machine__findnew_thread(struct machine *machine,
>  	if (!create)
>  		return NULL;
>  
> -	th = thread__new(pid, tid);
> +	th = thread__new(machine, pid, tid);
>  	if (th != NULL) {
> +
> +		pthread_mutex_lock(&machine->threads_lock);
>  		rb_link_node(&th->rb_node, parent, p);
>  		rb_insert_color(&th->rb_node, &machine->threads);
> +		pthread_mutex_unlock(&machine->threads_lock);
>  
>  		/*
>  		 * We have to initialize map_groups separately
> @@ -394,8 +398,10 @@ static struct thread *__machine__findnew_thread(struct machine *machine,
>  		 * leader and that would screwed the rb tree.
>  		 */
>  		if (thread__init_map_groups(th, machine)) {
> +			pthread_mutex_lock(&machine->threads_lock);
>  			rb_erase(&th->rb_node, &machine->threads);
>  			thread__delete(th);
> +			pthread_mutex_unlock(&machine->threads_lock);
>  			return NULL;
>  		}
>  		/*
> diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
> index e2faf3b..e3468d6 100644
> --- a/tools/perf/util/machine.h
> +++ b/tools/perf/util/machine.h
> @@ -30,6 +30,7 @@ struct machine {
>  	bool		  comm_exec;
>  	char		  *root_dir;
>  	struct rb_root	  threads;
> +	pthread_mutex_t	  threads_lock;
>  	struct list_head  dead_threads;
>  	struct thread	  *last_match;
>  	struct vdso_info  *vdso_info;
> diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
> index 1c8fbc9..4592fc4 100644
> --- a/tools/perf/util/thread.c
> +++ b/tools/perf/util/thread.c
> @@ -26,7 +26,7 @@ int thread__init_map_groups(struct thread *thread, struct machine *machine)
>  	return thread->mg ? 0 : -1;
>  }
>  
> -struct thread *thread__new(pid_t pid, pid_t tid)
> +struct thread *thread__new(struct machine *machine, pid_t pid, pid_t tid)
>  {
>  	char *comm_str;
>  	struct comm *comm;
> @@ -38,6 +38,7 @@ struct thread *thread__new(pid_t pid, pid_t tid)
>  		thread->ppid = -1;
>  		thread->cpu = -1;
>  		INIT_LIST_HEAD(&thread->comm_list);
> +		thread->machine = machine;
>  
>  		if (unwind__prepare_access(thread) < 0)
>  			goto err_thread;
> @@ -91,7 +92,14 @@ struct thread *thread__get(struct thread *thread)
>  void thread__put(struct thread *thread)
>  {
>  	if (thread && --thread->refcnt == 0) {
> +		struct machine *machine = thread->machine;
> +
>  		list_del_init(&thread->node);
> +
> +		pthread_mutex_lock(&machine->threads_lock);
> +		rb_erase(&thread->rb_node, &machine->threads);
> +		pthread_mutex_unlock(&machine->threads_lock);
> +
>  		thread__delete(thread);
>  	}
>  }
> diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
> index 9b8a54d..df6fb69 100644
> --- a/tools/perf/util/thread.h
> +++ b/tools/perf/util/thread.h
> @@ -31,12 +31,13 @@ struct thread {
>  
>  	void			*priv;
>  	struct thread_stack	*ts;
> +	struct machine		*machine;
>  };
>  
>  struct machine;
>  struct comm;
>  
> -struct thread *thread__new(pid_t pid, pid_t tid);
> +struct thread *thread__new(struct machine *machine, pid_t pid, pid_t tid);
>  int thread__init_map_groups(struct thread *thread, struct machine *machine);
>  void thread__delete(struct thread *thread);
>  

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [BUG] segfault in perf-top -- thread refcnt
  2015-03-30 10:22       ` Jiri Olsa
  2015-03-30 11:21         ` Jiri Olsa
  2015-03-30 13:09         ` Arnaldo Carvalho de Melo
@ 2015-03-30 13:17         ` Arnaldo Carvalho de Melo
  2 siblings, 0 replies; 24+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-03-30 13:17 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: David Ahern, Jiri Olsa, Namhyung Kim, Stephane Eranian, LKML

Em Mon, Mar 30, 2015 at 12:22:20PM +0200, Jiri Olsa escreveu:
> On Mon, Mar 30, 2015 at 10:07:37AM +0200, Jiri Olsa wrote:
> looks like race among __machine__findnew_thread and thread__put
> over the machine->threads rb_tree insert/removal
> 
> is there a reason why thread__put does not erase itself from machine->threads?

IIRC it has to first pass through:

machine__remove_thread(), which drops the reference count obtained
when inserting the thread into the rbtree and moves it to the dead_threads
list, where it will stay until the last reference count (obtained when
adding the thread to some hist_entry) is finally dropped via
thread__put().
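
A rough sketch of that path, reconstructed from the machine.c hunk quoted
earlier in the thread (the tail of the function is paraphrased):

	/* Pre-patch machine__remove_thread(), roughly: the rb-tree removal
	 * happens here, on PERF_RECORD_EXIT, not in thread__put(). */
	static void machine__remove_thread(struct machine *machine, struct thread *th)
	{
		if (machine->last_match == th)
			thread__zput(machine->last_match);

		rb_erase(&th->rb_node, &machine->threads);
		/*
		 * Move it first to the dead_threads list, then drop the
		 * reference; if this is the last reference, the thread__delete
		 * destructor frees it.
		 */
		list_add_tail(&th->node, &machine->dead_threads);
		thread__put(th);
	}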

- Arnaldo

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [BUG] segfault in perf-top -- thread refcnt
  2015-03-30 13:07                 ` Arnaldo Carvalho de Melo
@ 2015-03-30 13:20                   ` Jiri Olsa
  2015-03-30 13:59                     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 24+ messages in thread
From: Jiri Olsa @ 2015-03-30 13:20 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Namhyung Kim, David Ahern, Jiri Olsa, Stephane Eranian, LKML

On Mon, Mar 30, 2015 at 10:07:08AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, Mar 30, 2015 at 02:56:31PM +0200, Jiri Olsa escreveu:
> > On Mon, Mar 30, 2015 at 09:48:52PM +0900, Namhyung Kim wrote:
> > > Hi Jiri,
> > > 
> > > On Mon, Mar 30, 2015 at 01:49:07PM +0200, Jiri Olsa wrote:
> > > > On Mon, Mar 30, 2015 at 01:21:08PM +0200, Jiri Olsa wrote:
> > > > > On Mon, Mar 30, 2015 at 12:22:20PM +0200, Jiri Olsa wrote:
> > > > > > On Mon, Mar 30, 2015 at 10:07:37AM +0200, Jiri Olsa wrote:
> > > > -	th = thread__new(pid, tid);
> > > > +	th = thread__new(machine, pid, tid);
> > > >  	if (th != NULL) {
> > > > +
> > > > +		pthread_mutex_lock(&machine->threads_lock);
> > > >  		rb_link_node(&th->rb_node, parent, p);
> > > >  		rb_insert_color(&th->rb_node, &machine->threads);
> > > > +		pthread_mutex_unlock(&machine->threads_lock);
> > > 
> > > I think you also need to protect the rb tree traversal above.
> > 
> > yep, I already have another version.. but it blows on another place ;-)
> 
> Well, why? The point of refcounting is that the structure will not go
> away while we have it in the rbtree.
> 
> Or are you talking about two threads trying to insert entries in the
> rbtree? Can you point where this can happen?

yep, as I wrote in the previous email:

> looks like race among __machine__findnew_thread and thread__put
> over the machine->threads rb_tree insert/removal

update thread:
  perf_event__process_sample
    perf_event__preprocess_sample
      machine__findnew_thread
        traverse machine->threads

display thread:
   perf_top__sort_new_samples
      hists__delete_entries
        ...
        hist_entry__delete(struct hist_entry *he)
          thread__zput(he->thread);

>  
> > > But this makes every sample processing grabs and releases the lock so
> > > might cause high overhead.  It can be a problem if such processing is
> > > done parallelly like my multi-thread work. :-/
>  
> > yep.. perhaps instead of more locking we need to find a way where
> > only single thread do the update on hists/threads
> 
> But that should be really rare, no? The problem is to search in one
> thread (the fast path) and inserting entries (slow path), no?

there are many new threads in a kernel make -j25 workload ;-)

jirka

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [BUG] segfault in perf-top -- thread refcnt
  2015-03-30 11:49           ` Jiri Olsa
  2015-03-30 12:48             ` Namhyung Kim
@ 2015-03-30 13:22             ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 24+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-03-30 13:22 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: David Ahern, Jiri Olsa, Namhyung Kim, Stephane Eranian, LKML

Em Mon, Mar 30, 2015 at 01:49:07PM +0200, Jiri Olsa escreveu:
> On Mon, Mar 30, 2015 at 01:21:08PM +0200, Jiri Olsa wrote:
> > > looks like race among __machine__findnew_thread and thread__put
> > > over the machine->threads rb_tree insert/removal

> > > is there a reason why thread__put does not erase itself from machine->threads?
 
> that was the reason.. we do this separately.. not in thread__put..
> is there a reason for this? ;-)
 
> testing attached patch..
 
>  void thread__put(struct thread *thread)
>  {
>  	if (thread && --thread->refcnt == 0) {
> +		struct machine *machine = thread->machine;
> +
>  		list_del_init(&thread->node);
> +
> +		pthread_mutex_lock(&machine->threads_lock);
> +		rb_erase(&thread->rb_node, &machine->threads);
> +		pthread_mutex_unlock(&machine->threads_lock);
> +
>  		thread__delete(thread);

You can't do that, look at the definition of struct thread:

struct thread {
        union {
                struct rb_node   rb_node;
                struct list_head node;
        };

I.e. a thread can't be at the same time in the rbtree (live threads) and
on the list (dead threads).

When the thread dies (PERF_RECORD_EXIT) we transition it from the live
threads rb_tree to the dead_threads list, where it will wait until the
last hist_entry holding a reference to it calls thread__put(), at which
point we will want to remove it _only_ from the dead_threads _list_.

- Arnaldo

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [BUG] segfault in perf-top -- thread refcnt
  2015-03-30 13:20                   ` Jiri Olsa
@ 2015-03-30 13:59                     ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 24+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-03-30 13:59 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: Namhyung Kim, David Ahern, Jiri Olsa, Stephane Eranian, LKML

Em Mon, Mar 30, 2015 at 03:20:59PM +0200, Jiri Olsa escreveu:
> On Mon, Mar 30, 2015 at 10:07:08AM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Mon, Mar 30, 2015 at 02:56:31PM +0200, Jiri Olsa escreveu:
> > > On Mon, Mar 30, 2015 at 09:48:52PM +0900, Namhyung Kim wrote:
> > > > Hi Jiri,
> > > > 
> > > > On Mon, Mar 30, 2015 at 01:49:07PM +0200, Jiri Olsa wrote:
> > > > > On Mon, Mar 30, 2015 at 01:21:08PM +0200, Jiri Olsa wrote:
> > > > > > On Mon, Mar 30, 2015 at 12:22:20PM +0200, Jiri Olsa wrote:
> > > > > > > On Mon, Mar 30, 2015 at 10:07:37AM +0200, Jiri Olsa wrote:
> > > > > -	th = thread__new(pid, tid);
> > > > > +	th = thread__new(machine, pid, tid);
> > > > >  	if (th != NULL) {
> > > > > +
> > > > > +		pthread_mutex_lock(&machine->threads_lock);
> > > > >  		rb_link_node(&th->rb_node, parent, p);
> > > > >  		rb_insert_color(&th->rb_node, &machine->threads);
> > > > > +		pthread_mutex_unlock(&machine->threads_lock);
> > > > 
> > > > I think you also need to protect the rb tree traversal above.
> > > 
> > > yep, I already have another version.. but it blows on another place ;-)
> > 
> > Well, why? The point of refcounting is that the structure will not go
> > away while we have it in the rbtree.
> > 
> > Or are you talking about two threads trying to insert entries in the
> > rbtree? Can you point where this can happen?
> 
> yep, as I wrote in previous email:
> 
> > looks like race among __machine__findnew_thread and thread__put
> > over the machine->threads rb_tree insert/removal
> 
> update thread:
>   perf_event__process_sample
>     perf_event__preprocess_sample
>       machine__findnew_thread
>         traverse machine->threads
> 
> display thread:
>    perf_top__sort_new_samples
>       hists__delete_entries
>         ...
>         hist_entry__delete(struct hist_entry *he)
>           thread__zput(he->thread);
> 
> >  
> > > > But this makes every sample processing grabs and releases the lock so
> > > > might cause high overhead.  It can be a problem if such processing is
> > > > done parallelly like my multi-thread work. :-/
> >  
> > > yep.. perhaps instead of more locking we need to find a way where
> > > only single thread do the update on hists/threads
> > 
> > But that should be really rare, no? The problem is to search in one
> > thread (the fast path) and inserting entries (slow path), no?
> 
> there're many new threads on kernel make -j25 workload ;-)

yeah, dumb me, hey, it's still early here in .br ;-)

- Arnaldo

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [BUG] segfault in perf-top -- thread refcnt
  2015-03-30 13:06                 ` Namhyung Kim
@ 2015-03-30 14:02                   ` Arnaldo Carvalho de Melo
  2015-03-31  0:15                     ` Namhyung Kim
  0 siblings, 1 reply; 24+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-03-30 14:02 UTC (permalink / raw)
  To: Namhyung Kim; +Cc: Jiri Olsa, David Ahern, Jiri Olsa, Stephane Eranian, LKML

Em Mon, Mar 30, 2015 at 10:06:35PM +0900, Namhyung Kim escreveu:
> On Mon, Mar 30, 2015 at 9:56 PM, Jiri Olsa <jolsa@redhat.com> wrote:
> > On Mon, Mar 30, 2015 at 09:48:52PM +0900, Namhyung Kim wrote:
> >> > @@ -380,10 +381,13 @@ static struct thread *__machine__findnew_thread(struct machine *machine,
> >> >     if (!create)
> >> >             return NULL;
> >> >
> >> > -   th = thread__new(pid, tid);
> >> > +   th = thread__new(machine, pid, tid);
> >> >     if (th != NULL) {
> >> > +
> >> > +           pthread_mutex_lock(&machine->threads_lock);
> >> >             rb_link_node(&th->rb_node, parent, p);
> >> >             rb_insert_color(&th->rb_node, &machine->threads);
> >> > +           pthread_mutex_unlock(&machine->threads_lock);
> >>
> >> I think you also need to protect the rb tree traversal above.
> >
> > yep, I already have another version.. but it blows on another place ;-)
> >
> >>
> >> But this makes every sample processing grabs and releases the lock so
> >> might cause high overhead.  It can be a problem if such processing is
> >> done parallelly like my multi-thread work. :-/
> >
> > yep.. perhaps instead of more locking we need to find a way where
> > only single thread do the update on hists/threads
> 
> Agreed.
> 
> AFAIK the reason we do ref-counting is to cleanup dead/exited thread
> for live session like perf top.  In that case we can somehow mark
> to-be-deleted thread and kill it in a safe time/place..

Humm, you mean have another list node in struct thread and add threads
to another, dead_threads-like list, i.e. one for threads that are _really_
dead, with no refcounts pointing to them anymore, and then amortize the
cost of removing them from the rb_tree by removing multiple threads at
once instead of just one?
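
A rough sketch of one possible shape of that idea (every name below is made
up for illustration, this is not proposed code):

/*
 * struct thread would grow a second list node, say "struct list_head
 * exited_node", linking threads whose refcount already dropped to zero.
 * thread__put(), on the final reference, would then only do:
 *
 *	list_add_tail(&th->exited_node, &machine->exited_threads);
 *
 * and a reaper run from a safe context (e.g. perf top's display thread
 * between refreshes) would batch the rbtree removals:
 */
static void machine__reap_exited_threads(struct machine *machine)
{
	struct thread *th, *tmp;

	list_for_each_entry_safe(th, tmp, &machine->exited_threads, exited_node) {
		rb_erase(&th->rb_node, &machine->threads);
		list_del(&th->exited_node);
		thread__delete(th);
	}
}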

- Arnaldo

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [BUG] segfault in perf-top -- thread refcnt
  2015-03-30 12:48             ` Namhyung Kim
  2015-03-30 12:56               ` Jiri Olsa
@ 2015-03-30 14:58               ` Arnaldo Carvalho de Melo
  2015-03-30 15:13                 ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 24+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-03-30 14:58 UTC (permalink / raw)
  To: Namhyung Kim; +Cc: Jiri Olsa, David Ahern, Jiri Olsa, Stephane Eranian, LKML

Em Mon, Mar 30, 2015 at 09:48:52PM +0900, Namhyung Kim escreveu:
> Hi Jiri,
> 
> On Mon, Mar 30, 2015 at 01:49:07PM +0200, Jiri Olsa wrote:
> > On Mon, Mar 30, 2015 at 01:21:08PM +0200, Jiri Olsa wrote:
> > > On Mon, Mar 30, 2015 at 12:22:20PM +0200, Jiri Olsa wrote:
> > > > On Mon, Mar 30, 2015 at 10:07:37AM +0200, Jiri Olsa wrote:
> > > > 
> > > > SNIP
> > > > 
> > > > > > 
> > > > > > 2 things:
> > > > > > 1. let run for a long time. go about using the server. do lots of builds,
> > > > > > etc. it takes time
> > > > > > 
> > > > > > 2. use a box with a LOT of cpus (1024 in my case)
> > > > > > 
> > > > > > Make sure ulimit is set to get the core.
> > > > > 
> > > > > reproduced under 24 cpu box with kernel build (make -j25)
> > > > > running on background.. will try to look closer
> > > > > 
> > > > > perf: Segmentation fault
> > > > > -------- backtrace --------
> > > > > ./perf[0x4fd79b]
> > > > > /lib64/libc.so.6(+0x358f0)[0x7f9cbff528f0]
> > > > > ./perf(thread__put+0x5b)[0x4b1a7b]
> > > > > ./perf(hists__delete_entries+0x70)[0x4c8670]
> > > > > ./perf[0x436a88]
> > > > > ./perf[0x4fa73d]
> > > > > ./perf(perf_evlist__tui_browse_hists+0x97)[0x4fc437]
> > > > > ./perf[0x4381d0]
> > > > > /lib64/libpthread.so.0(+0x7ee5)[0x7f9cc1ff2ee5]
> > > > > /lib64/libc.so.6(clone+0x6d)[0x7f9cc0011b8d]
> > > > > [0x0]
> > > > 
> > > > looks like race among __machine__findnew_thread and thread__put
> > > > over the machine->threads rb_tree insert/removal
> > > > 
> > > > is there a reason why thread__put does not erase itself from machine->threads?
> > 
> > that was the reason.. we do this separately.. not in thread__put..
> > is there a reason for this? ;-)
> > 
> > testing attached patch..
> > 
> > jirka
> > 
> > 
> > ---
> > diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
> > index f7fb258..966564a 100644
> > --- a/tools/perf/util/build-id.c
> > +++ b/tools/perf/util/build-id.c
> > @@ -60,7 +60,6 @@ static int perf_event__exit_del_thread(struct perf_tool *tool __maybe_unused,
> >  		    event->fork.ppid, event->fork.ptid);
> >  
> >  	if (thread) {
> > -		rb_erase(&thread->rb_node, &machine->threads);
> >  		if (machine->last_match == thread)
> >  			thread__zput(machine->last_match);
> >  		thread__put(thread);
> > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > index e335330..a8443ef 100644
> > --- a/tools/perf/util/machine.c
> > +++ b/tools/perf/util/machine.c
> > @@ -30,6 +30,7 @@ int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
> >  	dsos__init(&machine->kernel_dsos);
> >  
> >  	machine->threads = RB_ROOT;
> > +	pthread_mutex_init(&machine->threads_lock, NULL);
> >  	INIT_LIST_HEAD(&machine->dead_threads);
> >  	machine->last_match = NULL;
> >  
> > @@ -380,10 +381,13 @@ static struct thread *__machine__findnew_thread(struct machine *machine,
> >  	if (!create)
> >  		return NULL;
> >  
> > -	th = thread__new(pid, tid);
> > +	th = thread__new(machine, pid, tid);
> >  	if (th != NULL) {
> > +
> > +		pthread_mutex_lock(&machine->threads_lock);
> >  		rb_link_node(&th->rb_node, parent, p);
> >  		rb_insert_color(&th->rb_node, &machine->threads);
> > +		pthread_mutex_unlock(&machine->threads_lock);
> 
> I think you also need to protect the rb tree traversal above.
> 
> But this makes every sample processing grabs and releases the lock so
> might cause high overhead.  It can be a problem if such processing is
> done parallelly like my multi-thread work. :-/

Still untested, using rw lock, next step is auditing the
machine__findnew_thread users that really should be using
machine__find_thread, i.e. grabbing just the reader lock, and measuring
the overhead of using a pthread rw lock instead of pthread_mutex_t as
Jiri is doing.

- Arnaldo

diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
index f7fb2587df69..e3c80bab47a3 100644
--- a/tools/perf/util/build-id.c
+++ b/tools/perf/util/build-id.c
@@ -60,7 +60,9 @@ static int perf_event__exit_del_thread(struct perf_tool *tool __maybe_unused,
 		    event->fork.ppid, event->fork.ptid);
 
 	if (thread) {
+		pthread_rwlock_wrlock(&machine->threads_lock);
 		rb_erase(&thread->rb_node, &machine->threads);
+		pthread_rwlock_unlock(&machine->threads_lock);
 		if (machine->last_match == thread)
 			thread__zput(machine->last_match);
 		thread__put(thread);
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index e45c8f33a8fd..b901ed27a793 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -30,6 +30,7 @@ int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
 	dsos__init(&machine->kernel_dsos);
 
 	machine->threads = RB_ROOT;
+	pthread_rwlock_init(&machine->threads_lock, NULL);
 	INIT_LIST_HEAD(&machine->dead_threads);
 	machine->last_match = NULL;
 
@@ -111,6 +112,7 @@ void machine__exit(struct machine *machine)
 	vdso__exit(machine);
 	zfree(&machine->root_dir);
 	zfree(&machine->current_tid);
+	pthread_rwlock_destroy(&machines->threads_lock);
 }
 
 void machine__delete(struct machine *machine)
@@ -411,13 +413,22 @@ static struct thread *__machine__findnew_thread(struct machine *machine,
 struct thread *machine__findnew_thread(struct machine *machine, pid_t pid,
 				       pid_t tid)
 {
-	return __machine__findnew_thread(machine, pid, tid, true);
+	struct thread *th;
+
+	pthread_rwlock_wrlock(&machine->threads_lock);
+	th = __machine__findnew_thread(machine, pid, tid, true);
+	pthread_rwlock_unlock(&machine->threads_lock);
+	return th;
 }
 
 struct thread *machine__find_thread(struct machine *machine, pid_t pid,
 				    pid_t tid)
 {
-	return __machine__findnew_thread(machine, pid, tid, false);
+	struct thread *th;
+	pthread_rwlock_rdlock(&machine->threads_lock);
+	th =  __machine__findnew_thread(machine, pid, tid, false);
+	pthread_rwlock_unlock(&machine->threads_lock);
+	return th;
 }
 
 struct comm *machine__thread_exec_comm(struct machine *machine,
@@ -1258,7 +1269,9 @@ static void machine__remove_thread(struct machine *machine, struct thread *th)
 	if (machine->last_match == th)
 		thread__zput(machine->last_match);
 
+	pthread_rwlock_wrlock(&machine->threads_lock);
 	rb_erase(&th->rb_node, &machine->threads);
+	pthread_rwlock_unlock(&machine->threads_lock);
 	/*
 	 * Move it first to the dead_threads list, then drop the reference,
 	 * if this is the last reference, then the thread__delete destructor
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index e2faf3b47e7b..c2b9402921fc 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -30,6 +30,7 @@ struct machine {
 	bool		  comm_exec;
 	char		  *root_dir;
 	struct rb_root	  threads;
+	pthread_rwlock_t  threads_lock;
 	struct list_head  dead_threads;
 	struct thread	  *last_match;
 	struct vdso_info  *vdso_info;

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [BUG] segfault in perf-top -- thread refcnt
  2015-03-30 14:58               ` Arnaldo Carvalho de Melo
@ 2015-03-30 15:13                 ` Arnaldo Carvalho de Melo
  2015-03-31  0:27                   ` Namhyung Kim
  0 siblings, 1 reply; 24+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-03-30 15:13 UTC (permalink / raw)
  To: Namhyung Kim; +Cc: Jiri Olsa, David Ahern, Jiri Olsa, Stephane Eranian, LKML

Em Mon, Mar 30, 2015 at 11:58:05AM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Mon, Mar 30, 2015 at 09:48:52PM +0900, Namhyung Kim escreveu:
> > But this makes every sample processing grabs and releases the lock so
> > might cause high overhead.  It can be a problem if such processing is
> > done parallelly like my multi-thread work. :-/
> 
> Still untested, using rw lock, next step is auditing the
> machine__findnew_thread users that really should be using
> machine__find_thread, i.e. grabbing just the reader lock, and measuring
> the overhead of using a pthread rw lock instead of pthread_mutex_t as
> Jiri is doing.

Don't bother trying it, doesn't even compile ;-\

As I said, untested, more as a heads up, will fix it up properly and
test after lunch.

- Arnaldo

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [BUG] segfault in perf-top -- thread refcnt
  2015-03-30 14:02                   ` Arnaldo Carvalho de Melo
@ 2015-03-31  0:15                     ` Namhyung Kim
  0 siblings, 0 replies; 24+ messages in thread
From: Namhyung Kim @ 2015-03-31  0:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, David Ahern, Jiri Olsa, Stephane Eranian, LKML

Hi Arnaldo,

On Mon, Mar 30, 2015 at 11:02:39AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, Mar 30, 2015 at 10:06:35PM +0900, Namhyung Kim escreveu:
> > On Mon, Mar 30, 2015 at 9:56 PM, Jiri Olsa <jolsa@redhat.com> wrote:
> > > On Mon, Mar 30, 2015 at 09:48:52PM +0900, Namhyung Kim wrote:
> > >> > @@ -380,10 +381,13 @@ static struct thread *__machine__findnew_thread(struct machine *machine,
> > >> >     if (!create)
> > >> >             return NULL;
> > >> >
> > >> > -   th = thread__new(pid, tid);
> > >> > +   th = thread__new(machine, pid, tid);
> > >> >     if (th != NULL) {
> > >> > +
> > >> > +           pthread_mutex_lock(&machine->threads_lock);
> > >> >             rb_link_node(&th->rb_node, parent, p);
> > >> >             rb_insert_color(&th->rb_node, &machine->threads);
> > >> > +           pthread_mutex_unlock(&machine->threads_lock);
> > >>
> > >> I think you also need to protect the rb tree traversal above.
> > >
> > > yep, I already have another version.. but it blows on another place ;-)
> > >
> > >>
> > >> But this makes every sample processing grabs and releases the lock so
> > >> might cause high overhead.  It can be a problem if such processing is
> > >> done parallelly like my multi-thread work. :-/
> > >
> > > yep.. perhaps instead of more locking we need to find a way where
> > > only single thread do the update on hists/threads
> > 
> > Agreed.
> > 
> > AFAIK the reason we do ref-counting is to cleanup dead/exited thread
> > for live session like perf top.  In that case we can somehow mark
> > to-be-deleted thread and kill it in a safe time/place..
> 
> Humm, you mean have another list node in struct thread and add threads
> to another, dead_threads-like list, i.e. one for threads that are _really_
> dead, with no refcounts pointing to them anymore, and then amortize the
> cost of removing them from the rb_tree by removing multiple threads at
> once instead of just one?

Yes, I really want to avoid any overhead on the fast path.  Instead of
refcounting, how about marking dead threads and having perf top's display
thread delete them, as long as the thread is not the currently selected one?

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [BUG] segfault in perf-top -- thread refcnt
  2015-03-30 15:13                 ` Arnaldo Carvalho de Melo
@ 2015-03-31  0:27                   ` Namhyung Kim
  2015-03-31  0:46                     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 24+ messages in thread
From: Namhyung Kim @ 2015-03-31  0:27 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, David Ahern, Jiri Olsa, Stephane Eranian, LKML

On Mon, Mar 30, 2015 at 12:13:03PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, Mar 30, 2015 at 11:58:05AM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Mon, Mar 30, 2015 at 09:48:52PM +0900, Namhyung Kim escreveu:
> > > But this makes every sample processing grabs and releases the lock so
> > > might cause high overhead.  It can be a problem if such processing is
> > > done parallelly like my multi-thread work. :-/
> > 
> > Still untested, using rw lock, next step is auditing the
> > machine__findnew_thread users that really should be using
> > machine__find_thread, i.e. grabbing just the reader lock, and measuring
> > the overhead of using a pthread rw lock instead of pthread_mutex_t as
> > Jiri is doing.
> 
> Don't bother trying it, doesn't even compile ;-\

OK. :)

But I think rw lock still has not-so-low overhead as it involves
atomic operations and cache misses.
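
For a rough idea of the uncontended cost alone (the cache line bouncing
under reader contention comes on top of this), a standalone micro-benchmark
sketch, not part of perf; build with something like
"cc -O2 rwbench.c -o rwbench -lpthread":

#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define ITERS 10000000UL

static double elapsed_ns(struct timespec a, struct timespec b)
{
	return (b.tv_sec - a.tv_sec) * 1e9 + (b.tv_nsec - a.tv_nsec);
}

int main(void)
{
	pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
	pthread_rwlock_t rw;
	struct timespec t0, t1;
	unsigned long i;

	pthread_rwlock_init(&rw, NULL);

	/* Uncontended mutex lock/unlock, as in Jiri's patch. */
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < ITERS; i++) {
		pthread_mutex_lock(&m);
		pthread_mutex_unlock(&m);
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	printf("mutex : %.1f ns/iter\n", elapsed_ns(t0, t1) / ITERS);

	/* Uncontended reader lock/unlock, as in the rwlock version. */
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < ITERS; i++) {
		pthread_rwlock_rdlock(&rw);
		pthread_rwlock_unlock(&rw);
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	printf("rwlock: %.1f ns/iter\n", elapsed_ns(t0, t1) / ITERS);

	pthread_rwlock_destroy(&rw);
	return 0;
}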

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [BUG] segfault in perf-top -- thread refcnt
  2015-03-31  0:27                   ` Namhyung Kim
@ 2015-03-31  0:46                     ` Arnaldo Carvalho de Melo
  2015-03-31  7:21                       ` Namhyung Kim
  0 siblings, 1 reply; 24+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-03-31  0:46 UTC (permalink / raw)
  To: Namhyung Kim; +Cc: Jiri Olsa, David Ahern, Jiri Olsa, Stephane Eranian, LKML

Em Tue, Mar 31, 2015 at 09:27:30AM +0900, Namhyung Kim escreveu:
> On Mon, Mar 30, 2015 at 12:13:03PM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Mon, Mar 30, 2015 at 11:58:05AM -0300, Arnaldo Carvalho de Melo escreveu:
> > > Em Mon, Mar 30, 2015 at 09:48:52PM +0900, Namhyung Kim escreveu:
> > > > But this makes every sample processing grabs and releases the lock so
> > > > might cause high overhead.  It can be a problem if such processing is
> > > > done parallelly like my multi-thread work. :-/
> > > 
> > > Still untested, using rw lock, next step is auditing the
> > > machine__findnew_thread users that really should be using
> > > machine__find_thread, i.e. grabbing just the reader lock, and measuring
> > > the overhead of using a pthread rw lock instead of pthread_mutex_t as
> > > Jiri is doing.
> > 
> > Don't bother trying it, doesn't even compile ;-\
> 
> OK. :)
> 
> But I think rw lock still has not-so-low overhead as it involves
> atomic operations and cache misses.

But we will have to serialize access to the data structure at some
point...

- Arnaldo

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [BUG] segfault in perf-top -- thread refcnt
  2015-03-31  0:46                     ` Arnaldo Carvalho de Melo
@ 2015-03-31  7:21                       ` Namhyung Kim
  0 siblings, 0 replies; 24+ messages in thread
From: Namhyung Kim @ 2015-03-31  7:21 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, David Ahern, Jiri Olsa, Stephane Eranian, LKML

On Mon, Mar 30, 2015 at 09:46:35PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Tue, Mar 31, 2015 at 09:27:30AM +0900, Namhyung Kim escreveu:
> > On Mon, Mar 30, 2015 at 12:13:03PM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Mon, Mar 30, 2015 at 11:58:05AM -0300, Arnaldo Carvalho de Melo escreveu:
> > > > Em Mon, Mar 30, 2015 at 09:48:52PM +0900, Namhyung Kim escreveu:
> > > > > But this makes every sample processing grabs and releases the lock so
> > > > > might cause high overhead.  It can be a problem if such processing is
> > > > > done parallelly like my multi-thread work. :-/
> > > > 
> > > > Still untested, using rw lock, next step is auditing the
> > > > machine__findnew_thread users that really should be using
> > > > machine__find_thread, i.e. grabbing just the reader lock, and measuring
> > > > the overhead of using a pthread rw lock instead of pthread_mutex_t as
> > > > Jiri is doing.
> > > 
> > > Don't bother trying it, doesn't even compile ;-\
> > 
> > OK. :)
> > 
> > But I think rw lock still has not-so-low overhead as it involves
> > atomic operations and cache misses.
> 
> But we will have to serialize access to the data structure at some
> point...

Yes, as long as we keep ref-counting.

I'm guessing that if we only focus on the perf top case, there might be
a way to clean up dead threads without ref-counting (i.e. without
affecting the fast path in perf report).

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread

Thread overview: 24+ messages
2015-03-27 17:31 [BUG] segfault in perf-top -- thread refcnt David Ahern
2015-03-27 19:51 ` Arnaldo Carvalho de Melo
2015-03-27 20:11 ` Arnaldo Carvalho de Melo
2015-03-27 20:13   ` David Ahern
2015-03-30  8:07     ` Jiri Olsa
2015-03-30 10:22       ` Jiri Olsa
2015-03-30 11:21         ` Jiri Olsa
2015-03-30 11:49           ` Jiri Olsa
2015-03-30 12:48             ` Namhyung Kim
2015-03-30 12:56               ` Jiri Olsa
2015-03-30 13:06                 ` Namhyung Kim
2015-03-30 14:02                   ` Arnaldo Carvalho de Melo
2015-03-31  0:15                     ` Namhyung Kim
2015-03-30 13:07                 ` Arnaldo Carvalho de Melo
2015-03-30 13:20                   ` Jiri Olsa
2015-03-30 13:59                     ` Arnaldo Carvalho de Melo
2015-03-30 14:58               ` Arnaldo Carvalho de Melo
2015-03-30 15:13                 ` Arnaldo Carvalho de Melo
2015-03-31  0:27                   ` Namhyung Kim
2015-03-31  0:46                     ` Arnaldo Carvalho de Melo
2015-03-31  7:21                       ` Namhyung Kim
2015-03-30 13:22             ` Arnaldo Carvalho de Melo
2015-03-30 13:09         ` Arnaldo Carvalho de Melo
2015-03-30 13:17         ` Arnaldo Carvalho de Melo
