From: Jiri Olsa <jolsa@kernel.org>
To: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: lkml <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Namhyung Kim <namhyung@kernel.org>,
	David Ahern <dsahern@gmail.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Kan Liang <kan.liang@intel.com>, Andi Kleen <ak@linux.intel.com>,
	Lukasz Odzioba <lukasz.odzioba@intel.com>,
	Wang Nan <wangnan0@huawei.com>
Subject: [PATCH 1/4] perf tools: Fix struct comm_str removal crash
Date: Thu, 12 Jul 2018 16:20:20 +0200
Message-ID: <20180712142023.16915-2-jolsa@kernel.org>
In-Reply-To: <20180712142023.16915-1-jolsa@kernel.org>

We occasionally hit the following assertion failure in perf top
when processing the /proc info in multiple threads.

  perf: ...include/linux/refcount.h:109: refcount_inc:
        Assertion `!(!refcount_inc_not_zero(r))' failed.

The gdb backtrace looks like this:

  [Switching to Thread 0x7ffff11ba700 (LWP 13749)]
  0x00007ffff50839fb in raise () from /lib64/libc.so.6
  (gdb)
  #0  0x00007ffff50839fb in raise () from /lib64/libc.so.6
  #1  0x00007ffff5085800 in abort () from /lib64/libc.so.6
  #2  0x00007ffff507c0da in __assert_fail_base () from /lib64/libc.so.6
  #3  0x00007ffff507c152 in __assert_fail () from /lib64/libc.so.6
  #4  0x0000000000535373 in refcount_inc (r=0x7fffdc009be0)
      at ...include/linux/refcount.h:109
  #5  0x00000000005354f1 in comm_str__get (cs=0x7fffdc009bc0)
      at util/comm.c:24
  #6  0x00000000005356bd in __comm_str__findnew (str=0x7fffd000b260 ":2",
      root=0xbed5c0 <comm_str_root>) at util/comm.c:72
  #7  0x000000000053579e in comm_str__findnew (str=0x7fffd000b260 ":2",
      root=0xbed5c0 <comm_str_root>) at util/comm.c:95
  #8  0x000000000053582e in comm__new (str=0x7fffd000b260 ":2",
      timestamp=0, exec=false) at util/comm.c:111
  #9  0x00000000005363bc in thread__new (pid=2, tid=2) at util/thread.c:57
  #10 0x0000000000523da0 in ____machine__findnew_thread (machine=0xbfde38,
      threads=0xbfdf28, pid=2, tid=2, create=true) at util/machine.c:457
  #11 0x0000000000523eb4 in __machine__findnew_thread (machine=0xbfde38,
  ...

The failing assertion is this one:

  REFCOUNT_WARN(!refcount_inc_not_zero(r), ...
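
To make the failing check concrete: refcount_inc() asserts unless
refcount_inc_not_zero() succeeds, and refcount_inc_not_zero() refuses
to bump a counter that has already reached 0, i.e. an object whose
last reference is gone. The sketch below models that contract with
C11 atomics. It is a simplified stand-in written for illustration,
not the code from tools/include/linux/refcount.h (it omits saturation
handling, for example), and the struct layout is an assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified model of refcount_t; the names mirror the kernel ones,
 * but this is NOT the perf/kernel implementation. */
typedef struct {
	atomic_uint refs;
} refcount_t;

/* Increment unless the count is 0 (last reference already dropped). */
static inline bool refcount_inc_not_zero(refcount_t *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == 0)
			return false;	/* object is already being freed */
		/* On failure 'old' is reloaded and the loop retries. */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));

	return true;
}

/* refcount_inc treats an increment on 0 as a use-after-free bug;
 * this assert plays the role of REFCOUNT_WARN/BUG_ON. */
static inline void refcount_inc(refcount_t *r)
{
	bool ok = refcount_inc_not_zero(r);

	assert(ok && "refcount_t: increment on 0; use-after-free");
	(void)ok;
}

/* Returns true when the caller dropped the last reference. */
static inline bool refcount_dec_and_test(refcount_t *r)
{
	return atomic_fetch_sub(&r->refs, 1) == 1;
}
```

refcount_inc() is only safe when the caller already knows the count
is non-zero, which is what comm_str__get() assumed for every entry
that __comm_str__findnew() finds in comm_str_root.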

The problem is that we keep a global comm_str_root rbtree, which
is accessed by multiple threads during perf top startup, and the
following two paths can race:

  thread 1:
    ...
    thread__new
      comm__new
        comm_str__findnew
          down_write(&comm_str_lock);
          __comm_str__findnew
            comm_str__get

  thread 2:
    ...
    comm__override or comm__free
      comm_str__put
        refcount_dec_and_test
          down_write(&comm_str_lock);
          rb_erase(&cs->rb_node, &comm_str_root);

Because thread 2 first decrements the refcnt and only then removes
the struct comm_str from the tree, thread 1 can find the object in
the tree with refcnt equal to 0 and hit the assertion. The decrement
is not done under comm_str_lock, so thread 1 holding that lock does
not close this window.
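
The interleaving can be replayed deterministically in a single
thread. The minimal model below is hypothetical: a singly linked
list stands in for the comm_str_root rbtree, a plain int for
refcount_t, and the helpers (cs_lookup(), old_put_dec(),
old_put_erase(), replay_old_put_race()) are invented names. The old
comm_str__put() is split into its decrement and erase steps, with
thread 1's lookup placed between them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of the pre-patch code; no locking is shown
 * because both "threads" are replayed step by step in one thread. */
struct cs_node {
	const char *str;
	int refcnt;
	struct cs_node *next;
};

static struct cs_node *cs_list;

/* Thread 1: the lookup part of __comm_str__findnew. */
static struct cs_node *cs_lookup(const char *str)
{
	for (struct cs_node *cs = cs_list; cs; cs = cs->next)
		if (!strcmp(cs->str, str))
			return cs;
	return NULL;
}

/* Thread 2, step 1 of the old comm_str__put: drop the reference. */
static bool old_put_dec(struct cs_node *cs)
{
	return --cs->refcnt == 0;
}

/* Thread 2, step 2 of the old comm_str__put: unlink the entry. */
static void old_put_erase(struct cs_node *cs)
{
	struct cs_node **pp = &cs_list;

	while (*pp != cs) {
		assert(*pp != NULL);	/* entry must be on the list */
		pp = &(*pp)->next;
	}
	*pp = cs->next;
}

/* Replay: thread 2 decrements, thread 1 looks up, thread 2 unlinks.
 * Returns true if thread 1 found a listed entry whose refcnt was
 * already 0, the point where comm_str__get() would assert. */
static bool replay_old_put_race(void)
{
	struct cs_node *cs = malloc(sizeof(*cs));
	struct cs_node *found;
	bool last, saw_zero;

	if (!cs)
		return false;
	cs->str = ":2";
	cs->refcnt = 1;
	cs->next = NULL;
	cs_list = cs;

	last = old_put_dec(cs);		/* thread 2: refcnt 1 -> 0 */
	found = cs_lookup(":2");	/* thread 1: entry still listed */
	saw_zero = found && found->refcnt == 0;
	if (last) {
		old_put_erase(cs);	/* thread 2: unlink comes too late */
		free(cs);
	}
	return saw_zero;
}
```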

This patch fixes the thread 2 path by removing the struct comm_str
from the tree FIRST and only AFTER that calling comm_str__put on it.
This way thread 1 finds only valid objects in the tree.

We also need to ensure now that only one caller removes the struct
comm_str from the tree, so add a 'removed' bool to track that.
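
A minimal sketch of the patched ordering, under the same simplifying
assumptions as a model rather than perf code: a list in place of the
rbtree, a pthread rwlock standing in for comm_str_lock, and
hypothetical cs_* helpers mirroring comm_str__findnew(),
__comm_str__remove(), comm_str__put() and comm_str__exit(). Unlike
the patch, the sketch checks 'removed' only under the lock and skips
the unlocked pre-check done in comm_str__remove().

```c
#define _POSIX_C_SOURCE 200809L	/* strdup, pthread_rwlock_t */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct cs_node {
	char *str;
	atomic_int refcnt;
	bool removed;		/* protected by cs_lock */
	struct cs_node *next;
};

static struct cs_node *cs_list;
static pthread_rwlock_t cs_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Like comm_str__findnew: reuse a listed entry or insert a new one. */
static struct cs_node *cs_findnew(const char *str)
{
	struct cs_node *cs;

	pthread_rwlock_wrlock(&cs_lock);
	for (cs = cs_list; cs; cs = cs->next) {
		if (!strcmp(cs->str, str)) {
			/* Safe: a listed entry always has refcnt >= 1. */
			atomic_fetch_add(&cs->refcnt, 1);
			goto out;
		}
	}

	cs = calloc(1, sizeof(*cs));	/* removed starts as false */
	if (cs) {
		cs->str = strdup(str);
		if (!cs->str) {
			free(cs);
			cs = NULL;
		}
	}
	if (cs) {
		atomic_init(&cs->refcnt, 1);	/* owned by the caller */
		cs->next = cs_list;
		cs_list = cs;
	}
out:
	pthread_rwlock_unlock(&cs_lock);
	return cs;
}

/* Like __comm_str__remove: unlink at most once, under the lock. */
static void cs_remove(struct cs_node *cs)
{
	pthread_rwlock_wrlock(&cs_lock);
	if (!cs->removed) {
		struct cs_node **pp = &cs_list;

		while (*pp != cs)
			pp = &(*pp)->next;
		*pp = cs->next;
		cs->removed = true;
	}
	pthread_rwlock_unlock(&cs_lock);
}

/* Like the patched comm_str__put: free on the last reference and
 * never touch the shared list. */
static void cs_put(struct cs_node *cs)
{
	if (cs && atomic_fetch_sub(&cs->refcnt, 1) == 1) {
		free(cs->str);
		free(cs);
	}
}

/* Like comm_str__exit: unlink FIRST, then drop the reference. */
static void cs_exit(struct cs_node *cs)
{
	if (!cs)
		return;
	cs_remove(cs);
	cs_put(cs);
}
```

Because every holder unlinks the entry before dropping its own
reference, a listed entry always has refcnt >= 1, so the increment
in cs_findnew() never sees 0. The model also shows a side effect of
this ordering: once one holder exits, a later lookup of the same
string allocates a fresh entry even though the old one is still
referenced by another holder.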

Link: http://lkml.kernel.org/n/tip-vrizt6sw1lu1ybsrl9l0wwln@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/util/comm.c | 30 +++++++++++++++++++++++++-----
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/comm.c b/tools/perf/util/comm.c
index 7798a2cc8a86..7f1c6e63e3e6 100644
--- a/tools/perf/util/comm.c
+++ b/tools/perf/util/comm.c
@@ -12,6 +12,7 @@ struct comm_str {
 	char *str;
 	struct rb_node rb_node;
 	refcount_t refcnt;
+	bool removed;
 };
 
 /* Should perhaps be moved to struct machine */
@@ -28,9 +29,6 @@ static struct comm_str *comm_str__get(struct comm_str *cs)
 static void comm_str__put(struct comm_str *cs)
 {
 	if (cs && refcount_dec_and_test(&cs->refcnt)) {
-		down_write(&comm_str_lock);
-		rb_erase(&cs->rb_node, &comm_str_root);
-		up_write(&comm_str_lock);
 		zfree(&cs->str);
 		free(cs);
 	}
@@ -117,6 +115,28 @@ struct comm *comm__new(const char *str, u64 timestamp, bool exec)
 	return comm;
 }
 
+static void __comm_str__remove(struct comm_str *cs)
+{
+	down_write(&comm_str_lock);
+	if (!cs->removed) {
+		rb_erase(&cs->rb_node, &comm_str_root);
+		cs->removed = true;
+	}
+	up_write(&comm_str_lock);
+}
+
+static void comm_str__remove(struct comm_str *cs)
+{
+	if (!cs->removed)
+		__comm_str__remove(cs);
+}
+
+static void comm_str__exit(struct comm_str *cs)
+{
+	comm_str__remove(cs);
+	comm_str__put(cs);
+}
+
 int comm__override(struct comm *comm, const char *str, u64 timestamp, bool exec)
 {
 	struct comm_str *new, *old = comm->comm_str;
@@ -125,7 +145,7 @@ int comm__override(struct comm *comm, const char *str, u64 timestamp, bool exec)
 	if (!new)
 		return -ENOMEM;
 
-	comm_str__put(old);
+	comm_str__exit(old);
 	comm->comm_str = new;
 	comm->start = timestamp;
 	if (exec)
@@ -136,7 +156,7 @@ int comm__override(struct comm *comm, const char *str, u64 timestamp, bool exec)
 
 void comm__free(struct comm *comm)
 {
-	comm_str__put(comm->comm_str);
+	comm_str__exit(comm->comm_str);
 	free(comm);
 }
 
-- 
2.17.1


Thread overview: 14+ messages
2018-07-12 14:20 [PATCH 0/4] perf tools: Fix top crashes Jiri Olsa
2018-07-12 14:20 ` Jiri Olsa [this message]
2018-07-15 13:08   ` [PATCH 1/4] perf tools: Fix struct comm_str removal crash Namhyung Kim
2018-07-16 10:29     ` Jiri Olsa
2018-07-17  1:49       ` Namhyung Kim
2018-07-17  9:02         ` Jiri Olsa
2018-07-18 10:44           ` Jiri Olsa
2018-07-12 14:20 ` [PATCH 2/4] perf tools: Add threads__get_last_match function Jiri Olsa
2018-07-22  7:53   ` [lkp-robot] [perf tools] 600b7378cf: perf-sanity-tests.Share_thread_mg.fail kernel test robot
2018-07-22  7:53     ` kernel test robot
2018-07-23  6:59     ` Jiri Olsa
2018-07-23  6:59       ` Jiri Olsa
2018-07-12 14:20 ` [PATCH 3/4] perf tools: Add threads__set_last_match function Jiri Olsa
2018-07-12 14:20 ` [PATCH 4/4] perf tools: Use last_match threads cache only in single thread mode Jiri Olsa
