linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jiri Olsa <jolsa@kernel.org>
To: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: lkml <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Namhyung Kim <namhyung@kernel.org>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Andi Kleen <andi@firstfloor.org>
Subject: [PATCH] perf: Prevent recursion in ring buffer
Date: Wed, 12 Sep 2018 21:33:17 +0200	[thread overview]
Message-ID: <20180912193317.10339-1-jolsa@kernel.org> (raw)

Some of the scheduling tracepoints allow the perf_tp_event
code to write to ring buffer under different cpu than the
code is running on.

This results in corrupted ring buffer data demonstrated in
following perf commands:

  # perf record -e 'sched:sched_switch,sched:sched_wakeup' perf bench sched messaging
  # Running 'sched/messaging' benchmark:
  # 20 sender and receiver processes per group
  # 10 groups == 400 processes run

       Total time: 0.383 [sec]
  [ perf record: Woken up 8 times to write data ]
  0x42b890 [0]: failed to process type: -1765585640
  [ perf record: Captured and wrote 4.825 MB perf.data (29669 samples) ]

  # perf report --stdio
  0x42b890 [0]: failed to process type: -1765585640

The reason for the corruptions are some of the scheduling tracepoints,
that have __perf_task dfined and thus allow to store data to another
cpu ring buffer:

  sched_waking
  sched_wakeup
  sched_wakeup_new
  sched_stat_wait
  sched_stat_sleep
  sched_stat_iowait
  sched_stat_blocked

The perf_tp_event function first store samples for current cpu
related events defined for tracepoint:

    hlist_for_each_entry_rcu(event, head, hlist_entry)
      perf_swevent_event(event, count, &data, regs);

And then iterates events of the 'task' and store the sample
for any task's event that passes tracepoint checks:

  ctx = rcu_dereference(task->perf_event_ctxp[perf_sw_context]);

  list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
    if (event->attr.type != PERF_TYPE_TRACEPOINT)
      continue;
    if (event->attr.config != entry->type)
      continue;

    perf_swevent_event(event, count, &data, regs);
  }

Above code can race with same code running on another cpu,
ending up with 2 cpus trying to store under the same ring
buffer, which is not handled at the moment.

This patch adds atomic 'recursion' flag (any cpu could touch
the ring buffer at any moment), that guards the entry to the
storage code. It's set in __perf_output_begin function and
released in perf_output_put_handle. If the flag is set, the
code increases the lost count and bails out.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/events/internal.h    | 1 +
 kernel/events/ring_buffer.c | 8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 6dc725a7e7bc..82599da9723f 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -11,6 +11,7 @@
 
 struct ring_buffer {
 	atomic_t			refcount;
+	atomic_t			recursion;
 	struct rcu_head			rcu_head;
 #ifdef CONFIG_PERF_USE_VMALLOC
 	struct work_struct		work;
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 4a9937076331..0c976ac414c5 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -101,6 +101,7 @@ static void perf_output_put_handle(struct perf_output_handle *handle)
 
 out:
 	preempt_enable();
+	atomic_set(&rb->recursion, 0);
 }
 
 static __always_inline bool
@@ -145,6 +146,12 @@ __perf_output_begin(struct perf_output_handle *handle,
 		goto out;
 	}
 
+	if (atomic_cmpxchg(&rb->recursion, 0, 1) != 0) {
+		if (rb->nr_pages)
+			local_inc(&rb->lost);
+		goto out;
+	}
+
 	handle->rb    = rb;
 	handle->event = event;
 
@@ -286,6 +293,7 @@ ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
 		rb->overwrite = 1;
 
 	atomic_set(&rb->refcount, 1);
+	atomic_set(&rb->recursion, 0);
 
 	INIT_LIST_HEAD(&rb->event_list);
 	spin_lock_init(&rb->event_lock);
-- 
2.17.1


             reply	other threads:[~2018-09-12 19:33 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-12 19:33 Jiri Olsa [this message]
2018-09-13  7:07 ` [PATCH] perf: Prevent recursion in ring buffer Peter Zijlstra
2018-09-13  7:41   ` Jiri Olsa
2018-09-13  7:46   ` Jiri Olsa
2018-09-13  9:37     ` Peter Zijlstra
2018-09-23 16:13       ` [PATCHv2] perf: Prevent concurent ring buffer access Jiri Olsa
2018-10-02 10:01         ` [tip:perf/core] perf/ring_buffer: " tip-bot for Jiri Olsa
2018-09-13  7:40 ` [PATCH] perf: Prevent recursion in ring buffer Peter Zijlstra
2018-09-13  7:53   ` Jiri Olsa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180912193317.10339-1-jolsa@kernel.org \
    --to=jolsa@kernel.org \
    --cc=a.p.zijlstra@chello.nl \
    --cc=acme@kernel.org \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=andi@firstfloor.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=namhyung@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).