LKML Archive on lore.kernel.org
From: Jiri Olsa <jolsa@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Olsa <jolsa@kernel.org>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	lkml <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Namhyung Kim <namhyung@kernel.org>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Andi Kleen <andi@firstfloor.org>,
	Andrew Vagin <avagin@openvz.org>
Subject: [PATCHv2] perf: Prevent concurrent ring buffer access
Date: Sun, 23 Sep 2018 18:13:43 +0200
Message-ID: <20180923161343.GB15054@krava>
In-Reply-To: <20180913093754.GV24124@hirez.programming.kicks-ass.net>

On Thu, Sep 13, 2018 at 11:37:54AM +0200, Peter Zijlstra wrote:
> On Thu, Sep 13, 2018 at 09:46:07AM +0200, Jiri Olsa wrote:
> > On Thu, Sep 13, 2018 at 09:07:40AM +0200, Peter Zijlstra wrote:
> > > On Wed, Sep 12, 2018 at 09:33:17PM +0200, Jiri Olsa wrote:
> > > > Some of the scheduling tracepoints allow the perf_tp_event
> > > > code to write to ring buffer under different cpu than the
> > > > code is running on.
> > > 
> > > ARGH.. that is indeed borken.
> 
> > I was first thinking to just leave it on the current cpu,
> > but not sure current users would be ok with that ;-)
> 
> > ---
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index abaed4f8bb7f..9b534a2ecf17 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -8308,6 +8308,8 @@ void perf_tp_event(u16 event_type, u64 count, void *record, int entry_size,
> >  				continue;
> >  			if (event->attr.config != entry->type)
> >  				continue;
> > +			if (event->cpu != smp_processor_id())
> > +				continue;
> >  			if (perf_tp_event_match(event, &data, regs))
> >  				perf_swevent_event(event, count, &data, regs);
> >  		}
> 
> That might indeed be the best we can do.
> 
> So the whole TP muck would be responsible for placing only matching
> events on the hlist, which is where our normal CPU filter is I think.
> 
> The above then does the same for @task. Which without this would also be
> getting nr_cpus copies of the event I think.
> 
> It does mean not getting any events if the @task only has a per-task
> buffer, but there's nothing to be done about that. And I'm not even sure
> we can create a useful warning for that :/

OK, sending the full patch (v2) with the above change.

Cc-ing Andrew Vagin, who added this feature,
because this patch changes the way it works.

thanks,
jirka


---
Some of the scheduling tracepoints allow the perf_tp_event
code to write to the ring buffer of a different CPU than
the one the code is running on.

This results in corrupted ring buffer data, as demonstrated
by the following perf commands:

  # perf record -e 'sched:sched_switch,sched:sched_wakeup' perf bench sched messaging
  # Running 'sched/messaging' benchmark:
  # 20 sender and receiver processes per group
  # 10 groups == 400 processes run

       Total time: 0.383 [sec]
  [ perf record: Woken up 8 times to write data ]
  0x42b890 [0]: failed to process type: -1765585640
  [ perf record: Captured and wrote 4.825 MB perf.data (29669 samples) ]

  # perf report --stdio
  0x42b890 [0]: failed to process type: -1765585640

The corruption is caused by the following scheduling tracepoints,
which have __perf_task defined and thus allow storing data into
another CPU's ring buffer:

  sched_waking
  sched_wakeup
  sched_wakeup_new
  sched_stat_wait
  sched_stat_sleep
  sched_stat_iowait
  sched_stat_blocked

The perf_tp_event function first stores samples for the current
CPU's events defined for the tracepoint:

    hlist_for_each_entry_rcu(event, head, hlist_entry)
      perf_swevent_event(event, count, &data, regs);

It then iterates over the events of 'task' and stores the sample
for any of the task's events that pass the tracepoint checks:

  ctx = rcu_dereference(task->perf_event_ctxp[perf_sw_context]);

  list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
    if (event->attr.type != PERF_TYPE_TRACEPOINT)
      continue;
    if (event->attr.config != entry->type)
      continue;

    perf_swevent_event(event, count, &data, regs);
  }

The above code can race with the same code running on another CPU,
ending up with two CPUs trying to write into the same ring
buffer, which is not handled at the moment.
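To make the failure mode concrete, here is a minimal userspace sketch of a lost update between two writers sharing one buffer. It is a toy model under stated assumptions: struct toy_rb, toy_reserve(), toy_commit() and toy_lost_update() are hypothetical names, and the plain non-atomic head update is chosen only to illustrate the interleaving; it does not reproduce the kernel's ring buffer code.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical toy ring buffer, NOT the kernel's perf buffer: one data
 * area and a head offset that every writer advances.
 */
struct toy_rb {
	char data[64];
	size_t head;		/* next free offset */
};

/* Step 1 of a write: note where this writer will put its record. */
static size_t toy_reserve(const struct toy_rb *rb)
{
	return rb->head;
}

/* Step 2: copy the record and publish the new head (plain, non-atomic). */
static void toy_commit(struct toy_rb *rb, size_t off, const char *rec)
{
	size_t len = strlen(rec);

	assert(off + len <= sizeof(rb->data));
	memcpy(rb->data + off, rec, len);
	rb->head = off + len;
}

/*
 * Replay one bad interleaving: writers on "CPU A" and "CPU B" both
 * reserve before either commits, so both get offset 0. Returns the
 * final head; the surviving bytes are left in rb->data.
 */
static size_t toy_lost_update(struct toy_rb *rb)
{
	size_t off_a, off_b;

	memset(rb, 0, sizeof(*rb));
	off_a = toy_reserve(rb);	/* A sees head == 0 */
	off_b = toy_reserve(rb);	/* B also sees head == 0 */
	toy_commit(rb, off_a, "AAAA");
	toy_commit(rb, off_b, "BBBB");	/* overwrites A's record */
	return rb->head;
}
```

With this interleaving, head ends at 4 instead of 8 and only B's record survives. In a real perf buffer the equivalent damage would presumably surface as unparsable records, like the "failed to process type" errors quoted above.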

This patch prevents the race by allowing only events bound
to the current CPU to receive the sample.
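A minimal userspace model of the task-context loop, with and without the check this patch adds, may help show the effect. It assumes a hypothetical struct toy_event (not struct perf_event) and a hypothetical toy_task_loop() helper; filter_cpu = 1 stands in for the new "event->cpu != smp_processor_id()" test, placed first as in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct perf_event. */
struct toy_event {
	int cpu;		/* CPU the event is bound to; -1 = per-task (any CPU) */
	int is_tracepoint;	/* stands in for attr.type == PERF_TYPE_TRACEPOINT */
	unsigned long config;	/* tracepoint id, compared with entry->type */
	int samples;		/* samples "written" to this event's buffer */
};

/*
 * Model of the loop over the target task's events in perf_tp_event().
 * Returns how many events received the sample. With filter_cpu set,
 * only events bound to this_cpu are considered.
 */
static int toy_task_loop(struct toy_event *ev, size_t n, int this_cpu,
			 unsigned long tp_type, int filter_cpu)
{
	int delivered = 0;

	assert(this_cpu >= 0);	/* smp_processor_id() is never negative */

	for (size_t i = 0; i < n; i++) {
		if (filter_cpu && ev[i].cpu != this_cpu)
			continue;	/* the check added by this patch */
		if (!ev[i].is_tracepoint)
			continue;
		if (ev[i].config != tp_type)
			continue;
		ev[i].samples++;
		delivered++;
	}
	return delivered;
}
```

For a task with one tracepoint event per CPU on four CPUs, the unfiltered loop running on CPU 1 writes to all four buffers (the nr_cpus copies mentioned in the review), while the filtered loop writes only to CPU 1's buffer. An event with cpu == -1 (a per-task buffer) never matches a CPU id, which is the "no events for a per-task buffer" limitation discussed above.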

Fixes: e6dab5ffab59 ("perf/trace: Add ability to set a target task for events")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/events/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index c80549bf82c6..f269f666510c 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8308,6 +8308,8 @@ void perf_tp_event(u16 event_type, u64 count, void *record, int entry_size,
 			goto unlock;
 
 		list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
+			if (event->cpu != smp_processor_id())
+				continue;
 			if (event->attr.type != PERF_TYPE_TRACEPOINT)
 				continue;
 			if (event->attr.config != entry->type)
-- 
2.17.1



Thread overview: 9+ messages
2018-09-12 19:33 [PATCH] perf: Prevent recursion in ring buffer Jiri Olsa
2018-09-13  7:07 ` Peter Zijlstra
2018-09-13  7:41   ` Jiri Olsa
2018-09-13  7:46   ` Jiri Olsa
2018-09-13  9:37     ` Peter Zijlstra
2018-09-23 16:13       ` Jiri Olsa [this message]
2018-10-02 10:01         ` [tip:perf/core] perf/ring_buffer: Prevent concurent ring buffer access tip-bot for Jiri Olsa
2018-09-13  7:40 ` [PATCH] perf: Prevent recursion in ring buffer Peter Zijlstra
2018-09-13  7:53   ` Jiri Olsa
