From: Peter Zijlstra <peterz@infradead.org>
To: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>,
	Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>,
	Mike Leach <mike.leach@linaro.org>,
	Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@redhat.com>, Namhyung Kim <namhyung@kernel.org>,
	coresight@lists.linaro.org, Stephen Boyd <swboyd@chromium.org>,
	linux-arm-msm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCHv2 2/4] coresight: tmc-etf: Fix NULL ptr dereference in tmc_enable_etf_sink_perf()
Date: Fri, 23 Oct 2020 12:54:31 +0200
Message-ID: <20201023105431.GM2594@hirez.programming.kicks-ass.net>
In-Reply-To: <bd8c136d-9dfa-a760-31f9-eb8d6698aced@arm.com>

On Fri, Oct 23, 2020 at 11:34:32AM +0100, Suzuki Poulose wrote:
> On 10/23/20 10:41 AM, Peter Zijlstra wrote:
> > On Fri, Oct 23, 2020 at 09:49:53AM +0100, Suzuki Poulose wrote:
> > > On 10/23/20 8:39 AM, Peter Zijlstra wrote:
> > 
> > > > So then I don't understand the !->owner issue, that only happens when
> > > > the task dies, which cannot be concurrent with event creation. Are you
> > > 
> > > Part of the patch from Sai, fixes this by avoiding the dereferencing
> > > after event creation (by caching it). But the kernel events needs
> > > fixing.
> > 
> > I'm fundamentally failing here. Creating a link to the sink is strictly
> > event-creation time. Why would you ever need it again later? Later you
> > already have the sink setup.
> > 
> 
> Sorry for the lack of clarity here, and you are not alone unless you
> have drowned in the CoreSight topologies ;-)
> 
> Typically current generation of systems have the following topology :
> 
> CPU0
>  etm0   \
>          \  ________
>          /          \
> CPU1    /            \
>   etm1                \
>                        \
>                        /-------  sink0
> CPU2                  /
>   etm2  \            /
>          \ ________ /
>          /
> CPU3    /
>   etm3
> 
> 
> i.e, Multiple ETMs share a sink. [for the sake of simplicity, I have
> used one sink. Even though there could be potential sinks (of different
> types), none of them are private to the ETMs. So, in a nutshell, a sink
> can be reached by multiple ETMs. ]
> 
> Now, for a session :
> 
> perf record -e cs_etm/sinkid=sink0/u workload
> 
> We create an event per CPU (say eventN), which are scheduled based on the
> threads that could execute on the CPU. At this point we have finalized
> the sink0, and have allocated the necessary buffer for the sink0.
> 
> Now, when the threads are scheduled on the CPUs, we start the
> appropriate events for the CPUs.
> 
> e.g,
>  CPU0 sched -> workload:0 -> etm0->event0_start -> Turns on all
> the components up to sink0, starting the trace collection in the buffer.
> 
> Now, if another CPU, CPU1 starts tracing event1 for workload:1 thread,
> it will eventually try to turn ON the sink0. Since sink0 is already
> active tracing event0, we could allow this to go through and collect
> the trace in the *same hardware buffer* (which can be demuxed from the
> single AUX record using the TraceID in the packets). Please note that
> we do double buffering and hardware buffer is copied only when the sink0
> is stopped (see below).
> 
> But, if the event scheduled on CPU1 doesn't belong to the above session, but
> belongs to different perf session
>  (say, perf record -e  cs_etm/sinkid=sink0/u benchmark),
> 
> we can't allow this to succeed and mix the trace data in to that of workload
> and thus fail the operation.
> 
> In a nutshell, since the sinks are shared, we start the sink on the
> first event and keep sharing the sink buffer with any event that
> belongs to the same session (using refcounts). The sink is only released
> for other sessions, when there are no more events in the session tracing
> on any of the ETMs.
> 
> I know this is fundamentally a topology issue, but that is not something
> we can fix. But the situation is changing and we are starting to see
> systems with per-CPU sinks.
> 
> Hope this helps.

I think I'm more confused now :-/

Where do we use ->owner after event creation? The moment you create your
eventN you create the link to sink0. That link either succeeds (same
'cookie') or fails.

If it fails, event creation fails, the end.

On success, we have the sink pointer in our event and we never ever need
to look at ->owner ever again.
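
A minimal userspace sketch of that creation-time link, to make the
argument concrete. Everything here is hypothetical: struct shared_sink,
sink_link() and sink_unlink() are illustrative names, not the CoreSight
driver's, and the 'cookie' is just an opaque per-session value.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of a shared sink; not the real driver structure. */
struct shared_sink {
	long owner;	/* cookie of the session using the sink, 0 if free */
	int  refcnt;	/* number of events linked to the sink */
};

/*
 * Called once, at event creation. The first event claims the sink for
 * its session; later events succeed only if they carry the same cookie.
 */
int sink_link(struct shared_sink *s, long cookie)
{
	if (s->refcnt == 0) {
		s->owner = cookie;
		s->refcnt = 1;
		return 0;
	}
	if (s->owner != cookie)
		return -EBUSY;	/* different session: event creation fails */
	s->refcnt++;
	return 0;
}

/* Called at event destruction; the last unlink frees the sink. */
void sink_unlink(struct shared_sink *s)
{
	assert(s->refcnt > 0);
	if (--s->refcnt == 0)
		s->owner = 0;
}
```

In this model the cookie is compared only inside sink_link(); nothing
on the start/stop path has to look at it again.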

I'm also not seeing why exactly we need ->owner in the first place.

Suppose we make the sink0 device return -EBUSY on open() when it is
active. Then a perf session can open the sink0 device, create perf
events and attach them to the sink0 device using
perf_event_attr::config2. The events will attach to sink0 and increment
its usage count, such that any further open() will fail.

Once the events are created, the perf tool close()s the sink0 device,
which is now still in use by the events. No other events can be attached
to it.
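
Roughly, as a userspace model of that scheme (a sketch only; struct
sink and the sink_*() helpers are hypothetical names, and the open
handle stands in for the fd passed via perf_event_attr::config2):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the sink device; not the real driver structure. */
struct sink {
	bool opened;	/* a session currently holds the device open */
	int  users;	/* events attached to this sink */
};

/* open(): fails with -EBUSY while the sink is open or has events attached. */
int sink_open(struct sink *s)
{
	if (s->opened || s->users > 0)
		return -EBUSY;
	s->opened = true;
	return 0;
}

/* Event creation: only possible through an open handle on the sink. */
int sink_attach_event(struct sink *s)
{
	if (!s->opened)
		return -EBADF;	/* no open handle to attach through */
	s->users++;
	return 0;
}

/* close(): drops the handle; attached events keep the sink busy. */
void sink_close(struct sink *s)
{
	s->opened = false;
}

/* Event destruction: the last detach makes the sink available again. */
void sink_detach_event(struct sink *s)
{
	assert(s->users > 0);
	s->users--;
}
```

A second session's sink_open() keeps returning -EBUSY until the first
session has closed the device and destroyed all of its events, and no
owner field is needed to get that exclusion.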

Or are you doing the event->sink mapping every time you do: pmu::add()?
That sounds insane.
