* [PATCH] perf: Prevent recursion in ring buffer
From: Jiri Olsa
Date: 2018-09-12 19:33 UTC
To: Arnaldo Carvalho de Melo
Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin, Peter Zijlstra, Andi Kleen

Some of the scheduling tracepoints allow the perf_tp_event
code to write to the ring buffer under a different cpu than the
code is running on. This results in corrupted ring buffer data,
demonstrated by the following perf commands:

  # perf record -e 'sched:sched_switch,sched:sched_wakeup' perf bench sched messaging
  # Running 'sched/messaging' benchmark:
  # 20 sender and receiver processes per group
  # 10 groups == 400 processes run

       Total time: 0.383 [sec]
  [ perf record: Woken up 8 times to write data ]
  0x42b890 [0]: failed to process type: -1765585640
  [ perf record: Captured and wrote 4.825 MB perf.data (29669 samples) ]

  # perf report --stdio
  0x42b890 [0]: failed to process type: -1765585640

The reason for the corruption is some of the scheduling tracepoints
that have __perf_task defined and thus allow storing data to another
cpu's ring buffer:

  sched_waking
  sched_wakeup
  sched_wakeup_new
  sched_stat_wait
  sched_stat_sleep
  sched_stat_iowait
  sched_stat_blocked

The perf_tp_event function first stores samples for the current cpu's
events defined for the tracepoint:

    hlist_for_each_entry_rcu(event, head, hlist_entry)
            perf_swevent_event(event, count, &data, regs);

And then iterates events of the 'task' and stores the sample
for any task's event that passes the tracepoint checks:

    ctx = rcu_dereference(task->perf_event_ctxp[perf_sw_context]);

    list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
            if (event->attr.type != PERF_TYPE_TRACEPOINT)
                    continue;
            if (event->attr.config != entry->type)
                    continue;

            perf_swevent_event(event, count, &data, regs);
    }

The above code can race with the same code running on another cpu,
ending up with 2 cpus trying to store under the same ring buffer,
which is not handled at the moment.

This patch adds an atomic 'recursion' flag (any cpu could touch the
ring buffer at any moment) that guards the entry to the storage code.
It's set in the __perf_output_begin function and released in
perf_output_put_handle. If the flag is already set, the code increases
the lost count and bails out.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/events/internal.h    | 1 +
 kernel/events/ring_buffer.c | 8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 6dc725a7e7bc..82599da9723f 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -11,6 +11,7 @@ struct ring_buffer {
 	atomic_t			refcount;
+	atomic_t			recursion;
 	struct rcu_head			rcu_head;
 #ifdef CONFIG_PERF_USE_VMALLOC
 	struct work_struct		work;
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 4a9937076331..0c976ac414c5 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -101,6 +101,7 @@ static void perf_output_put_handle(struct perf_output_handle *handle)
 
 out:
 	preempt_enable();
+	atomic_set(&rb->recursion, 0);
 }
 
 static __always_inline bool
@@ -145,6 +146,12 @@ __perf_output_begin(struct perf_output_handle *handle,
 		goto out;
 	}
 
+	if (atomic_cmpxchg(&rb->recursion, 0, 1) != 0) {
+		if (rb->nr_pages)
+			local_inc(&rb->lost);
+		goto out;
+	}
+
 	handle->rb    = rb;
 	handle->event = event;
 
@@ -286,6 +293,7 @@ ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
 		rb->overwrite = 1;
 
 	atomic_set(&rb->refcount, 1);
+	atomic_set(&rb->recursion, 0);
 
 	INIT_LIST_HEAD(&rb->event_list);
 	spin_lock_init(&rb->event_lock);
-- 
2.17.1
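[Editorial note: the corruption mode the changelog above describes can be sketched in plain userspace C. __perf_output_begin reserves buffer space by advancing a head offset; if two CPUs read the same stale head before either publishes, both "reserve" the same bytes and one record overwrites the other. This is an illustrative model only, not kernel code — the names rb_sim, reserve_read_head and race_demo are invented for the sketch.]

```c
#include <stddef.h>

/* Toy model of a ring buffer write cursor. */
struct rb_sim {
	unsigned long head;	/* write offset into the buffer */
};

/* The reserve step, split into its read and publish halves so the racy
 * interleaving can be replayed deterministically below. */
static unsigned long reserve_read_head(struct rb_sim *rb)
{
	return rb->head;		/* read current head */
}

static void reserve_publish(struct rb_sim *rb, unsigned long old,
			    unsigned long size)
{
	rb->head = old + size;		/* publish new head */
}

/* Replay the race: both "CPUs" read head before either publishes.
 * Returns the final head; the two reserved offsets come back via the
 * out-pointers. */
static unsigned long race_demo(unsigned long *off0, unsigned long *off1)
{
	struct rb_sim rb = { .head = 0 };

	*off0 = reserve_read_head(&rb);		/* cpu0 reads head = 0 */
	*off1 = reserve_read_head(&rb);		/* cpu1 reads head = 0 */
	reserve_publish(&rb, *off0, 64);	/* cpu0: head = 64 */
	reserve_publish(&rb, *off1, 64);	/* cpu1: head = 64 again */
	return rb.head;
}
```

Both records land at the same offset and one 64-byte reservation is lost from the final head — the two symptoms behind the "failed to process type" errors perf report prints above.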
* Re: [PATCH] perf: Prevent recursion in ring buffer
From: Peter Zijlstra
Date: 2018-09-13  7:07 UTC
To: Jiri Olsa
Cc: Arnaldo Carvalho de Melo, lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin, Andi Kleen

On Wed, Sep 12, 2018 at 09:33:17PM +0200, Jiri Olsa wrote:
> Some of the scheduling tracepoints allow the perf_tp_event
> code to write to ring buffer under different cpu than the
> code is running on.

ARGH.. that is indeed borken.

> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> index 4a9937076331..0c976ac414c5 100644
> --- a/kernel/events/ring_buffer.c
> +++ b/kernel/events/ring_buffer.c
> @@ -101,6 +101,7 @@ static void perf_output_put_handle(struct perf_output_handle *handle)
>  
>  out:
>  	preempt_enable();
> +	atomic_set(&rb->recursion, 0);
>  }
>  
>  static __always_inline bool
> @@ -145,6 +146,12 @@ __perf_output_begin(struct perf_output_handle *handle,
>  		goto out;
>  	}
>  
> +	if (atomic_cmpxchg(&rb->recursion, 0, 1) != 0) {
> +		if (rb->nr_pages)
> +			local_inc(&rb->lost);
> +		goto out;
> +	}
> +
>  	handle->rb    = rb;
>  	handle->event = event;
>  
> @@ -286,6 +293,7 @@ ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
>  		rb->overwrite = 1;
>  
>  	atomic_set(&rb->refcount, 1);
> +	atomic_set(&rb->recursion, 0);
>  
>  	INIT_LIST_HEAD(&rb->event_list);
>  	spin_lock_init(&rb->event_lock);

That's not a recursion count, that's a test-and-set spinlock, and you
got the memory ordering wrong for that.

Also, we tried very hard to avoid atomic ops in the ring-buffer and you
just wrecked that. Worse, you wrecked previously working interrupt
nesting output.

Let me have a look at this.
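[Editorial note: Peter's objection — that the flag is really a test-and-set spinlock missing its memory ordering — can be illustrated with userspace C11 atomics. This is a hedged sketch, not kernel code: the kernel's atomic_cmpxchg()/atomic_set() are a different API, and rb_trylock/rb_unlock are invented names showing the acquire/release pairing such a try-lock would need, which a plain relaxed store does not provide.]

```c
#include <stdatomic.h>
#include <stdbool.h>

/* The "recursion" flag viewed as what it really is: a try-lock. */
static atomic_int recursion;	/* 0 = free, 1 = held */

static bool rb_trylock(void)
{
	int expected = 0;
	/* ACQUIRE on success: buffer writes inside the critical section
	 * cannot be reordered before the lock is observed taken. */
	return atomic_compare_exchange_strong_explicit(
			&recursion, &expected, 1,
			memory_order_acquire, memory_order_relaxed);
}

static void rb_unlock(void)
{
	/* RELEASE: critical-section writes must be visible before the
	 * flag is seen clear by the next locker.  The patch's plain
	 * atomic_set() has no such guarantee. */
	atomic_store_explicit(&recursion, 0, memory_order_release);
}
```

The second failure Peter points out also falls out of this model: an interrupt that nests inside __perf_output_begin on the same CPU sees the flag already set and bails, losing previously working nested output.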
* Re: [PATCH] perf: Prevent recursion in ring buffer
From: Jiri Olsa
Date: 2018-09-13  7:41 UTC
To: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo, lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin, Andi Kleen

On Thu, Sep 13, 2018 at 09:07:40AM +0200, Peter Zijlstra wrote:

SNIP

> That's not a recursion count, that's a test-and-set spinlock, and you
> got the memory ordering wrong for that.
>
> Also, we tried very hard to avoid atomic ops in the ring-buffer and you
> just wrecked that. Worse, you wrecked previously working interrupt
> nesting output.

ah.. the interrupt will also bail out now.. right :-\

> Let me have a look at this.
* Re: [PATCH] perf: Prevent recursion in ring buffer
From: Jiri Olsa
Date: 2018-09-13  7:46 UTC
To: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo, lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin, Andi Kleen

On Thu, Sep 13, 2018 at 09:07:40AM +0200, Peter Zijlstra wrote:

SNIP

> That's not a recursion count, that's a test-and-set spinlock, and you
> got the memory ordering wrong for that.
>
> Also, we tried very hard to avoid atomic ops in the ring-buffer and you
> just wrecked that. Worse, you wrecked previously working interrupt
> nesting output.
>
> Let me have a look at this.

I was first thinking to just leave it on the current cpu,
but not sure current users would be ok with that ;-)

jirka

---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index abaed4f8bb7f..9b534a2ecf17 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8308,6 +8308,8 @@ void perf_tp_event(u16 event_type, u64 count, void *record, int entry_size,
 			continue;
 		if (event->attr.config != entry->type)
 			continue;
+		if (event->cpu != smp_processor_id())
+			continue;
 		if (perf_tp_event_match(event, &data, regs))
 			perf_swevent_event(event, count, &data, regs);
 	}
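[Editorial note: the effect of the added smp_processor_id() check can be modeled in a few lines of userspace C — of all the task's events, only those bound to the cpu the tracepoint fired on get the sample, so no cpu ever writes into another cpu's ring buffer. The names sim_event and deliver_task_events are invented for this sketch; the real check lives in perf_tp_event() in kernel/events/core.c.]

```c
#include <stddef.h>

/* Toy stand-in for a perf event on the task's event list. */
struct sim_event {
	int cpu;	/* cpu whose ring buffer this event writes to */
	int delivered;	/* samples this event received */
};

/* Model of the filtered task-event loop; 'cur_cpu' stands in for
 * smp_processor_id() at the tracepoint firing site. */
static int deliver_task_events(struct sim_event *evs, size_t n, int cur_cpu)
{
	int count = 0;

	for (size_t i = 0; i < n; i++) {
		if (evs[i].cpu != cur_cpu)
			continue;	/* the filter the patch adds */
		evs[i].delivered++;	/* perf_swevent_event() here */
		count++;
	}
	return count;
}
```

With the filter in place, a tracepoint firing on cpu 0 only reaches the task's cpu-0 events; the other per-cpu copies of the inherited event see nothing, which is exactly the trade-off discussed in the follow-up.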
* Re: [PATCH] perf: Prevent recursion in ring buffer
From: Peter Zijlstra
Date: 2018-09-13  9:37 UTC
To: Jiri Olsa
Cc: Arnaldo Carvalho de Melo, lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin, Andi Kleen

On Thu, Sep 13, 2018 at 09:46:07AM +0200, Jiri Olsa wrote:
> I was first thinking to just leave it on the current cpu,
> but not sure current users would be ok with that ;-)
>
> @@ -8308,6 +8308,8 @@ void perf_tp_event(u16 event_type, u64 count, void *record, int entry_size,
>  			continue;
>  		if (event->attr.config != entry->type)
>  			continue;
> +		if (event->cpu != smp_processor_id())
> +			continue;
>  		if (perf_tp_event_match(event, &data, regs))
>  			perf_swevent_event(event, count, &data, regs);
>  	}

That might indeed be the best we can do.

So the whole TP muck would be responsible for placing only matching
events on the hlist, which is where our normal CPU filter is I think.

The above then does the same for @task. Which without this would also be
getting nr_cpus copies of the event I think.

It does mean not getting any events if the @task only has a per-task
buffer, but there's nothing to be done about that. And I'm not even sure
we can create a useful warning for that :/
* [PATCHv2] perf: Prevent concurent ring buffer access
From: Jiri Olsa
Date: 2018-09-23 16:13 UTC
To: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo, lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin, Andi Kleen, Andrew Vagin

On Thu, Sep 13, 2018 at 11:37:54AM +0200, Peter Zijlstra wrote:

SNIP

> That might indeed be the best we can do.
>
> So the whole TP muck would be responsible for placing only matching
> events on the hlist, which is where our normal CPU filter is I think.
>
> The above then does the same for @task. Which without this would also be
> getting nr_cpus copies of the event I think.
>
> It does mean not getting any events if the @task only has a per-task
> buffer, but there's nothing to be done about that. And I'm not even sure
> we can create a useful warning for that :/

ok, sending full patch (v2) with the above change

cc-ing Andrew Vagin who added this feature, because this
patch changes the way it works

thanks,
jirka

---
Some of the scheduling tracepoints allow the perf_tp_event
code to write to the ring buffer under a different cpu than the
code is running on. This results in corrupted ring buffer data,
demonstrated by the following perf commands:

  # perf record -e 'sched:sched_switch,sched:sched_wakeup' perf bench sched messaging
  # Running 'sched/messaging' benchmark:
  # 20 sender and receiver processes per group
  # 10 groups == 400 processes run

       Total time: 0.383 [sec]
  [ perf record: Woken up 8 times to write data ]
  0x42b890 [0]: failed to process type: -1765585640
  [ perf record: Captured and wrote 4.825 MB perf.data (29669 samples) ]

  # perf report --stdio
  0x42b890 [0]: failed to process type: -1765585640

The reason for the corruption is some of the scheduling tracepoints
that have __perf_task defined and thus allow storing data to another
cpu's ring buffer:

  sched_waking
  sched_wakeup
  sched_wakeup_new
  sched_stat_wait
  sched_stat_sleep
  sched_stat_iowait
  sched_stat_blocked

The perf_tp_event function first stores samples for the current cpu's
events defined for the tracepoint:

    hlist_for_each_entry_rcu(event, head, hlist_entry)
            perf_swevent_event(event, count, &data, regs);

And then iterates events of the 'task' and stores the sample
for any task's event that passes the tracepoint checks:

    ctx = rcu_dereference(task->perf_event_ctxp[perf_sw_context]);

    list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
            if (event->attr.type != PERF_TYPE_TRACEPOINT)
                    continue;
            if (event->attr.config != entry->type)
                    continue;

            perf_swevent_event(event, count, &data, regs);
    }

The above code can race with the same code running on another cpu,
ending up with 2 cpus trying to store under the same ring buffer,
which is not handled at the moment.

This patch prevents the race by allowing only events with the
same current cpu to receive the event.

Fixes: e6dab5ffab59 ("perf/trace: Add ability to set a target task for events")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/events/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index c80549bf82c6..f269f666510c 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8308,6 +8308,8 @@ void perf_tp_event(u16 event_type, u64 count, void *record, int entry_size,
 		goto unlock;
 
 	list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
+		if (event->cpu != smp_processor_id())
+			continue;
 		if (event->attr.type != PERF_TYPE_TRACEPOINT)
 			continue;
 		if (event->attr.config != entry->type)
 			continue;
-- 
2.17.1
* [tip:perf/core] perf/ring_buffer: Prevent concurent ring buffer access
From: tip-bot for Jiri Olsa
Date: 2018-10-02 10:01 UTC
To: linux-tip-commits
Cc: alexander.shishkin, torvalds, peterz, jolsa, hpa, tglx, namhyung, linux-kernel, acme, avagin, vincent.weaver, mingo, eranian

Commit-ID:  cd6fb677ce7e460c25bdd66f689734102ec7d642
Gitweb:     https://git.kernel.org/tip/cd6fb677ce7e460c25bdd66f689734102ec7d642
Author:     Jiri Olsa <jolsa@redhat.com>
AuthorDate: Sun, 23 Sep 2018 18:13:43 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 2 Oct 2018 09:37:59 +0200

perf/ring_buffer: Prevent concurent ring buffer access

Some of the scheduling tracepoints allow the perf_tp_event
code to write to the ring buffer under a different cpu than the
code is running on. This results in corrupted ring buffer data,
demonstrated by the following perf commands:

  # perf record -e 'sched:sched_switch,sched:sched_wakeup' perf bench sched messaging
  # Running 'sched/messaging' benchmark:
  # 20 sender and receiver processes per group
  # 10 groups == 400 processes run

       Total time: 0.383 [sec]
  [ perf record: Woken up 8 times to write data ]
  0x42b890 [0]: failed to process type: -1765585640
  [ perf record: Captured and wrote 4.825 MB perf.data (29669 samples) ]

  # perf report --stdio
  0x42b890 [0]: failed to process type: -1765585640

The reason for the corruption is some of the scheduling tracepoints
that have __perf_task defined and thus allow storing data to another
cpu's ring buffer:

  sched_waking
  sched_wakeup
  sched_wakeup_new
  sched_stat_wait
  sched_stat_sleep
  sched_stat_iowait
  sched_stat_blocked

The perf_tp_event function first stores samples for the current cpu's
events defined for the tracepoint:

    hlist_for_each_entry_rcu(event, head, hlist_entry)
            perf_swevent_event(event, count, &data, regs);

And then iterates events of the 'task' and stores the sample
for any task's event that passes the tracepoint checks:

    ctx = rcu_dereference(task->perf_event_ctxp[perf_sw_context]);

    list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
            if (event->attr.type != PERF_TYPE_TRACEPOINT)
                    continue;
            if (event->attr.config != entry->type)
                    continue;

            perf_swevent_event(event, count, &data, regs);
    }

The above code can race with the same code running on another cpu,
ending up with 2 cpus trying to store under the same ring buffer,
which is specifically not allowed.

This patch prevents the problem by allowing only events with the
same current cpu to receive the event.

NOTE: this requires the use of (per-task-)per-cpu buffers for this
feature to work; perf-record does this.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
[peterz: small edits to Changelog]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: e6dab5ffab59 ("perf/trace: Add ability to set a target task for events")
Link: http://lkml.kernel.org/r/20180923161343.GB15054@krava
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index dfb1d951789e..5a97f34bc14c 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8314,6 +8314,8 @@ void perf_tp_event(u16 event_type, u64 count, void *record, int entry_size,
 		goto unlock;
 
 	list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
+		if (event->cpu != smp_processor_id())
+			continue;
 		if (event->attr.type != PERF_TYPE_TRACEPOINT)
 			continue;
 		if (event->attr.config != entry->type)
 			continue;
* Re: [PATCH] perf: Prevent recursion in ring buffer
From: Peter Zijlstra
Date: 2018-09-13  7:40 UTC
To: Jiri Olsa
Cc: Arnaldo Carvalho de Melo, lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin, Andi Kleen

On Wed, Sep 12, 2018 at 09:33:17PM +0200, Jiri Olsa wrote:

>   # perf record -e 'sched:sched_switch,sched:sched_wakeup' perf bench sched messaging

> The reason for the corruptions are some of the scheduling tracepoints,
> that have __perf_task defined and thus allow to store data to another
> cpu ring buffer:
>
>   sched_waking
>   sched_wakeup
>   sched_wakeup_new
>   sched_stat_wait
>   sched_stat_sleep
>   sched_stat_iowait
>   sched_stat_blocked

> And then iterates events of the 'task' and store the sample
> for any task's event that passes tracepoint checks:
>
>     ctx = rcu_dereference(task->perf_event_ctxp[perf_sw_context]);
>
>     list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
>             if (event->attr.type != PERF_TYPE_TRACEPOINT)
>                     continue;
>             if (event->attr.config != entry->type)
>                     continue;
>
>             perf_swevent_event(event, count, &data, regs);
>     }
>
> Above code can race with same code running on another cpu,
> ending up with 2 cpus trying to store under the same ring
> buffer, which is not handled at the moment.

It can yes, however the only way I can see this breaking is if we use
!inherited events with a strict per-task buffer, but your record command
doesn't use that.

Now, your test-case uses inherited events, which would all share the
buffer, however IIRC inherited events require per-task-per-cpu buffers,
because there is already no guarantee the various tasks run on the same
CPU in the first place.

This means we _should_ write to the @task's local CPU buffer, and that
would work again.

Let me try and figure out where this is going wrong.
* Re: [PATCH] perf: Prevent recursion in ring buffer
From: Jiri Olsa
Date: 2018-09-13  7:53 UTC
To: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo, lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin, Andi Kleen

On Thu, Sep 13, 2018 at 09:40:42AM +0200, Peter Zijlstra wrote:

SNIP

> It can yes, however the only way I can see this breaking is if we use
> !inherited events with a strict per-task buffer, but your record command
> doesn't use that.
>
> Now, your test-case uses inherited events, which would all share the
> buffer, however IIRC inherited events require per-task-per-cpu buffers,

that's what perf record always does when monitoring a task..
there's an event/rb for each cpu and the given task, and all
events for the task (sched:*) on a given cpu share that single
cpu ring buffer via PERF_EVENT_IOC_SET_OUTPUT

> because there is already no guarantee the various tasks run on the same
> CPU in the first place.
>
> This means we _should_ write to the @task's local CPU buffer, and that
> would work again.
>
> Let me try and figure out where this is going wrong.
Thread overview: 9+ messages

2018-09-12 19:33 [PATCH] perf: Prevent recursion in ring buffer (Jiri Olsa)
2018-09-13  7:07 ` Re: (Peter Zijlstra)
2018-09-13  7:41   ` Re: (Jiri Olsa)
2018-09-13  7:46   ` Re: (Jiri Olsa)
2018-09-13  9:37     ` Re: (Peter Zijlstra)
2018-09-23 16:13       ` [PATCHv2] perf: Prevent concurent ring buffer access (Jiri Olsa)
2018-10-02 10:01         ` [tip:perf/core] perf/ring_buffer: Prevent concurent ring buffer access (tip-bot for Jiri Olsa)
2018-09-13  7:40 ` Re: (Peter Zijlstra)
2018-09-13  7:53   ` Re: (Jiri Olsa)