linux-kernel.vger.kernel.org archive mirror
From: Peter Zijlstra <peterz@infradead.org>
To: Marco Elver <elver@google.com>
Cc: Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
	kasan-dev@googlegroups.com, Dmitry Vyukov <dvyukov@google.com>
Subject: Re: [PATCH v2] perf: Fix missing SIGTRAPs
Date: Tue, 11 Oct 2022 15:06:52 +0200
Message-ID: <Y0VqbNDKIHUcC7Ha@hirez.programming.kicks-ass.net>
In-Reply-To: <Y0VofNVMBXPOJJr7@elver.google.com>

On Tue, Oct 11, 2022 at 02:58:36PM +0200, Marco Elver wrote:
> On Tue, Oct 11, 2022 at 09:44AM +0200, Peter Zijlstra wrote:
> > Subject: perf: Fix missing SIGTRAPs
> > From: Peter Zijlstra <peterz@infradead.org>
> > Date: Thu Oct 6 15:00:39 CEST 2022
> > 
> > Marco reported:
> > 
> > Due to how SIGTRAPs are delivered if perf_event_attr::sigtrap is
> > set, we've noticed 3 issues:
> > 
> >   1. Missing SIGTRAP due to a race with event_sched_out() (more
> >      details below).
> > 
> >   2. Hardware PMU events being disabled due to returning 1 from
> >      perf_event_overflow(). The only way to re-enable the event is
> >      for user space to first "properly" disable the event and then
> >      re-enable it.
> > 
> >   3. The inability to automatically disable an event after a
> >      specified number of overflows via PERF_EVENT_IOC_REFRESH.
> > 
> > The worst of the 3 issues is problem (1), which occurs when a
> > pending_disable is "consumed" by a racing event_sched_out(), observed
> > as follows:
> > 
> > 		CPU0			|	CPU1
> > 	--------------------------------+---------------------------
> > 	__perf_event_overflow()		|
> > 	 perf_event_disable_inatomic()	|
> > 	  pending_disable = CPU0	| ...
> > 					| _perf_event_enable()
> > 					|  event_function_call()
> > 					|   task_function_call()
> > 					|    /* sends IPI to CPU0 */
> > 	<IPI>				| ...
> > 	 __perf_event_enable()		+---------------------------
> > 	  ctx_resched()
> > 	   task_ctx_sched_out()
> > 	    ctx_sched_out()
> > 	     group_sched_out()
> > 	      event_sched_out()
> > 	       pending_disable = -1
> > 	</IPI>
> > 	<IRQ-work>
> > 	 perf_pending_event()
> > 	  perf_pending_event_disable()
> > 	   /* Fails to send SIGTRAP because no pending_disable! */
> > 	</IRQ-work>
> > 
> > In the above case, not only is that particular SIGTRAP missed, but so
> > are all future SIGTRAPs, because 'event_limit' is not reset back to 1.
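To make the failure concrete, the interleaving above can be replayed in a toy, single-threaded C model. All names here (`struct old_event`, `old_overflow()`, `old_replay()` and friends) are hypothetical, and the model is far simpler than kernel/events/core.c. It assumes that an overflow counts `event_limit` down and sets `pending_disable` to the current CPU when the count reaches zero, that the IPI's sched-out resets `pending_disable` to -1, and that the IRQ-work re-arms `event_limit` to 1 only after it actually sends the SIGTRAP.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model of the pre-fix scheme (not kernel code). */
struct old_event {
	int pending_disable;	/* CPU number, or -1 if nothing is pending */
	int event_limit;	/* counts down; delivery is armed while it is 1 */
	int sigtraps_sent;	/* SIGTRAPs delivered to user space */
};

static void old_event_init(struct old_event *e)
{
	e->pending_disable = -1;
	e->event_limit = 1;
	e->sigtraps_sent = 0;
}

/* __perf_event_overflow() -> perf_event_disable_inatomic() on @cpu. */
static void old_overflow(struct old_event *e, int cpu)
{
	/* Mimics atomic_dec_and_test(): only the transition to 0 counts. */
	if (--e->event_limit == 0)
		e->pending_disable = cpu;	/* IRQ-work queued here */
}

/* event_sched_out() from the IPI: consumes pending_disable. */
static void old_sched_out(struct old_event *e)
{
	if (e->pending_disable >= 0)
		e->pending_disable = -1;
}

/* IRQ-work (perf_pending_event_disable()) running on @cpu. */
static void old_irq_work(struct old_event *e, int cpu)
{
	if (e->pending_disable == cpu) {
		e->pending_disable = -1;
		e->sigtraps_sent++;
		e->event_limit = 1;	/* re-arm for the next overflow */
	}
}

/* Replays the diagram on CPU0, then one later overflow. */
static int old_replay(void)
{
	struct old_event e;

	old_event_init(&e);
	old_overflow(&e, 0);	/* overflow: pending_disable = CPU0 */
	old_sched_out(&e);	/* IPI: pending_disable = -1 */
	old_irq_work(&e, 0);	/* IRQ-work finds nothing pending */
	old_overflow(&e, 0);	/* event_limit drops to -1, nothing queued */
	old_irq_work(&e, 0);
	return e.sigtraps_sent;
}
```

In this model `old_replay()` returns 0: the first SIGTRAP is lost in the race, and since `event_limit` is then driven below zero, the later overflow never requests delivery at all. Without the intervening sched-out, one overflow followed by the IRQ-work yields exactly one SIGTRAP.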
> > 
> > To fix, rework the pending delivery of SIGTRAP via IRQ-work by
> > introducing a separate 'pending_sigtrap', no longer using 'event_limit'
> > and 'pending_disable' for its delivery.
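A minimal sketch of the idea behind the fix, again as a hypothetical user-space model rather than the patch's code: SIGTRAP delivery gets its own `pending_sigtrap` flag, so neither `event_limit` nor a consumed `pending_disable` can suppress it. The names (`struct split_event`, `split_overflow()` and so on) are illustrative only. For brevity, the sched-out/sched-in step here only consumes `pending_disable`; what the patch does with a SIGTRAP still pending at sched-out is described in the bullets that follow and is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model (not kernel code): SIGTRAP tracked by its own flag. */
struct split_event {
	int  oncpu;		/* CPU the event runs on, or -1 */
	bool pending_disable;	/* disable requested from overflow context */
	bool pending_sigtrap;	/* SIGTRAP owed to the task */
	int  sigtraps_sent;
};

/* Overflow: always owe a SIGTRAP; disabling is a separate request. */
static void split_overflow(struct split_event *e, bool want_disable)
{
	e->pending_sigtrap = true;	/* no event_limit countdown involved */
	if (want_disable)
		e->pending_disable = true;
}

/* ctx_resched() on the same CPU: sched-out consumes pending_disable. */
static void split_sched_out_in(struct split_event *e)
{
	e->pending_disable = false;
}

/* IRQ-work on @cpu: the two pending_* toggles are handled independently. */
static void split_irq_work(struct split_event *e, int cpu)
{
	if (e->oncpu != cpu)
		return;
	if (e->pending_sigtrap) {
		e->pending_sigtrap = false;
		e->sigtraps_sent++;
	}
	if (e->pending_disable) {
		e->pending_disable = false;
		e->oncpu = -1;		/* event disabled locally */
	}
}
```

Replaying the same interleaving (overflow, sched-out consuming `pending_disable`, IRQ-work) now yields one SIGTRAP, and every further overflow yields another without any re-arming step.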
> > 
> > Additionally, and unlike Marco's proposed patch:
> > 
> >  - recognise that pending_disable effectively duplicates oncpu for
> >    the case where it is set. As such, change the irq_work handler to
> >    use ->oncpu to target the event and use pending_* as boolean toggles.
> > 
> >  - observe that SIGTRAP targets the ctx->task, so the context switch
> >    optimization that carries contexts between tasks is invalid. If
> >    the irq_work were delayed enough to hit after a context switch the
> >    SIGTRAP would be delivered to the wrong task.
> > 
> >  - observe that if the event gets scheduled out
> >    (rotation/migration/context-switch/...) the irq-work would be
> >    insufficient to deliver the SIGTRAP when the event gets scheduled
> >    back in (the irq-work might still be pending on the old CPU).
> > 
> >    Therefore have event_sched_out() convert the pending sigtrap into a
> >    task_work which will deliver the signal at return_to_user.
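The three points above can be pulled together in one more hypothetical model. Tasks are plain integers, and `struct new_event`, `new_sched_out()`, `new_ctx_swap_allowed()` and the rest are illustrative names, not kernel APIs. The IRQ-work handler locates the event through `oncpu` instead of a CPU recorded at overflow time; a SIGTRAP still pending at sched-out is handed to a stand-in for task_work bound to the outgoing task; and carrying the context over to another task is treated as unsafe while anything is pending. Re-queueing the IRQ-work on the event's new CPU and the exact swap condition are assumptions of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the reworked delivery (not kernel code). */
struct new_event {
	int  oncpu;		/* CPU the event runs on, or -1 */
	bool pending_sigtrap;	/* SIGTRAP owed, to be sent by IRQ-work */
	bool pending_work;	/* SIGTRAP handed over to "task_work" */
	int  work_task;		/* task the "task_work" was queued on */
	int  requeued_to;	/* CPU the IRQ-work was re-queued on, or -1 */
	int  sigtraps_sent;
};

/* Overflow on the CPU the event runs on: mark and (notionally) queue IRQ-work. */
static void new_overflow(struct new_event *e)
{
	e->pending_sigtrap = true;
}

/* event_sched_out() while @cur_task runs: convert a pending SIGTRAP. */
static void new_sched_out(struct new_event *e, int cur_task)
{
	e->oncpu = -1;
	if (e->pending_sigtrap) {
		e->pending_sigtrap = false;
		e->pending_work = true;		/* stands in for task_work_add() */
		e->work_task = cur_task;
	}
}

static void new_sched_in(struct new_event *e, int cpu)
{
	e->oncpu = cpu;
}

/* IRQ-work handler on @cpu: target the event via ->oncpu. */
static void new_irq_work(struct new_event *e, int cpu)
{
	if (e->oncpu < 0)
		return;				/* sched-out already handled it */
	if (e->oncpu != cpu) {
		e->requeued_to = e->oncpu;	/* follow the event */
		return;
	}
	if (e->pending_sigtrap) {
		e->pending_sigtrap = false;
		e->sigtraps_sent++;
	}
}

/* @task returns to user space and runs its own queued "task_work". */
static void new_return_to_user(struct new_event *e, int task)
{
	if (e->pending_work && e->work_task == task) {
		e->pending_work = false;
		e->sigtraps_sent++;
	}
}

/* Swapping contexts between tasks is only safe with nothing pending. */
static bool new_ctx_swap_allowed(const struct new_event *e)
{
	return !e->pending_sigtrap && !e->pending_work;
}
```

Under these assumptions, an overflow followed by a context switch and a migration to CPU1 no longer depends on the stale IRQ-work on CPU0: the signal reaches the owning task when it next returns to user space, and never some other task.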
> > 
> > Fixes: 97ba62b27867 ("perf: Add support for SIGTRAP on perf events")
> > Reported-by: Marco Elver <elver@google.com>
> > Debugged-by: Marco Elver <elver@google.com>
> 
> Reviewed-by: Marco Elver <elver@google.com>
> Tested-by: Marco Elver <elver@google.com>
> 
> .. fuzzing, and lots of concurrent sigtrap_threads with this patch:
> 
> 	https://lore.kernel.org/all/20221011124534.84907-1-elver@google.com/
> 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> 
> My original patch also attributed Dmitry:
> 
> 	Reported-by: Dmitry Vyukov <dvyukov@google.com>
> 	Debugged-by: Dmitry Vyukov <dvyukov@google.com>
> 
> ... we all melted our brains on this one. :-)
> 
> Would be good to get the fix into one of the upcoming 6.1-rc releases.

Updated and yes, I'm planning on queueing this in perf/urgent the moment
-rc1 happens.

Thanks!
