From: Peter Zijlstra <peterz@infradead.org>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: "Dmitry V. Levin" <ldv@altlinux.org>,
	Jiri Olsa <jolsa@kernel.org>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Namhyung Kim <namhyung@kernel.org>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	"Luis Claudio R. Goncalves" <lclaudio@uudg.org>,
	Eugene Syromyatnikov <esyr@redhat.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	lkml <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/8] perf: Allow to block process in syscall tracepoints
Date: Thu, 13 Dec 2018 11:01:49 +0100
Message-ID: <20181213100149.GF5289@hirez.programming.kicks-ass.net>
In-Reply-To: <20181212202639.1978ec88@vmware.local.home>

On Wed, Dec 12, 2018 at 08:26:39PM -0500, Steven Rostedt wrote:
> On Thu, 13 Dec 2018 03:39:38 +0300
> "Dmitry V. Levin" <ldv@altlinux.org> wrote:
> 
> > btw, I didn't ask for the implementation to be ugly.
> > You don't have to introduce polling into the kernel if you don't want to,
> > userspace is perfectly capable of invoking wait4(2) in a loop.
> > Just block the tracee, notify the tracer, and let it pick up the pieces.
> 
> Note, there's been some discussion offlist to only have perf set a flag
> when it dropped an event and have the ptrace code do the heavy lifting
> of blocking the task and waking it back up. I think that would be a
> cleaner solution and won't muck with perf as badly.

It's still really horrid -- the question is not if we can come up with
something, anything, to make strace work. The question is if we can
extend something in a sane and maintainable manner to allow this.

So there's a whole bunch of problems I see with all this, in no
particular order:

 - we cannot block when writing to the actual buffer, and have to unroll
   the callstack and bolt on the blocking manually in a few specific
   sites. This is ugly, inconsistent and maintenance heavy.

 - it only works for some 'magic' events that got the treatment, but not
   for many others you might expect it to work for, with no real
   indication of which and why.

 - the wakeup side is icky; the best I can come up with is making the
   data page R/O and single stepping on write fault, but that isn't
   multi-threading safe.

   Another alternative would be keeping the whole page R/O and
   using write(2) or an ioctl() to update the head pointer.
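
As a rough illustration of that last alternative, here is a userspace model
only, not kernel code. struct ring, ring_try_write() and
ring_update_read_pos() are made-up names, and the "read position" stands in
for whatever pointer the reader would update. The point it tries to show:
when the reader's position can only change through a call, the producer side
gets a hook to notice freed space and wake a blocked writer, which a plain
store into a shared writable page does not provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a shared ring buffer; every name here is invented. */
struct ring {
	uint64_t write_pos;	/* advanced by the producer (the "kernel" side) */
	uint64_t read_pos;	/* changed only via ring_update_read_pos() */
	uint64_t size;		/* capacity in bytes */
	bool writer_blocked;	/* a producer is waiting for space */
	unsigned int wakeups;	/* wakeups issued, kept for illustration */
};

/* Bytes currently available to the producer. */
static uint64_t ring_free(const struct ring *r)
{
	assert(r->write_pos - r->read_pos <= r->size);
	return r->size - (r->write_pos - r->read_pos);
}

/* Producer: returns false at the point where a blocking event would sleep. */
static bool ring_try_write(struct ring *r, uint64_t len)
{
	if (len > ring_free(r)) {
		r->writer_blocked = true;
		return false;
	}
	r->write_pos += len;
	return true;
}

/*
 * Consumer: stands in for the write(2)/ioctl() update path. Since the
 * position changes inside this call, the producer side can validate it
 * and wake a blocked writer. Returns 0 on success, -1 on a bad position.
 */
static int ring_update_read_pos(struct ring *r, uint64_t new_pos)
{
	uint64_t old_pos = r->read_pos;

	if (new_pos < old_pos || new_pos > r->write_pos)
		return -1;

	r->read_pos = new_pos;
	if (r->writer_blocked && new_pos > old_pos) {
		r->writer_blocked = false;
		r->wakeups++;
	}
	return 0;
}
```

The model is single-threaded on purpose; it says nothing about how the
real wakeup or the locking around it would have to look.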

Again, if we're going to do this, it needs to be done well and
consistently, and not as a special hack to enable strace-like
functionality. And without clean and sane solutions to the above I just
don't see it happening.

Note that the first 2 points are equally true for ftrace; so I don't see
how we could sanely add it there either.


One, very big maybe, would be to add a new tracepoint type that includes
a might_sleep() and we very carefully undo all the preempt_disable and
go sleep where we should. That also gives the tracepoint crud the
information it needs to publish the capability to userspace.

We also have to consider (and possibly forbid) mixing blocking and
!blocking events to the same buffer.
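
If mixing were forbidden, the check could be as small as the sketch below.
This is a standalone model under my own assumptions, not a proposal for the
actual perf code: struct out_buffer, enum buf_mode and buffer_attach_event()
are hypothetical names, and returning -EINVAL is just one plausible choice.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: the mode a buffer takes on from its first event. */
enum buf_mode { BUF_UNSET, BUF_NONBLOCKING, BUF_BLOCKING };

struct out_buffer {
	enum buf_mode mode;
	unsigned int nr_events;	/* events currently writing to this buffer */
};

/*
 * Attach an event to a buffer. The first event fixes the buffer's mode;
 * any later event with the other mode is refused.
 */
static int buffer_attach_event(struct out_buffer *b, bool blocking)
{
	enum buf_mode want = blocking ? BUF_BLOCKING : BUF_NONBLOCKING;

	assert(b != NULL);
	if (b->mode != BUF_UNSET && b->mode != want)
		return -EINVAL;

	b->mode = want;
	b->nr_events++;
	return 0;
}
```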

