linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>, Mark Rutland <mark.rutland@arm.com>,
	Mike Leach <mike.leach@linaro.org>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	"H . Peter Anvin" <hpa@zytor.com>,
	x86@kernel.org,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Mathieu Poirier <mathieu.poirier@linaro.org>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Jiri Olsa <jolsa@redhat.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC 1/6] perf/x86: Add perf text poke event
Date: Mon, 11 Nov 2019 17:05:05 +0100	[thread overview]
Message-ID: <20191111160505.GZ4097@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <20191111153925.GC10488@willie-the-truck>

On Mon, Nov 11, 2019 at 03:39:25PM +0000, Will Deacon wrote:

> Backing up though, I think I'm missing details about what this thread is
> trying to achieve. You're adding perf events so that coresight trace can
> take into account modifications of the kernel text, right?

Yes, because ARM-CS / Intel-PT need to know the exact text at any one
time in order to correctly decode their traces.

> If so:
>
>   * Does this need to take into account more than just jump_label()?

jump_label seems to be the initial target Adrian set, but yes, it needs
to cover 'everything'.

That is, all single instruction patching crud around:

 - optimized kprobes
   (regular kprobes are exempt because they rely on exceptions)
 - jump_labels
 - static_call (still pending but still)
 - ftrace fentry

We also need a solution for whole new chunks of text:

 - modules
 - ftrace trampolines
 - optprobe trampolines
 - JIT stuff

but that is so far not included; I had some ideas about a /dev/mem based
interface that would report new ranges and wait for acks (from open
file-desc) before freeing them.

>   * What consistency guarantees are needed by userspace? It's not clear to
>     me how it correlates the new event with execution on other CPUs. Is this
>     using timestamps or something else?

Something else :-)

Both CS/PT basically have a bit-stream encoding of taken / not-taken
decisions. To decode they read the text until a conditional
branch-point, then consume one bit from the stream and continue.

(note how unconditional branches -- jump_labels -- are expected to be
correct in this scheme)

This means they need an exact representation of the text to be able to
accurately decode.

This means timestamps or PRE/POST modify events are not acceptible.
Because until the I-FLUSH has completed we do not know which CPU has
which version of the instruction.

Instead we rely on exceptions; exceptions are differently encoded in the
CS/PT data streams.

The scheme used is:

 - overwrite target instruction with an exception (INT3 on x86, BRK on arm)
 - sync (IPI broadcast CPUID or I-FLUSH completion)

at this point we know the instruction _will_ trap and CS/PT can observe
this alternate flow. That is, the exception handler will emulate the
instruction.

 - emit the TEXT_POKE event with both the old and new instruction
   included

 - overwrite the target instruction with the new instruction
 - sync

at this point the new instruction should be valid.

Using this scheme we can at all times follow the instruction flow.
Either it is an exception and the exception encoding helps us navigate,
or, on either size, we'll know the old/new instruction.

>   * What about module loading?

I mentioned that above; that is still pending.

> Finally, the whole point of the '_nosync()' stuff is that we don't have
> to synchronise. Therefore, there is a window where you really can't tell
> whether the instruction has been updated from the perspective of another
> CPU.

Right, and much of that is preserved with the above scheme, nowhere do
you have to wait/disturb other CPUs, except for the I-FLUSH completion.

Does this help?

  reply	other threads:[~2019-11-11 16:05 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-25 12:59 [PATCH RFC 0/6] perf/x86: Add perf text poke event Adrian Hunter
2019-10-25 12:59 ` [PATCH RFC 1/6] " Adrian Hunter
2019-10-30 10:47   ` Leo Yan
2019-10-30 12:46     ` Peter Zijlstra
2019-10-30 14:19       ` Leo Yan
2019-10-30 15:00         ` Mike Leach
2019-10-30 16:23         ` Peter Zijlstra
2019-10-31  7:31           ` Leo Yan
2019-11-01 10:04             ` Peter Zijlstra
2019-11-01 10:09               ` Peter Zijlstra
2019-11-04  2:23               ` Leo Yan
2019-11-08 15:05                 ` Leo Yan
2019-11-11 14:46                   ` Peter Zijlstra
2019-11-11 15:39                     ` Will Deacon
2019-11-11 16:05                       ` Peter Zijlstra [this message]
2019-11-11 17:29                         ` Will Deacon
2019-11-11 20:32                           ` Peter Zijlstra
     [not found]             ` <CAJ9a7VgZH7g=rFDpKf=FzEcyBVLS_WjqbrqtRnjOi7WOY4st+w@mail.gmail.com>
2019-11-01 10:06               ` Peter Zijlstra
2019-11-04 10:40   ` Peter Zijlstra
2019-11-04 12:32     ` Adrian Hunter
2019-10-25 12:59 ` [PATCH RFC 2/6] perf dso: Refactor dso_cache__read() Adrian Hunter
2019-10-25 14:54   ` Arnaldo Carvalho de Melo
2019-10-28 15:39   ` Jiri Olsa
2019-10-29  9:19     ` Adrian Hunter
2019-11-12 11:18   ` [tip: perf/core] " tip-bot2 for Adrian Hunter
2019-10-25 12:59 ` [PATCH RFC 3/6] perf dso: Add dso__data_write_cache_addr() Adrian Hunter
2019-10-28 15:45   ` Jiri Olsa
2019-10-29  9:20     ` Adrian Hunter
2019-11-12 11:18   ` [tip: perf/core] " tip-bot2 for Adrian Hunter
2019-10-25 12:59 ` [PATCH RFC 4/6] perf tools: Add support for PERF_RECORD_TEXT_POKE Adrian Hunter
2019-10-25 12:59 ` [PATCH RFC 5/6] perf auxtrace: Add auxtrace_cache__remove() Adrian Hunter
2019-10-25 14:48   ` Arnaldo Carvalho de Melo
2019-11-12 11:18   ` [tip: perf/core] " tip-bot2 for Adrian Hunter
2019-10-25 13:00 ` [PATCH RFC 6/6] perf intel-pt: Add support for text poke events Adrian Hunter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191111160505.GZ4097@hirez.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=bp@alien8.de \
    --cc=hpa@zytor.com \
    --cc=jolsa@redhat.com \
    --cc=leo.yan@linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mathieu.poirier@linaro.org \
    --cc=mike.leach@linaro.org \
    --cc=mingo@redhat.com \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).