From: Adrian Hunter <adrian.hunter@intel.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
"H . Peter Anvin" <hpa@zytor.com>,
x86@kernel.org, Mark Rutland <mark.rutland@arm.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Mathieu Poirier <mathieu.poirier@linaro.org>,
Leo Yan <leo.yan@linaro.org>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Jiri Olsa <jolsa@redhat.com>,
linux-kernel@vger.kernel.org
Subject: [PATCH V2 01/13] perf: Add perf text poke event
Date: Wed, 12 Feb 2020 14:49:37 +0200 [thread overview]
Message-ID: <20200212124949.3589-2-adrian.hunter@intel.com> (raw)
In-Reply-To: <20200212124949.3589-1-adrian.hunter@intel.com>
Record (single instruction) changes to the kernel text (i.e.
self-modifying code) in order to support tracers like Intel PT and
ARM CoreSight.
A copy of the running kernel code is needed as a reference point (e.g.
from /proc/kcore). The text poke event records the old bytes and the
new bytes so that the event can be processed forwards or backwards.
The basic problem is recording the modified instruction in an
unambiguous manner given SMP instruction cache (in)coherence. That is,
when modifying an instruction concurrently any solution with one or
multiple timestamps is not sufficient:
CPU0 CPU1
0
1 write insn A
2 execute insn A
3 sync-I$
4
Due to I$, CPU1 might execute either the old or new A. No matter where
we record tracepoints on CPU0, one simply cannot tell what CPU1 will
have observed, except that at 0 it must be the old one and at 4 it
must be the new one.
To solve this, take inspiration from x86 text poking, which has to
solve this exact problem due to variable length instruction encoding
and I-fetch windows.
1) overwrite the instruction with a breakpoint and sync I$
This guarantees that that code flow will never hit the target
instruction anymore, on any CPU (or rather, it will cause an
exception).
2) issue the TEXT_POKE event
3) overwrite the breakpoint with the new instruction and sync I$
Now we know that any execution after the TEXT_POKE event will either
observe the breakpoint (and hit the exception) or the new instruction.
So by guarding the TEXT_POKE event with an exception on either side;
we can now tell, without doubt, which instruction another CPU will
have observed.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
include/linux/perf_event.h | 8 +++
include/uapi/linux/perf_event.h | 21 +++++++-
kernel/events/core.c | 90 ++++++++++++++++++++++++++++++++-
3 files changed, 117 insertions(+), 2 deletions(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 6d4c22aee384..e2b110e418fa 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1212,6 +1212,9 @@ extern void perf_event_exec(void);
extern void perf_event_comm(struct task_struct *tsk, bool exec);
extern void perf_event_namespaces(struct task_struct *tsk);
extern void perf_event_fork(struct task_struct *tsk);
+extern void perf_event_text_poke(const void *addr,
+ const void *old_bytes, size_t old_len,
+ const void *new_bytes, size_t new_len);
/* Callchains */
DECLARE_PER_CPU(struct perf_callchain_entry, perf_callchain_entry);
@@ -1462,6 +1465,11 @@ static inline void perf_event_exec(void) { }
static inline void perf_event_comm(struct task_struct *tsk, bool exec) { }
static inline void perf_event_namespaces(struct task_struct *tsk) { }
static inline void perf_event_fork(struct task_struct *tsk) { }
+static inline void perf_event_text_poke(const void *addr,
+ const void *old_bytes,
+ size_t old_len,
+ const void *new_bytes,
+ size_t new_len) { }
static inline void perf_event_init(void) { }
static inline int perf_swevent_get_recursion_context(void) { return -1; }
static inline void perf_swevent_put_recursion_context(int rctx) { }
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 377d794d3105..bae9e9d2d897 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -377,7 +377,8 @@ struct perf_event_attr {
ksymbol : 1, /* include ksymbol events */
bpf_event : 1, /* include bpf events */
aux_output : 1, /* generate AUX records instead of events */
- __reserved_1 : 32;
+ text_poke : 1, /* include text poke events */
+ __reserved_1 : 31;
union {
__u32 wakeup_events; /* wakeup every n events */
@@ -1006,6 +1007,24 @@ enum perf_event_type {
*/
PERF_RECORD_BPF_EVENT = 18,
+ /*
+ * Records changes to kernel text i.e. self-modified code. 'old_len' is
+ * the number of old bytes, 'new_len' is the number of new bytes. Either
+ * 'old_len' or 'new_len' may be zero to indicate, for example, the
+ * addition or removal of a trampoline. 'bytes' contains the old bytes
+ * followed immediately by the new bytes.
+ *
+ * struct {
+ * struct perf_event_header header;
+ * u64 addr;
+ * u16 old_len;
+ * u16 new_len;
+ * u8 bytes[];
+ * struct sample_id sample_id;
+ * };
+ */
+ PERF_RECORD_TEXT_POKE = 19,
+
PERF_RECORD_MAX, /* non-ABI */
};
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2173c23c25b4..47fca7a4ac7a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -386,6 +386,7 @@ static atomic_t nr_freq_events __read_mostly;
static atomic_t nr_switch_events __read_mostly;
static atomic_t nr_ksymbol_events __read_mostly;
static atomic_t nr_bpf_events __read_mostly;
+static atomic_t nr_text_poke_events __read_mostly;
static LIST_HEAD(pmus);
static DEFINE_MUTEX(pmus_lock);
@@ -4397,7 +4398,7 @@ static bool is_sb_event(struct perf_event *event)
if (attr->mmap || attr->mmap_data || attr->mmap2 ||
attr->comm || attr->comm_exec ||
attr->task || attr->ksymbol ||
- attr->context_switch ||
+ attr->context_switch || attr->text_poke ||
attr->bpf_event)
return true;
return false;
@@ -4471,6 +4472,8 @@ static void unaccount_event(struct perf_event *event)
atomic_dec(&nr_ksymbol_events);
if (event->attr.bpf_event)
atomic_dec(&nr_bpf_events);
+ if (event->attr.text_poke)
+ atomic_dec(&nr_text_poke_events);
if (dec) {
if (!atomic_add_unless(&perf_sched_count, -1, 1))
@@ -8309,6 +8312,89 @@ void perf_event_bpf_event(struct bpf_prog *prog,
perf_iterate_sb(perf_event_bpf_output, &bpf_event, NULL);
}
+struct perf_text_poke_event {
+ const void *old_bytes;
+ const void *new_bytes;
+ size_t pad;
+ u16 old_len;
+ u16 new_len;
+
+ struct {
+ struct perf_event_header header;
+
+ u64 addr;
+ } event_id;
+};
+
+static int perf_event_text_poke_match(struct perf_event *event)
+{
+ return event->attr.text_poke;
+}
+
+static void perf_event_text_poke_output(struct perf_event *event, void *data)
+{
+ struct perf_text_poke_event *text_poke_event = data;
+ struct perf_output_handle handle;
+ struct perf_sample_data sample;
+ u64 padding = 0;
+ int ret;
+
+ if (!perf_event_text_poke_match(event))
+ return;
+
+ perf_event_header__init_id(&text_poke_event->event_id.header, &sample, event);
+
+ ret = perf_output_begin(&handle, event, text_poke_event->event_id.header.size);
+ if (ret)
+ return;
+
+ perf_output_put(&handle, text_poke_event->event_id);
+ perf_output_put(&handle, text_poke_event->old_len);
+ perf_output_put(&handle, text_poke_event->new_len);
+
+ __output_copy(&handle, text_poke_event->old_bytes, text_poke_event->old_len);
+ __output_copy(&handle, text_poke_event->new_bytes, text_poke_event->new_len);
+
+ if (text_poke_event->pad)
+ __output_copy(&handle, &padding, text_poke_event->pad);
+
+ perf_event__output_id_sample(event, &handle, &sample);
+
+ perf_output_end(&handle);
+}
+
+void perf_event_text_poke(const void *addr, const void *old_bytes,
+ size_t old_len, const void *new_bytes, size_t new_len)
+{
+ struct perf_text_poke_event text_poke_event;
+ size_t tot, pad;
+
+ if (!atomic_read(&nr_text_poke_events))
+ return;
+
+ tot = sizeof(text_poke_event.old_len) + old_len;
+ tot += sizeof(text_poke_event.new_len) + new_len;
+ pad = ALIGN(tot, sizeof(u64)) - tot;
+
+ text_poke_event = (struct perf_text_poke_event){
+ .old_bytes = old_bytes,
+ .new_bytes = new_bytes,
+ .pad = pad,
+ .old_len = old_len,
+ .new_len = new_len,
+ .event_id = {
+ .header = {
+ .type = PERF_RECORD_TEXT_POKE,
+ .misc = PERF_RECORD_MISC_KERNEL,
+ .size = sizeof(text_poke_event.event_id) + tot + pad,
+ },
+ .addr = (u64)addr,
+ },
+ };
+
+ perf_iterate_sb(perf_event_text_poke_output, &text_poke_event, NULL);
+}
+
void perf_event_itrace_started(struct perf_event *event)
{
event->attach_state |= PERF_ATTACH_ITRACE;
@@ -10623,6 +10709,8 @@ static void account_event(struct perf_event *event)
atomic_inc(&nr_ksymbol_events);
if (event->attr.bpf_event)
atomic_inc(&nr_bpf_events);
+ if (event->attr.text_poke)
+ atomic_inc(&nr_text_poke_events);
if (inc) {
/*
--
2.17.1
next prev parent reply other threads:[~2020-02-12 12:50 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-02-12 12:49 [PATCH V2 00/13] perf/x86: Add perf text poke events Adrian Hunter
2020-02-12 12:49 ` Adrian Hunter [this message]
2020-02-12 12:49 ` [PATCH V2 02/13] perf/x86: Add support for perf text poke event for text_poke_bp_batch() callers Adrian Hunter
2020-02-12 12:49 ` [PATCH V2 03/13] kprobes: Add symbols for kprobe insn pages Adrian Hunter
2020-02-26 15:21 ` Jiri Olsa
2020-02-28 14:02 ` Adrian Hunter
2020-02-26 15:22 ` Jiri Olsa
2020-02-27 7:13 ` Adrian Hunter
2020-02-12 12:49 ` [PATCH V2 04/13] kprobes: Add perf ksymbol events " Adrian Hunter
2020-02-12 12:49 ` [PATCH V2 05/13] perf/x86: Add perf text poke events for kprobes Adrian Hunter
2020-02-12 12:49 ` [PATCH V2 06/13] ftrace: Add symbols for ftrace trampolines Adrian Hunter
2020-02-12 12:49 ` [PATCH V2 07/13] ftrace: Add perf ksymbol events " Adrian Hunter
2020-02-12 12:49 ` [PATCH V2 08/13] ftrace: Add perf text poke " Adrian Hunter
2020-02-12 12:49 ` [PATCH V2 09/13] perf kcore_copy: Fix module map when there are no modules loaded Adrian Hunter
2020-02-12 12:49 ` [PATCH V2 10/13] perf evlist: Disable 'immediate' events last Adrian Hunter
2020-02-12 12:49 ` [PATCH V2 11/13] perf tools: Add support for PERF_RECORD_TEXT_POKE Adrian Hunter
2020-02-12 12:49 ` [PATCH V2 12/13] perf tools: Add support for PERF_RECORD_KSYMBOL_TYPE_OOL Adrian Hunter
2020-02-12 12:49 ` [PATCH V2 13/13] perf intel-pt: Add support for text poke events Adrian Hunter
2020-02-24 13:50 ` [PATCH V2 00/13] perf/x86: Add perf " Adrian Hunter
2020-02-24 14:03 ` Jiri Olsa
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200212124949.3589-2-adrian.hunter@intel.com \
--to=adrian.hunter@intel.com \
--cc=acme@kernel.org \
--cc=alexander.shishkin@linux.intel.com \
--cc=bp@alien8.de \
--cc=hpa@zytor.com \
--cc=jolsa@redhat.com \
--cc=leo.yan@linaro.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=mathieu.poirier@linaro.org \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).