From: Jin Yao <yao.jin@linux.intel.com>
To: peterz@infradead.org, mingo@redhat.com, oleg@redhat.com,
acme@kernel.org, jolsa@kernel.org
Cc: Linux-kernel@vger.kernel.org, ak@linux.intel.com,
kan.liang@intel.com, yao.jin@intel.com,
alexander.shishkin@linux.intel.com, mark.rutland@arm.com,
Jin Yao <yao.jin@linux.intel.com>
Subject: [PATCH v1 2/2] perf/core: Fake regs for leaked kernel samples
Date: Fri, 31 Jul 2020 10:56:17 +0800
Message-ID: <20200731025617.16243-2-yao.jin@linux.intel.com>
In-Reply-To: <20200731025617.16243-1-yao.jin@linux.intel.com>
When sampling with, for example,

  perf record -e cycles:u ...

on workloads that do a lot of kernel entries/exits, we see kernel
samples even though :u is specified. This is due to skid.

This is a potential security issue because it may leak kernel
addresses even though kernel sampling is disabled.
Commit cc1582c231ea ("perf/core: Drop kernel samples even
though :u is specified") dropped the leaked kernel samples, but it
broke rr-project.

Another idea (inspired by Mark Rutland's original patch) is to keep
the samples instead of dropping them, but fake the regs by using the
user regs of the current task. If the user regs are not available,
use instruction_pointer_set() to set the ip in regs to -1L.
For CALLCHAIN, get_perf_callchain() already checks user_mode(regs)
and uses task_pt_regs(current) instead in some cases, so it has
already accounted for the possible leak.

REGS_USER and STACK_USER are similar: perf_sample_regs_user() also
checks user_mode(regs) and, when that fails, calls perf_get_regs_user()
for non-kthread tasks. So we don't need to use "regs_fake" there.
Example:

  perf record -e cycles:u ./div
  perf report --stdio

Before:

  # Overhead  Command  Shared Object     Symbol
  # ........  .......  ................  ........................
  #
      41.08%  div      div               [.] main
      21.04%  div      libc-2.27.so      [.] __random_r
      19.90%  div      libc-2.27.so      [.] __random
       9.86%  div      div               [.] compute_flag
       4.24%  div      libc-2.27.so      [.] rand
       3.88%  div      div               [.] rand@plt
       0.01%  div      [unknown]         [k] 0xffffffffb9601c20
       0.00%  div      libc-2.27.so      [.] _dl_addr
       0.00%  div      ld-2.27.so        [.] __GI___tunables_init
       0.00%  div      [unknown]         [k] 0xffffffffb9601210
       0.00%  div      ld-2.27.so        [.] _start

0xffffffffb9601c20 and 0xffffffffb9601210 are leaked kernel addresses.
After:

  # Overhead  Command  Shared Object  Symbol
  # ........  .......  .............  ........................
  #
      40.86%  div      div            [.] main
      20.67%  div      libc-2.27.so   [.] __random_r
      20.54%  div      libc-2.27.so   [.] __random
       9.65%  div      div            [.] compute_flag
       4.32%  div      libc-2.27.so   [.] rand
       3.97%  div      div            [.] rand@plt
       0.00%  div      ld-2.27.so     [.] __GI___tunables_init
       0.00%  div      ld-2.27.so     [.] _start

Now no kernel address is leaked.
Inspired-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 kernel/events/core.c | 48 +++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 45 insertions(+), 3 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7c436d705fbd..52f6d7f0b86b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6973,7 +6973,8 @@ static struct perf_callchain_entry __empty_callchain = { .nr = 0, };
 struct perf_callchain_entry *
 perf_callchain(struct perf_event *event, struct pt_regs *regs)
 {
-	bool kernel = !event->attr.exclude_callchain_kernel;
+	bool kernel = !event->attr.exclude_callchain_kernel &&
+		      !event->attr.exclude_kernel;
 	bool user = !event->attr.exclude_callchain_user;
 	/* Disallow cross-task user callchains. */
 	bool crosstask = event->ctx->task && event->ctx->task != current;
@@ -6988,12 +6989,39 @@ perf_callchain(struct perf_event *event, struct pt_regs *regs)
 	return callchain ?: &__empty_callchain;
 }
 
+static struct pt_regs *leak_check(struct perf_event_header *header,
+				  struct perf_event *event,
+				  struct pt_regs *regs)
+{
+	struct pt_regs *regs_fake = NULL;
+
+	if (event->attr.exclude_kernel && !user_mode(regs)) {
+		if (!(current->flags & PF_KTHREAD)) {
+			regs_fake = task_pt_regs(current);
+			if (!user_mode(regs_fake)) {
+				regs_fake = NULL;
+				instruction_pointer_set(regs, -1L);
+			}
+		} else
+			instruction_pointer_set(regs, -1L);
+
+		if ((header->misc & PERF_RECORD_MISC_CPUMODE_MASK) ==
+		    PERF_RECORD_MISC_KERNEL) {
+			header->misc &= ~PERF_RECORD_MISC_CPUMODE_MASK;
+			header->misc |= PERF_RECORD_MISC_USER;
+		}
+	}
+
+	return regs_fake;
+}
+
 void perf_prepare_sample(struct perf_event_header *header,
 			 struct perf_sample_data *data,
 			 struct perf_event *event,
 			 struct pt_regs *regs)
 {
 	u64 sample_type = event->attr.sample_type;
+	struct pt_regs *regs_fake;
 
 	header->type = PERF_RECORD_SAMPLE;
 	header->size = sizeof(*header) + event->header_size;
@@ -7003,8 +7031,19 @@ void perf_prepare_sample(struct perf_event_header *header,
 
 	__perf_event_header__init_id(header, data, event);
 
+	/*
+	 * Due to interrupt latency (AKA "skid"), we may enter the
+	 * kernel before taking an overflow, even if the PMU is only
+	 * counting user events. To avoid leaking kernel address to
+	 * userspace, we try to fake the regs by using the user regs
+	 * of current task.
+	 */
+	regs_fake = leak_check(header, event, regs);
+
 	if (sample_type & PERF_SAMPLE_IP)
-		data->ip = perf_instruction_pointer(regs);
+		data->ip = (regs_fake) ?
+				perf_instruction_pointer(regs_fake) :
+				perf_instruction_pointer(regs);
 
 	if (sample_type & PERF_SAMPLE_CALLCHAIN) {
 		int size = 1;
@@ -7099,7 +7138,10 @@ void perf_prepare_sample(struct perf_event_header *header,
 		/* regs dump ABI info */
 		int size = sizeof(u64);
 
-		perf_sample_regs_intr(&data->regs_intr, regs);
+		if (regs_fake)
+			perf_sample_regs_intr(&data->regs_intr, regs_fake);
+		else
+			perf_sample_regs_intr(&data->regs_intr, regs);
 
 		if (data->regs_intr.regs) {
 			u64 mask = event->attr.sample_regs_intr;
-- 
2.17.1