linux-kernel.vger.kernel.org archive mirror
From: Jin Yao <yao.jin@linux.intel.com>
To: peterz@infradead.org, mingo@redhat.com, oleg@redhat.com,
	acme@kernel.org, jolsa@kernel.org
Cc: Linux-kernel@vger.kernel.org, ak@linux.intel.com,
	kan.liang@intel.com, yao.jin@intel.com,
	alexander.shishkin@linux.intel.com, mark.rutland@arm.com,
	Jin Yao <yao.jin@linux.intel.com>
Subject: [PATCH v1 2/2] perf/core: Fake regs for leaked kernel samples
Date: Fri, 31 Jul 2020 10:56:17 +0800
Message-ID: <20200731025617.16243-2-yao.jin@linux.intel.com>
In-Reply-To: <20200731025617.16243-1-yao.jin@linux.intel.com>

When sampling user space only, for example:

  perf record -e cycles:u ...

workloads that do a lot of kernel entries/exits still show kernel
samples, even though :u is specified. This is due to skid.
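
For reference, a minimal sketch of the attribute setup that a user-only
cycles event corresponds to at the perf_event_open(2) level. The helper
name init_user_only_cycles() is hypothetical, and the choice of
PERF_SAMPLE_IP as the only sample type is an assumption made for
illustration; perf record sets more fields than shown here (for example
the sample period or frequency).

```c
#include <assert.h>
#include <string.h>
#include <linux/perf_event.h>

/*
 * Hypothetical helper: fill @attr for a user-only cycles event.
 * The :u modifier maps to exclude_kernel = 1 and exclude_hv = 1.
 */
static void init_user_only_cycles(struct perf_event_attr *attr)
{
	assert(attr);

	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
	attr->sample_type = PERF_SAMPLE_IP;	/* assumption: ip only */
	attr->exclude_kernel = 1;		/* do not count in kernel mode */
	attr->exclude_hv = 1;			/* do not count in the hypervisor */
}
```

Even with exclude_kernel set, the sample is assembled in the overflow
handler, which is why the regs seen there can already be kernel regs.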

This is a potential security issue because it may leak kernel
addresses even though kernel sampling is disabled.

The commit cc1582c231ea ("perf/core: Drop kernel samples even
though :u is specified") dropped the leaked kernel samples but it
broke rr-project.

Another approach (inspired by Mark Rutland's original patch) is to
keep the samples instead of dropping them, but fake the regs by using
the user regs of the current task. If the user regs are not available,
use instruction_pointer_set() to set the ip in regs to -1L.
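
The selection described above can be sketched in plain userspace C.
This is only an illustration of the decision, not the kernel code:
struct mock_regs, struct mock_ctx and reported_ip() are hypothetical
stand-ins, and the NULL check on the task regs is an addition made so
the sketch is safe to run on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pt_regs: only what the sketch needs. */
struct mock_regs {
	uint64_t ip;
	bool user_mode;			/* stands in for user_mode(regs) */
};

/* Hypothetical stand-in for the event and the current task. */
struct mock_ctx {
	bool exclude_kernel;		/* event->attr.exclude_kernel */
	bool is_kthread;		/* current->flags & PF_KTHREAD */
	const struct mock_regs *task_regs;	/* task_pt_regs(current) */
};

/* Return the ip that would be reported for a sample taken with @regs. */
static uint64_t reported_ip(const struct mock_ctx *ctx,
			    const struct mock_regs *regs)
{
	assert(ctx && regs);

	/* Nothing to hide: kernel sampling is allowed, or regs are user regs. */
	if (!ctx->exclude_kernel || regs->user_mode)
		return regs->ip;

	/* A user task with usable user regs: report those instead. */
	if (!ctx->is_kthread && ctx->task_regs && ctx->task_regs->user_mode)
		return ctx->task_regs->ip;

	/* No user regs available: report -1 rather than a kernel address. */
	return UINT64_MAX;
}
```

A skidded sample from a user task reports the task's user ip, while a
sample taken in a kthread reports the all-ones value.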

For CALLCHAIN, get_perf_callchain() already checks user_mode(regs)
and uses task_pt_regs(current) instead in some cases, so it already
accounts for the possibility of a leak.

REGS_USER and STACK_USER are similar: perf_sample_regs_user() also
checks user_mode(regs) and, for tasks that are not kthreads, calls
perf_get_regs_user(). So we don't need to use "regs_fake" there.

Example:

  perf record -e cycles:u ./div
  perf report --stdio

Before:

  # Overhead  Command  Shared Object     Symbol
  # ........  .......  ................  ........................
  #
      41.08%  div      div               [.] main
      21.04%  div      libc-2.27.so      [.] __random_r
      19.90%  div      libc-2.27.so      [.] __random
       9.86%  div      div               [.] compute_flag
       4.24%  div      libc-2.27.so      [.] rand
       3.88%  div      div               [.] rand@plt
       0.01%  div      [unknown]         [k] 0xffffffffb9601c20
       0.00%  div      libc-2.27.so      [.] _dl_addr
       0.00%  div      ld-2.27.so        [.] __GI___tunables_init
       0.00%  div      [unknown]         [k] 0xffffffffb9601210
       0.00%  div      ld-2.27.so        [.] _start

"0xffffffffb9601c20, 0xffffffffb9601210" are leaked kernel addresses.

After:

  # Overhead  Command  Shared Object  Symbol
  # ........  .......  .............  ........................
  #
      40.86%  div      div            [.] main
      20.67%  div      libc-2.27.so   [.] __random_r
      20.54%  div      libc-2.27.so   [.] __random
       9.65%  div      div            [.] compute_flag
       4.32%  div      libc-2.27.so   [.] rand
       3.97%  div      div            [.] rand@plt
       0.00%  div      ld-2.27.so     [.] __GI___tunables_init
       0.00%  div      ld-2.27.so     [.] _start

Now no kernel address is leaked.
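
The before/after comparison can also be checked mechanically on the
sampled ips. A minimal sketch, assuming x86-64, where kernel addresses
live in the upper canonical half (bit 63 set); the function name
looks_like_leaked_kernel_ip() is hypothetical. The all-ones value is
excluded because it is the placeholder used when no user regs are
available, not a real address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check for a user-only profile on x86-64: true if @ip
 * looks like a real kernel address rather than a user address or the
 * -1 placeholder.
 */
static bool looks_like_leaked_kernel_ip(uint64_t ip)
{
	if (ip == UINT64_MAX)		/* faked ip, nothing leaked */
		return false;

	return (ip >> 63) != 0;		/* upper half: kernel space */
}
```

Both 0xffffffffb9601c20 and 0xffffffffb9601210 from the "Before"
report are flagged by this check, while ordinary user text addresses
are not.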

Inspired-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 kernel/events/core.c | 48 +++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 45 insertions(+), 3 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7c436d705fbd..52f6d7f0b86b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6973,7 +6973,8 @@ static struct perf_callchain_entry __empty_callchain = { .nr = 0, };
 struct perf_callchain_entry *
 perf_callchain(struct perf_event *event, struct pt_regs *regs)
 {
-	bool kernel = !event->attr.exclude_callchain_kernel;
+	bool kernel = !event->attr.exclude_callchain_kernel &&
+		      !event->attr.exclude_kernel;
 	bool user   = !event->attr.exclude_callchain_user;
 	/* Disallow cross-task user callchains. */
 	bool crosstask = event->ctx->task && event->ctx->task != current;
@@ -6988,12 +6989,39 @@ perf_callchain(struct perf_event *event, struct pt_regs *regs)
 	return callchain ?: &__empty_callchain;
 }
 
+static struct pt_regs *leak_check(struct perf_event_header *header,
+				  struct perf_event *event,
+				  struct pt_regs *regs)
+{
+	struct pt_regs *regs_fake = NULL;
+
+	if (event->attr.exclude_kernel && !user_mode(regs)) {
+		if (!(current->flags & PF_KTHREAD)) {
+			regs_fake = task_pt_regs(current);
+			if (!user_mode(regs_fake)) {
+				regs_fake = NULL;
+				instruction_pointer_set(regs, -1L);
+			}
+		} else
+			instruction_pointer_set(regs, -1L);
+
+		if ((header->misc & PERF_RECORD_MISC_CPUMODE_MASK) ==
+		     PERF_RECORD_MISC_KERNEL) {
+			header->misc &= ~PERF_RECORD_MISC_CPUMODE_MASK;
+			header->misc |= PERF_RECORD_MISC_USER;
+		}
+	}
+
+	return regs_fake;
+}
+
 void perf_prepare_sample(struct perf_event_header *header,
 			 struct perf_sample_data *data,
 			 struct perf_event *event,
 			 struct pt_regs *regs)
 {
 	u64 sample_type = event->attr.sample_type;
+	struct pt_regs *regs_fake;
 
 	header->type = PERF_RECORD_SAMPLE;
 	header->size = sizeof(*header) + event->header_size;
@@ -7003,8 +7031,19 @@ void perf_prepare_sample(struct perf_event_header *header,
 
 	__perf_event_header__init_id(header, data, event);
 
+	/*
+	 * Due to interrupt latency (AKA "skid"), we may enter the
+	 * kernel before taking an overflow, even if the PMU is only
+	 * counting user events. To avoid leaking kernel address to
+	 * userspace, we try to fake the regs by using the user regs
+	 * of current task.
+	 */
+	regs_fake = leak_check(header, event, regs);
+
 	if (sample_type & PERF_SAMPLE_IP)
-		data->ip = perf_instruction_pointer(regs);
+		data->ip = (regs_fake) ?
+			perf_instruction_pointer(regs_fake) :
+			perf_instruction_pointer(regs);
 
 	if (sample_type & PERF_SAMPLE_CALLCHAIN) {
 		int size = 1;
@@ -7099,7 +7138,10 @@ void perf_prepare_sample(struct perf_event_header *header,
 		/* regs dump ABI info */
 		int size = sizeof(u64);
 
-		perf_sample_regs_intr(&data->regs_intr, regs);
+		if (regs_fake)
+			perf_sample_regs_intr(&data->regs_intr, regs_fake);
+		else
+			perf_sample_regs_intr(&data->regs_intr, regs);
 
 		if (data->regs_intr.regs) {
 			u64 mask = event->attr.sample_regs_intr;
-- 
2.17.1

