All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Jin, Yao" <yao.jin@linux.intel.com>
To: Stephane Eranian <eranian@google.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>,
	Jiri Olsa <jolsa@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	me@kylehuey.com, LKML <Linux-kernel@vger.kernel.org>,
	Vince Weaver <vincent.weaver@maine.edu>,
	Will Deacon <will.deacon@arm.com>,
	Namhyung Kim <namhyung@kernel.org>,
	Andi Kleen <ak@linux.intel.com>,
	"Liang, Kan" <kan.liang@intel.com>,
	"Jin, Yao" <yao.jin@intel.com>
Subject: Re: [PATCH v1 1/2] perf/core: Use sysctl to turn on/off dropping leaked kernel samples
Date: Fri, 15 Jun 2018 15:15:03 +0800	[thread overview]
Message-ID: <5ee8dba2-3346-615c-84a2-3a65203c06a8@linux.intel.com> (raw)
In-Reply-To: <CABPqkBRoecNKQEuJaEw1E+1a5KF-aTucAWK+St8A15FU9dsZsw@mail.gmail.com>



On 6/15/2018 1:59 PM, Stephane Eranian wrote:
> On Thu, Jun 14, 2018 at 7:10 PM Jin Yao <yao.jin@linux.intel.com> wrote:
>>
>> When doing sampling, for example:
>>
>> perf record -e cycles:u ...
>>
>> On workloads that do a lot of kernel entry/exits we see kernel
>> samples, even though :u is specified. This is due to skid existing.
>>
>> This might be a security issue because it can leak kernel addresses even
>> though kernel sampling support is disabled.
>>
>> One patch "perf/core: Drop kernel samples even though :u is specified"
>> was posted in last year but it was reverted because it introduced a
>> regression issue that broke the rr-project, which used sampling
>> events to receive a signal on overflow. These signals were critical
>> to the correct operation of rr.
>>
>> See '6a8a75f32357 ("Revert "perf/core: Drop kernel samples even
>> though :u is specified"")' for detail.
>>
>> Now the idea is to use sysctl to control the dropping of leaked
>> kernel samples.
>>
>> /sys/devices/cpu/perf_allow_sample_leakage:
>>
>> 0 - default, drop the leaked kernel samples.
>> 1 - don't drop the leaked kernel samples.
>>
>> For rr it can write 1 to /sys/devices/cpu/perf_allow_sample_leakage.
>>
>> For example,
>>
>> root@skl:/tmp# cat /sys/devices/cpu/perf_allow_sample_leakage
>> 0
>> root@skl:/tmp# perf record -e cycles:u ./div
>> root@skl:/tmp# perf report --stdio
>>
>> ........  .......  .............  ................
>>
>>      47.01%  div      div            [.] main
>>      20.74%  div      libc-2.23.so   [.] __random_r
>>      15.59%  div      libc-2.23.so   [.] __random
>>       8.68%  div      div            [.] compute_flag
>>       4.48%  div      libc-2.23.so   [.] rand
>>       3.50%  div      div            [.] rand@plt
>>       0.00%  div      ld-2.23.so     [.] do_lookup_x
>>       0.00%  div      ld-2.23.so     [.] memcmp
>>       0.00%  div      ld-2.23.so     [.] _dl_start
>>       0.00%  div      ld-2.23.so     [.] _start
>>
>> There is no kernel symbol reported.
>>
>> root@skl:/tmp# echo 1 > /sys/devices/cpu/perf_allow_sample_leakage
>> root@skl:/tmp# cat /sys/devices/cpu/perf_allow_sample_leakage
>> 1
>> root@skl:/tmp# perf record -e cycles:u ./div
>> root@skl:/tmp# perf report --stdio
>>
>> ........  .......  ................  .............
>>
>>      47.53%  div      div               [.] main
>>      20.62%  div      libc-2.23.so      [.] __random_r
>>      15.32%  div      libc-2.23.so      [.] __random
>>       8.66%  div      div               [.] compute_flag
>>       4.53%  div      libc-2.23.so      [.] rand
>>       3.34%  div      div               [.] rand@plt
>>       0.00%  div      [kernel.vmlinux]  [k] apic_timer_interrupt
>>       0.00%  div      libc-2.23.so      [.] intel_check_word
>>       0.00%  div      ld-2.23.so        [.] brk
>>       0.00%  div      [kernel.vmlinux]  [k] page_fault
>>       0.00%  div      ld-2.23.so        [.] _start
>>
>> We can see the kernel symbols apic_timer_interrupt and page_fault.
>>
> These kernel symbols do not match your description here. How much skid
> do you think you have here?
> You're saying you are at the user level, you get a counter overflow,
> and the interrupted IP lands in the kernel
> because you where there by the time the interrupt is delivered. How
> many instructions does it take to get
> from user land to apic_timer_interrupt() or page_fault()? These
> functions are not right at the kernel entry,
> I believe. So how did you get there, the skid must have been VERY big
> or symbolization has a problem.
> 

I'm testing with the latest perf/core branch (4.17+). Again I test with 
Linux's vmstat (not test with my application).

perf record -e cycles:u vmstat 1
perf script -F ip

      7f84e2b0bc30
      7f84e2b0bc30
      7f84e2b0bc30
      7f84e2b0bc30
  ffffffffb7a01070
      7f84e2b243f0
      7f84e2b11891
      7f84e2b27f5e
      7f84e25a3b26
      7f84e25680f5

cat /proc/kallsyms  | grep page_fault
....
ffffffffb7a01070 T page_fault
ffffffffb7a010a0 T async_page_fault
....

So one sample (ip ffffffffb7a01070) hits on page_fault.

Maybe you can have a try too. :)

Thanks
Jin Yao

>> Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
>> ---
>>   kernel/events/core.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 58 insertions(+)
>>
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 80cca2b..7867541 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -7721,6 +7721,28 @@ int perf_event_account_interrupt(struct perf_event *event)
>>          return __perf_event_account_interrupt(event, 1);
>>   }
>>
>> +static int perf_allow_sample_leakage __read_mostly;
>> +
>> +static bool sample_is_allowed(struct perf_event *event, struct pt_regs *regs)
>> +{
>> +       int allow_leakage = READ_ONCE(perf_allow_sample_leakage);
>> +
>> +       if (allow_leakage)
>> +               return true;
>> +
>> +       /*
>> +        * Due to interrupt latency (AKA "skid"), we may enter the
>> +        * kernel before taking an overflow, even if the PMU is only
>> +        * counting user events.
>> +        * To avoid leaking information to userspace, we must always
>> +        * reject kernel samples when exclude_kernel is set.
>> +        */
>> +       if (event->attr.exclude_kernel && !user_mode(regs))
>> +               return false;
>> +
>> +       return true;
>> +}
>> +
>>   /*
>>    * Generic event overflow handling, sampling.
>>    */
>> @@ -7742,6 +7764,12 @@ static int __perf_event_overflow(struct perf_event *event,
>>          ret = __perf_event_account_interrupt(event, throttle);
>>
>>          /*
>> +        * For security, drop the skid kernel samples if necessary.
>> +        */
>> +       if (!sample_is_allowed(event, regs))
>> +               return ret;
>> +
>> +       /*
>>           * XXX event_limit might not quite work as expected on inherited
>>           * events
>>           */
>> @@ -9500,9 +9528,39 @@ perf_event_mux_interval_ms_store(struct device *dev,
>>   }
>>   static DEVICE_ATTR_RW(perf_event_mux_interval_ms);
>>
>> +static ssize_t
>> +perf_allow_sample_leakage_show(struct device *dev,
>> +                              struct device_attribute *attr, char *page)
>> +{
>> +       int allow_leakage = READ_ONCE(perf_allow_sample_leakage);
>> +
>> +       return snprintf(page, PAGE_SIZE-1, "%d\n", allow_leakage);
>> +}
>> +
>> +static ssize_t
>> +perf_allow_sample_leakage_store(struct device *dev,
>> +                               struct device_attribute *attr,
>> +                               const char *buf, size_t count)
>> +{
>> +       int allow_leakage, ret;
>> +
>> +       ret = kstrtoint(buf, 0, &allow_leakage);
>> +       if (ret)
>> +               return ret;
>> +
>> +       if (allow_leakage != 0 && allow_leakage != 1)
>> +               return -EINVAL;
>> +
>> +       WRITE_ONCE(perf_allow_sample_leakage, allow_leakage);
>> +
>> +       return count;
>> +}
>> +static DEVICE_ATTR_RW(perf_allow_sample_leakage);
>> +
>>   static struct attribute *pmu_dev_attrs[] = {
>>          &dev_attr_type.attr,
>>          &dev_attr_perf_event_mux_interval_ms.attr,
>> +       &dev_attr_perf_allow_sample_leakage.attr,
>>          NULL,
>>   };
>>   ATTRIBUTE_GROUPS(pmu_dev);
>> --
>> 2.7.4
>>

  reply	other threads:[~2018-06-15  7:15 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-15 10:03 [PATCH v1 0/2] perf: Drop leaked kernel samples Jin Yao
2018-06-15  3:35 ` Kyle Huey
2018-06-15  5:11   ` Jin, Yao
2018-06-15 17:16     ` Kyle Huey
2018-06-15 17:34       ` Robert O'Callahan
2018-06-16  0:50         ` Jin, Yao
2018-06-16  0:56           ` Kyle Huey
2018-06-16  1:18             ` Jin, Yao
2018-06-15  7:45 ` Peter Zijlstra
2018-06-15  8:01   ` Jin, Yao
2018-06-15  8:12     ` Peter Zijlstra
2018-06-15  8:24       ` Jin, Yao
2018-06-15 16:54       ` Stephane Eranian
2018-06-15 10:03 ` [PATCH v1 1/2] perf/core: Use sysctl to turn on/off dropping " Jin Yao
2018-06-15  5:59   ` Stephane Eranian
2018-06-15  7:15     ` Jin, Yao [this message]
2018-06-19 16:50       ` Stephane Eranian
2018-06-15  6:02   ` Stephane Eranian
2018-06-15  8:16     ` Peter Zijlstra
2018-06-15 13:31       ` Liang, Kan
2018-06-18 10:41         ` Peter Zijlstra
2018-06-15 11:36   ` Mark Rutland
2018-06-16  1:27     ` Linus Torvalds
2018-06-18 10:51       ` Peter Zijlstra
2018-06-18  6:55     ` Jin, Yao
2018-06-18 10:45       ` Peter Zijlstra
2018-06-19  1:39         ` Jin, Yao
2018-06-19  6:01           ` Mark Rutland
2018-06-15 10:03 ` [PATCH v1 2/2] perf Documentation: Introduce the sysctl perf_allow_sample_leakage Jin Yao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5ee8dba2-3346-615c-84a2-3a65203c06a8@linux.intel.com \
    --to=yao.jin@linux.intel.com \
    --cc=Linux-kernel@vger.kernel.org \
    --cc=acme@kernel.org \
    --cc=ak@linux.intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=eranian@google.com \
    --cc=jolsa@kernel.org \
    --cc=kan.liang@intel.com \
    --cc=me@kylehuey.com \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    --cc=vincent.weaver@maine.edu \
    --cc=will.deacon@arm.com \
    --cc=yao.jin@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.