* [PATCH V5] perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task
@ 2021-04-20 22:30 kan.liang
  2021-04-21  8:11 ` Peter Zijlstra
  0 siblings, 1 reply; 5+ messages in thread
From: kan.liang @ 2021-04-20 22:30 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: ak, acme, mark.rutland, luto, eranian, namhyung, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The counter value of a perf task may leak to another RDPMC task.
For example, a perf stat task as below is running on CPU 0.

    perf stat -e 'branches,cycles' -- taskset -c 0 ./workload

In the meantime, an RDPMC task, which is also running on CPU 0, may read
the GP counters periodically. (The RDPMC task creates a fixed event,
but reads four GP counters.)

    $ taskset -c 0 ./rdpmc_read_all_counters
    index 0x0 value 0x8001e5970f99
    index 0x1 value 0x8005d750edb6
    index 0x2 value 0x0
    index 0x3 value 0x0

    index 0x0 value 0x8002358e48a5
    index 0x1 value 0x8006bd1e3bc9
    index 0x2 value 0x0
    index 0x3 value 0x0
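
The reader can be sketched roughly as below (an illustrative sketch only,
not the actual rdpmc_read_all_counters source; the event choice and the
lack of error handling are simplifications):

    #include <linux/perf_event.h>
    #include <sys/mman.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <stdint.h>
    #include <stdio.h>

    static inline uint64_t rdpmc(uint32_t idx)
    {
        uint32_t lo, hi;

        __asm__ volatile("rdpmc" : "=a" (lo), "=d" (hi) : "c" (idx));
        return ((uint64_t)hi << 32) | lo;
    }

    int main(void)
    {
        struct perf_event_attr attr = {
            .type           = PERF_TYPE_HARDWARE,
            .size           = sizeof(attr),
            .config         = PERF_COUNT_HW_INSTRUCTIONS, /* typically lands on a fixed counter */
            .exclude_kernel = 1,
        };
        int fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
        /* mmap of the event buffer is what marks the task as an RDPMC user */
        void *p = mmap(NULL, sysconf(_SC_PAGESIZE), PROT_READ, MAP_SHARED, fd, 0);
        unsigned int i;

        for (i = 0; i < 4; i++)         /* read GP counters 0-3 directly */
            printf("index %#x value %#llx\n", i,
                   (unsigned long long)rdpmc(i));

        munmap(p, sysconf(_SC_PAGESIZE));
        close(fd);
        return 0;
    }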

This is a potential security issue. Once an attacker learns what the other
thread is counting, the PerfMon counters can be used as a side channel to
attack cryptosystems.

The counter value of the perf stat task leaks to the RDPMC task because
perf never clears the counter when it's stopped.

Two methods were considered to address the issue.
- Unconditionally reset the counter in x86_pmu_del(). This brings extra
  overhead even when there is no RDPMC task running.
- Only reset the un-assigned dirty counters when an RDPMC task is
  scheduled in. This is the method implemented here.

A dirty counter is a counter whose assigned event has been deleted but
which has not been reset. To track the dirty counters, add a 'dirty'
bitmap to struct cpu_hw_events.

The current code doesn't reset the counter when the assigned event is
deleted. Set the corresponding bit in the 'dirty' bitmap in
x86_pmu_del() if the RDPMC feature is available on the system.

The security issue can only be triggered by an RDPMC task. The event for
an RDPMC task requires the mmap buffer, which can be used to detect an
RDPMC task. Once such an event is detected in event_mapped(), enable
sched_task(), which is invoked on each context switch. Add a check in
sched_task() to clear the dirty counters when the RDPMC task is
scheduled in. Only the currently un-assigned dirty counters are reset,
because the RDPMC-assigned dirty counters will be updated soon.

The RDPMC instruction is also supported on older platforms, so add
sched_task() for the core_pmu as well. The core_pmu doesn't support large
PEBS or the LBR callstack, so intel_pmu_pebs/lbr_sched_task() will be
ignored.

RDPMC is not an Intel-only feature, so add the dirty-counter clearing
code to the x86 generic code.

After applying the patch,

        $ taskset -c 0 ./rdpmc_read_all_counters
        index 0x0 value 0x0
        index 0x1 value 0x0
        index 0x2 value 0x0
        index 0x3 value 0x0

        index 0x0 value 0x0
        index 0x1 value 0x0
        index 0x2 value 0x0
        index 0x3 value 0x0

Performance

Context-switch performance is only impacted when there are two or more
perf users and at least one of them is an RDPMC user. In other cases,
there is no performance impact.

The worst case occurs when there are two users: the RDPMC user uses only
one counter, while the other user uses all available counters. When the
RDPMC task is scheduled in, all the counters other than the one assigned
to the RDPMC task have to be reset.

Here is the test result for the worst case.

The test was run on an Ice Lake platform, which has 8 GP counters and
3 fixed counters (not counting the SLOTS counter).

The lat_ctx is used to measure the context switching time.

    lat_ctx -s 128K -N 1000 processes 2

I instrumented lat_ctx to open all 8 GP counters and 3 fixed counters
for one task. The other task opens a fixed counter and enables RDPMC.
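
Roughly, the instrumentation opens eleven per-task events. An illustrative
sketch (the event choices are assumptions, and the kernel's counter
scheduling, not the test, decides the final GP/fixed placement):

    #include <linux/perf_event.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    static int fds[11];

    static int open_counter(__u64 config)
    {
        struct perf_event_attr attr = {
            .type   = PERF_TYPE_HARDWARE,
            .size   = sizeof(attr),
            .config = config,
        };

        /* pid = 0, cpu = -1: count this task on any CPU */
        return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
    }

    static void open_all_counters(void)
    {
        int i;

        /* branch events have no fixed counter, so these occupy the 8 GP counters */
        for (i = 0; i < 8; i++)
            fds[i] = open_counter(PERF_COUNT_HW_BRANCH_INSTRUCTIONS);

        /* events expected to land on the 3 fixed counters */
        fds[8]  = open_counter(PERF_COUNT_HW_INSTRUCTIONS);
        fds[9]  = open_counter(PERF_COUNT_HW_CPU_CYCLES);
        fds[10] = open_counter(PERF_COUNT_HW_REF_CPU_CYCLES);
    }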

Without the patch:
The context switch time is 4.97 us

With the patch:
The context switch time is 5.16 us

This is a ~4% increase in context-switch time in the worst case.

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
Changes since V4:
- Fix the warning with CONFIG_DEBUG_PREEMPT=y
  Disable interrupts and preemption around perf_sched_cb_inc/dec()
  to protect the sched_cb_list. I don't think we touch this area in NMI
  context, so disabling interrupts should be good enough to protect the
  cpuctx. We don't need perf_ctx_lock().

Changes since V3:
- Fix warnings reported by kernel test robot <lkp@intel.com>
- Move the bitmap_empty() check to after the assigned counters are
  cleared. cpuc->dirty is very likely non-empty before the clearing;
  checking afterwards allows the for_each_set_bit() and bitmap_zero()
  to be skipped when nothing remains dirty.

The V2 can be found here.
https://lore.kernel.org/lkml/20200821195754.20159-3-kan.liang@linux.intel.com/

Changes since V2:
- Unconditionally set cpuc->dirty. The worst case for an RDPMC task is
  that we may have to clear all counters the first time in
  x86_pmu_event_mapped(). After that, sched_task() will clear/update
  the 'dirty' bitmap, and only the truly dirty counters are cleared. For
  a non-RDPMC task, it's harmless to unconditionally set cpuc->dirty.
- Remove the !is_sampling_event() check
- Move the code into the x86 generic file, because RDPMC is not an
  Intel-only feature.

Changes since V1:
- Drop the old method, which unconditionally reset the counter in
  x86_pmu_del().
  Only reset the dirty counters when an RDPMC task is scheduled in.
---
 arch/x86/events/core.c       | 55 ++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/events/perf_event.h |  1 +
 2 files changed, 56 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index dd9f3c2..45630beed 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1585,6 +1585,8 @@ static void x86_pmu_del(struct perf_event *event, int flags)
 	if (cpuc->txn_flags & PERF_PMU_TXN_ADD)
 		goto do_del;
 
+	__set_bit(event->hw.idx, cpuc->dirty);
+
 	/*
 	 * Not a TXN, therefore cleanup properly.
 	 */
@@ -2304,12 +2306,50 @@ static int x86_pmu_event_init(struct perf_event *event)
 	return err;
 }
 
+static void x86_pmu_clear_dirty_counters(void)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	int i;
+
+	 /* Don't need to clear the assigned counter. */
+	for (i = 0; i < cpuc->n_events; i++)
+		__clear_bit(cpuc->assign[i], cpuc->dirty);
+
+	if (bitmap_empty(cpuc->dirty, X86_PMC_IDX_MAX))
+		return;
+
+	for_each_set_bit(i, cpuc->dirty, X86_PMC_IDX_MAX) {
+		/* Metrics and fake events don't have corresponding HW counters. */
+		if (is_metric_idx(i) || (i == INTEL_PMC_IDX_FIXED_VLBR))
+			continue;
+		else if (i >= INTEL_PMC_IDX_FIXED)
+			wrmsrl(MSR_ARCH_PERFMON_FIXED_CTR0 + (i - INTEL_PMC_IDX_FIXED), 0);
+		else
+			wrmsrl(x86_pmu_event_addr(i), 0);
+	}
+
+	bitmap_zero(cpuc->dirty, X86_PMC_IDX_MAX);
+}
+
 static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)
 {
+	unsigned long flags;
+
 	if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
 		return;
 
 	/*
+	 * Enable sched_task() for the RDPMC task,
+	 * and clear the existing dirty counters.
+	 */
+	if (x86_pmu.sched_task && event->hw.target) {
+		local_irq_save(flags);
+		perf_sched_cb_inc(event->ctx->pmu);
+		x86_pmu_clear_dirty_counters();
+		local_irq_restore(flags);
+	}
+
+	/*
 	 * This function relies on not being called concurrently in two
 	 * tasks in the same mm.  Otherwise one task could observe
 	 * perf_rdpmc_allowed > 1 and return all the way back to
@@ -2327,10 +2367,17 @@ static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)
 
 static void x86_pmu_event_unmapped(struct perf_event *event, struct mm_struct *mm)
 {
+	unsigned long flags;
 
 	if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
 		return;
 
+	if (x86_pmu.sched_task && event->hw.target) {
+		local_irq_save(flags);
+		perf_sched_cb_dec(event->ctx->pmu);
+		local_irq_restore(flags);
+	}
+
 	if (atomic_dec_and_test(&mm->context.perf_rdpmc_allowed))
 		on_each_cpu_mask(mm_cpumask(mm), cr4_update_pce, NULL, 1);
 }
@@ -2436,6 +2483,14 @@ static const struct attribute_group *x86_pmu_attr_groups[] = {
 static void x86_pmu_sched_task(struct perf_event_context *ctx, bool sched_in)
 {
 	static_call_cond(x86_pmu_sched_task)(ctx, sched_in);
+
+	/*
+	 * If a new task has the RDPMC enabled, clear the dirty counters
+	 * to prevent the potential leak.
+	 */
+	if (sched_in && ctx && READ_ONCE(x86_pmu.attr_rdpmc) &&
+	    current->mm && atomic_read(&current->mm->context.perf_rdpmc_allowed))
+		x86_pmu_clear_dirty_counters();
 }
 
 static void x86_pmu_swap_task_ctx(struct perf_event_context *prev,
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 54a340e..e855f20 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -228,6 +228,7 @@ struct cpu_hw_events {
 	 */
 	struct perf_event	*events[X86_PMC_IDX_MAX]; /* in counter order */
 	unsigned long		active_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
+	unsigned long		dirty[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
 	int			enabled;
 
 	int			n_events; /* the # of events in the below arrays */
-- 
2.7.4



* Re: [PATCH V5] perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task
  2021-04-20 22:30 [PATCH V5] perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task kan.liang
@ 2021-04-21  8:11 ` Peter Zijlstra
  2021-04-21 15:12   ` Liang, Kan
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2021-04-21  8:11 UTC (permalink / raw)
  To: kan.liang
  Cc: mingo, linux-kernel, ak, acme, mark.rutland, luto, eranian, namhyung

On Tue, Apr 20, 2021 at 03:30:42PM -0700, kan.liang@linux.intel.com wrote:
>  static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)
>  {
> +	unsigned long flags;
> +
>  	if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
>  		return;
>  
>  	/*
> +	 * Enable sched_task() for the RDPMC task,
> +	 * and clear the existing dirty counters.
> +	 */
> +	if (x86_pmu.sched_task && event->hw.target) {
> +		local_irq_save(flags);
> +		perf_sched_cb_inc(event->ctx->pmu);
> +		x86_pmu_clear_dirty_counters();
> +		local_irq_restore(flags);
> +	}
> +
> +	/*
>  	 * This function relies on not being called concurrently in two
>  	 * tasks in the same mm.  Otherwise one task could observe
>  	 * perf_rdpmc_allowed > 1 and return all the way back to
> @@ -2327,10 +2367,17 @@ static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)
>  
>  static void x86_pmu_event_unmapped(struct perf_event *event, struct mm_struct *mm)
>  {
> +	unsigned long flags;
>  
>  	if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
>  		return;
>  
> +	if (x86_pmu.sched_task && event->hw.target) {
> +		local_irq_save(flags);
> +		perf_sched_cb_dec(event->ctx->pmu);
> +		local_irq_restore(flags);
> +	}
> +
>  	if (atomic_dec_and_test(&mm->context.perf_rdpmc_allowed))
>  		on_each_cpu_mask(mm_cpumask(mm), cr4_update_pce, NULL, 1);
>  }

I don't understand how this can possibly be correct. Both
perf_sched_cb_{inc,dec} modify strict per-cpu state, but the mapped
functions happen on whatever random CPU of the moment whenever the task
memory map changes.

Suppose we mmap() on CPU0 and then exit on CPU1. Suppose the task does
mmap() on CPU0 but then creates threads and runs on CPU1-4 concurrently
before exiting on CPU5.

Could be I'm not seeing it due to having a snot-brain, please explain.


* Re: [PATCH V5] perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task
  2021-04-21  8:11 ` Peter Zijlstra
@ 2021-04-21 15:12   ` Liang, Kan
  2021-04-22 17:52     ` Rob Herring
  0 siblings, 1 reply; 5+ messages in thread
From: Liang, Kan @ 2021-04-21 15:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, ak, acme, mark.rutland, luto, eranian, namhyung



On 4/21/2021 4:11 AM, Peter Zijlstra wrote:
>> @@ -2327,10 +2367,17 @@ static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)
>>   
>>   static void x86_pmu_event_unmapped(struct perf_event *event, struct mm_struct *mm)
>>   {
>> +	unsigned long flags;
>>   
>>   	if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
>>   		return;
>>   
>> +	if (x86_pmu.sched_task && event->hw.target) {
>> +		local_irq_save(flags);
>> +		perf_sched_cb_dec(event->ctx->pmu);
>> +		local_irq_restore(flags);
>> +	}
>> +
>>   	if (atomic_dec_and_test(&mm->context.perf_rdpmc_allowed))
>>   		on_each_cpu_mask(mm_cpumask(mm), cr4_update_pce, NULL, 1);
>>   }
> I don't understand how this can possibly be correct. Both
> perf_sched_cb_{inc,dec} modify strict per-cpu state, but the mapped
> functions happen on whatever random CPU of the moment whenever the task
> memory map changes.
> 
> Suppose we mmap() on CPU0 and then exit on CPU1. Suppose the task does
> mmap() on CPU0 but then creates threads and runs on CPU1-4 concurrently
> before existing on CPU5.
> 
> Could be I'm not seeing it due to having a snot-brain, please explain.

You are right.
I implemented a new test case which mmap()s on CPU 3, then runs and exits
on CPU 0. It can still read the counter values of other tasks on CPU 0.

Actually, I don't think we need perf_sched_cb_{inc,dec} and sched_task().

The mm->context.perf_rdpmc_allowed will tell us if it's an RDPMC task.
I think a cleaner way would be to add a new check_leakage() method. When 
perf schedules in an RDPMC task, we invoke the method and clear the dirty 
counters.
(I use the generic name check_leakage for the method so it can be reused 
by other architectures if possible.)

The patch is as below. The new and old test cases all pass. I will do 
more tests.

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index c7fcc8d..229dd48 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1624,6 +1624,8 @@ static void x86_pmu_del(struct perf_event *event, int flags)
  	if (cpuc->txn_flags & PERF_PMU_TXN_ADD)
  		goto do_del;

+	__set_bit(event->hw.idx, cpuc->dirty);
+
  	/*
  	 * Not a TXN, therefore cleanup properly.
  	 */
@@ -2631,6 +2633,37 @@ static int x86_pmu_check_period(struct perf_event *event, u64 value)
  	return 0;
  }

+static void x86_pmu_clear_dirty_counters(void)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	int i;
+
+	 /* Don't need to clear the assigned counter. */
+	for (i = 0; i < cpuc->n_events; i++)
+		__clear_bit(cpuc->assign[i], cpuc->dirty);
+
+	if (bitmap_empty(cpuc->dirty, X86_PMC_IDX_MAX))
+		return;
+
+	for_each_set_bit(i, cpuc->dirty, X86_PMC_IDX_MAX) {
+		/* Metrics and fake events don't have corresponding HW counters. */
+		if (is_metric_idx(i) || (i == INTEL_PMC_IDX_FIXED_VLBR))
+			continue;
+		else if (i >= INTEL_PMC_IDX_FIXED)
+			wrmsrl(MSR_ARCH_PERFMON_FIXED_CTR0 + (i - INTEL_PMC_IDX_FIXED), 0);
+		else
+			wrmsrl(x86_pmu_event_addr(i), 0);
+	}
+
+	bitmap_zero(cpuc->dirty, X86_PMC_IDX_MAX);
+}
+
+static void x86_pmu_check_leakage(void)
+{
+	if (READ_ONCE(x86_pmu.attr_rdpmc))
+		x86_pmu_clear_dirty_counters();
+}
+
  static int x86_pmu_aux_output_match(struct perf_event *event)
  {
  	if (!(pmu.capabilities & PERF_PMU_CAP_AUX_OUTPUT))
@@ -2675,6 +2708,7 @@ static struct pmu pmu = {
  	.sched_task		= x86_pmu_sched_task,
  	.swap_task_ctx		= x86_pmu_swap_task_ctx,
  	.check_period		= x86_pmu_check_period,
+	.check_leakage		= x86_pmu_check_leakage,

  	.aux_output_match	= x86_pmu_aux_output_match,

diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 27fa85e..d6003e0 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -229,6 +229,7 @@ struct cpu_hw_events {
  	 */
  	struct perf_event	*events[X86_PMC_IDX_MAX]; /* in counter order */
  	unsigned long		active_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
+	unsigned long		dirty[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
  	int			enabled;

  	int			n_events; /* the # of events in the below arrays */
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index a763928..bcf3964 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -514,6 +514,11 @@ struct pmu {
  	 * Check period value for PERF_EVENT_IOC_PERIOD ioctl.
  	 */
 	int (*check_period)		(struct perf_event *event, u64 value); /* optional */
+
+	/*
+	 * Check and clear dirty counters to prevent potential leakage
+	 */
+	void (*check_leakage)		(void); /* optional */
  };

  enum perf_addr_filter_action_t {
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 928b166..b496113 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3822,6 +3822,12 @@ static void cpu_ctx_sched_in(struct perf_cpu_context *cpuctx,
  	ctx_sched_in(ctx, cpuctx, event_type, task);
  }

+static bool has_check_leakage(struct pmu *pmu)
+{
+	return pmu->check_leakage && current->mm &&
+	       atomic_read(&current->mm->context.perf_rdpmc_allowed);
+}
+
  static void perf_event_context_sched_in(struct perf_event_context *ctx,
  					struct task_struct *task)
  {
@@ -3832,6 +3838,8 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx,
  	if (cpuctx->task_ctx == ctx) {
  		if (cpuctx->sched_cb_usage)
  			__perf_pmu_sched_task(cpuctx, true);
+		if (has_check_leakage(pmu))
+			pmu->check_leakage();
  		return;
  	}

@@ -3858,6 +3866,8 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx,

  	if (cpuctx->sched_cb_usage && pmu->sched_task)
  		pmu->sched_task(cpuctx->task_ctx, true);
+	if (has_check_leakage(pmu))
+		pmu->check_leakage();

  	perf_pmu_enable(pmu);


Thanks,
Kan


* Re: [PATCH V5] perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task
  2021-04-21 15:12   ` Liang, Kan
@ 2021-04-22 17:52     ` Rob Herring
  2021-04-22 18:47       ` Liang, Kan
  0 siblings, 1 reply; 5+ messages in thread
From: Rob Herring @ 2021-04-22 17:52 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Peter Zijlstra, mingo, linux-kernel, ak, acme, mark.rutland,
	luto, eranian, namhyung

On Wed, Apr 21, 2021 at 11:12:06AM -0400, Liang, Kan wrote:
> 
> 
> On 4/21/2021 4:11 AM, Peter Zijlstra wrote:
> > > @@ -2327,10 +2367,17 @@ static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)
> > >   static void x86_pmu_event_unmapped(struct perf_event *event, struct mm_struct *mm)
> > >   {
> > > +	unsigned long flags;
> > >   	if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
> > >   		return;
> > > +	if (x86_pmu.sched_task && event->hw.target) {
> > > +		local_irq_save(flags);
> > > +		perf_sched_cb_dec(event->ctx->pmu);
> > > +		local_irq_restore(flags);
> > > +	}
> > > +
> > >   	if (atomic_dec_and_test(&mm->context.perf_rdpmc_allowed))
> > >   		on_each_cpu_mask(mm_cpumask(mm), cr4_update_pce, NULL, 1);
> > >   }
> > I don't understand how this can possibly be correct. Both
> > perf_sched_cb_{inc,dec} modify strict per-cpu state, but the mapped
> > functions happen on whatever random CPU of the moment whenever the task
> > memory map changes.
> > 
> > Suppose we mmap() on CPU0 and then exit on CPU1. Suppose the task does
> > mmap() on CPU0 but then creates threads and runs on CPU1-4 concurrently
> > before existing on CPU5.
> > 
> > Could be I'm not seeing it due to having a snot-brain, please explain.
> 
> You are right.
> I implemented a new test case which mmap() on CPU 3, run and exit on CPU 0.
> It can still read the counter values from other tasks on CPU 0.

Can you publish your tests? I'm working on arm64 user access support.

> Actually, I don't think we need perf_sched_cb_{inc,dec} and sched_task().
> 
> The mm->context.perf_rdpmc_allowed will tell us if it's a RDPMC task.
> I think a clean way should be to add a new check_leakage() method. When perf
> schedules in a RDPMC task, we invoke the method and clear the dirty
> counters.
> (I use a generic name check_leakage for the method so it can be reused by
> other ARCHs if possible.)
> 
> The patch is as below. The new and old test cases are all passed. I will do
> more tests.
> 
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index c7fcc8d..229dd48 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -1624,6 +1624,8 @@ static void x86_pmu_del(struct perf_event *event, int
> flags)
>  	if (cpuc->txn_flags & PERF_PMU_TXN_ADD)
>  		goto do_del;
> 
> +	__set_bit(event->hw.idx, cpuc->dirty);
> +
>  	/*
>  	 * Not a TXN, therefore cleanup properly.
>  	 */
> @@ -2631,6 +2633,37 @@ static int x86_pmu_check_period(struct perf_event
> *event, u64 value)
>  	return 0;
>  }
> 
> +static void x86_pmu_clear_dirty_counters(void)
> +{
> +	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> +	int i;
> +
> +	 /* Don't need to clear the assigned counter. */
> +	for (i = 0; i < cpuc->n_events; i++)
> +		__clear_bit(cpuc->assign[i], cpuc->dirty);
> +
> +	if (bitmap_empty(cpuc->dirty, X86_PMC_IDX_MAX))
> +		return;
> +
> +	for_each_set_bit(i, cpuc->dirty, X86_PMC_IDX_MAX) {
> +		/* Metrics and fake events don't have corresponding HW counters. */
> +		if (is_metric_idx(i) || (i == INTEL_PMC_IDX_FIXED_VLBR))
> +			continue;
> +		else if (i >= INTEL_PMC_IDX_FIXED)
> +			wrmsrl(MSR_ARCH_PERFMON_FIXED_CTR0 + (i - INTEL_PMC_IDX_FIXED), 0);
> +		else
> +			wrmsrl(x86_pmu_event_addr(i), 0);
> +	}
> +
> +	bitmap_zero(cpuc->dirty, X86_PMC_IDX_MAX);
> +}
> +
> +static void x86_pmu_check_leakage(void)
> +{
> +	if (READ_ONCE(x86_pmu.attr_rdpmc))
> +		x86_pmu_clear_dirty_counters();
> +}
> +
>  static int x86_pmu_aux_output_match(struct perf_event *event)
>  {
>  	if (!(pmu.capabilities & PERF_PMU_CAP_AUX_OUTPUT))
> @@ -2675,6 +2708,7 @@ static struct pmu pmu = {
>  	.sched_task		= x86_pmu_sched_task,
>  	.swap_task_ctx		= x86_pmu_swap_task_ctx,
>  	.check_period		= x86_pmu_check_period,
> +	.check_leakage		= x86_pmu_check_leakage,
> 
>  	.aux_output_match	= x86_pmu_aux_output_match,
> 
> diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
> index 27fa85e..d6003e0 100644
> --- a/arch/x86/events/perf_event.h
> +++ b/arch/x86/events/perf_event.h
> @@ -229,6 +229,7 @@ struct cpu_hw_events {
>  	 */
>  	struct perf_event	*events[X86_PMC_IDX_MAX]; /* in counter order */
>  	unsigned long		active_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
> +	unsigned long		dirty[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
>  	int			enabled;
> 
>  	int			n_events; /* the # of events in the below arrays */
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index a763928..bcf3964 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -514,6 +514,11 @@ struct pmu {
>  	 * Check period value for PERF_EVENT_IOC_PERIOD ioctl.
>  	 */
>  	int (*check_period)		(struct perf_event *event, u64 value); /* optional */
> +
> +	/*
> +	 * Check and clear dirty counters to prevent potential leakage
> +	 */
> +	void (*check_leakage)		(void); /* optional */
>  };
> 
>  enum perf_addr_filter_action_t {
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 928b166..b496113 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3822,6 +3822,12 @@ static void cpu_ctx_sched_in(struct perf_cpu_context
> *cpuctx,
>  	ctx_sched_in(ctx, cpuctx, event_type, task);
>  }
> 
> +static bool has_check_leakage(struct pmu *pmu)
> +{
> +	return pmu->check_leakage && current->mm &&
> +	       atomic_read(&current->mm->context.perf_rdpmc_allowed);

context.perf_rdpmc_allowed is arch specific.

> +}
> +
>  static void perf_event_context_sched_in(struct perf_event_context *ctx,
>  					struct task_struct *task)
>  {
> @@ -3832,6 +3838,8 @@ static void perf_event_context_sched_in(struct
> perf_event_context *ctx,
>  	if (cpuctx->task_ctx == ctx) {
>  		if (cpuctx->sched_cb_usage)
>  			__perf_pmu_sched_task(cpuctx, true);
> +		if (has_check_leakage(pmu))
> +			pmu->check_leakage();
>  		return;
>  	}
> 
> @@ -3858,6 +3866,8 @@ static void perf_event_context_sched_in(struct
> perf_event_context *ctx,
> 
>  	if (cpuctx->sched_cb_usage && pmu->sched_task)
>  		pmu->sched_task(cpuctx->task_ctx, true);
> +	if (has_check_leakage(pmu))
> +		pmu->check_leakage();
> 
>  	perf_pmu_enable(pmu);
> 
> 
> Thanks,
> Kan


* Re: [PATCH V5] perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task
  2021-04-22 17:52     ` Rob Herring
@ 2021-04-22 18:47       ` Liang, Kan
  0 siblings, 0 replies; 5+ messages in thread
From: Liang, Kan @ 2021-04-22 18:47 UTC (permalink / raw)
  To: Rob Herring
  Cc: Peter Zijlstra, mingo, linux-kernel, ak, acme, mark.rutland,
	luto, eranian, namhyung



On 4/22/2021 1:52 PM, Rob Herring wrote:
> On Wed, Apr 21, 2021 at 11:12:06AM -0400, Liang, Kan wrote:
>>
>>
>> On 4/21/2021 4:11 AM, Peter Zijlstra wrote:
>>>> @@ -2327,10 +2367,17 @@ static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)
>>>>    static void x86_pmu_event_unmapped(struct perf_event *event, struct mm_struct *mm)
>>>>    {
>>>> +	unsigned long flags;
>>>>    	if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
>>>>    		return;
>>>> +	if (x86_pmu.sched_task && event->hw.target) {
>>>> +		local_irq_save(flags);
>>>> +		perf_sched_cb_dec(event->ctx->pmu);
>>>> +		local_irq_restore(flags);
>>>> +	}
>>>> +
>>>>    	if (atomic_dec_and_test(&mm->context.perf_rdpmc_allowed))
>>>>    		on_each_cpu_mask(mm_cpumask(mm), cr4_update_pce, NULL, 1);
>>>>    }
>>> I don't understand how this can possibly be correct. Both
>>> perf_sched_cb_{inc,dec} modify strict per-cpu state, but the mapped
>>> functions happen on whatever random CPU of the moment whenever the task
>>> memory map changes.
>>>
>>> Suppose we mmap() on CPU0 and then exit on CPU1. Suppose the task does
>>> mmap() on CPU0 but then creates threads and runs on CPU1-4 concurrently
>>> before existing on CPU5.
>>>
>>> Could be I'm not seeing it due to having a snot-brain, please explain.
>>
>> You are right.
>> I implemented a new test case which mmap() on CPU 3, run and exit on CPU 0.
>> It can still read the counter values from other tasks on CPU 0.
> 
> Can you publish your tests? I'm working on arm64 user access support.

Does Arm support RDPMC?

Not sure whether I can post the code. I guess I need to check and make 
some modifications before sharing it.

The test case is very simple. The pseudo code is as below.

BindToCpu(3);
perf_open_and_enable();
rdpmc_read();   # read all counters on CPU 3
BindToCpu(0);
rdpmc_read();   # read all counters on CPU 0
perf_close();
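
The CPU migration part is just sched_setaffinity(); a minimal BindToCpu()
could look like this (an illustrative sketch, not the actual test source):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* BindToCpu() from the pseudo code: pin the calling task to one CPU */
static void bind_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	if (sched_setaffinity(0, sizeof(set), &set))	/* pid 0 == current task */
		perror("sched_setaffinity");
}

Calling it with 3 before the perf open/mmap and with 0 before the second
rdpmc_read() reproduces the sequence above.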


> 
>> Actually, I don't think we need perf_sched_cb_{inc,dec} and sched_task().
>>
>> The mm->context.perf_rdpmc_allowed will tell us if it's a RDPMC task.
>> I think a clean way should be to add a new check_leakage() method. When perf
>> schedules in a RDPMC task, we invoke the method and clear the dirty
>> counters.
>> (I use a generic name check_leakage for the method so it can be reused by
>> other ARCHs if possible.)
>>
>> The patch is as below. The new and old test cases are all passed. I will do
>> more tests.
>>
>> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
>> index c7fcc8d..229dd48 100644
>> --- a/arch/x86/events/core.c
>> +++ b/arch/x86/events/core.c
>> @@ -1624,6 +1624,8 @@ static void x86_pmu_del(struct perf_event *event, int
>> flags)
>>   	if (cpuc->txn_flags & PERF_PMU_TXN_ADD)
>>   		goto do_del;
>>
>> +	__set_bit(event->hw.idx, cpuc->dirty);
>> +
>>   	/*
>>   	 * Not a TXN, therefore cleanup properly.
>>   	 */
>> @@ -2631,6 +2633,37 @@ static int x86_pmu_check_period(struct perf_event
>> *event, u64 value)
>>   	return 0;
>>   }
>>
>> +static void x86_pmu_clear_dirty_counters(void)
>> +{
>> +	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>> +	int i;
>> +
>> +	 /* Don't need to clear the assigned counter. */
>> +	for (i = 0; i < cpuc->n_events; i++)
>> +		__clear_bit(cpuc->assign[i], cpuc->dirty);
>> +
>> +	if (bitmap_empty(cpuc->dirty, X86_PMC_IDX_MAX))
>> +		return;
>> +
>> +	for_each_set_bit(i, cpuc->dirty, X86_PMC_IDX_MAX) {
>> +		/* Metrics and fake events don't have corresponding HW counters. */
>> +		if (is_metric_idx(i) || (i == INTEL_PMC_IDX_FIXED_VLBR))
>> +			continue;
>> +		else if (i >= INTEL_PMC_IDX_FIXED)
>> +			wrmsrl(MSR_ARCH_PERFMON_FIXED_CTR0 + (i - INTEL_PMC_IDX_FIXED), 0);
>> +		else
>> +			wrmsrl(x86_pmu_event_addr(i), 0);
>> +	}
>> +
>> +	bitmap_zero(cpuc->dirty, X86_PMC_IDX_MAX);
>> +}
>> +
>> +static void x86_pmu_check_leakage(void)
>> +{
>> +	if (READ_ONCE(x86_pmu.attr_rdpmc))
>> +		x86_pmu_clear_dirty_counters();
>> +}
>> +
>>   static int x86_pmu_aux_output_match(struct perf_event *event)
>>   {
>>   	if (!(pmu.capabilities & PERF_PMU_CAP_AUX_OUTPUT))
>> @@ -2675,6 +2708,7 @@ static struct pmu pmu = {
>>   	.sched_task		= x86_pmu_sched_task,
>>   	.swap_task_ctx		= x86_pmu_swap_task_ctx,
>>   	.check_period		= x86_pmu_check_period,
>> +	.check_leakage		= x86_pmu_check_leakage,
>>
>>   	.aux_output_match	= x86_pmu_aux_output_match,
>>
>> diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
>> index 27fa85e..d6003e0 100644
>> --- a/arch/x86/events/perf_event.h
>> +++ b/arch/x86/events/perf_event.h
>> @@ -229,6 +229,7 @@ struct cpu_hw_events {
>>   	 */
>>   	struct perf_event	*events[X86_PMC_IDX_MAX]; /* in counter order */
>>   	unsigned long		active_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
>> +	unsigned long		dirty[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
>>   	int			enabled;
>>
>>   	int			n_events; /* the # of events in the below arrays */
>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>> index a763928..bcf3964 100644
>> --- a/include/linux/perf_event.h
>> +++ b/include/linux/perf_event.h
>> @@ -514,6 +514,11 @@ struct pmu {
>>   	 * Check period value for PERF_EVENT_IOC_PERIOD ioctl.
>>   	 */
>>   	int (*check_period)		(struct perf_event *event, u64 value); /* optional */
>> +
>> +	/*
>> +	 * Check and clear dirty counters to prevent potential leakage
>> +	 */
>> +	void (*check_leakage)		(void); /* optional */
>>   };
>>
>>   enum perf_addr_filter_action_t {
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 928b166..b496113 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -3822,6 +3822,12 @@ static void cpu_ctx_sched_in(struct perf_cpu_context
>> *cpuctx,
>>   	ctx_sched_in(ctx, cpuctx, event_type, task);
>>   }
>>
>> +static bool has_check_leakage(struct pmu *pmu)
>> +{
>> +	return pmu->check_leakage && current->mm &&
>> +	       atomic_read(&current->mm->context.perf_rdpmc_allowed);
> 
> context.perf_rdpmc_allowed is arch specific.

Thanks. It's addressed in V6.

> 
>> +}
>> +
>>   static void perf_event_context_sched_in(struct perf_event_context *ctx,
>>   					struct task_struct *task)
>>   {
>> @@ -3832,6 +3838,8 @@ static void perf_event_context_sched_in(struct
>> perf_event_context *ctx,
>>   	if (cpuctx->task_ctx == ctx) {
>>   		if (cpuctx->sched_cb_usage)
>>   			__perf_pmu_sched_task(cpuctx, true);
>> +		if (has_check_leakage(pmu))
>> +			pmu->check_leakage();
>>   		return;
>>   	}

The above check is also removed in V6, because it's unnecessary when the 
contexts are identical.

Thanks,
Kan
