* [PATCH V7 1/2] perf: Track per-PMU sched_task() callback users
@ 2021-05-13 14:23 kan.liang
  2021-05-13 14:23 ` [PATCH V7 2/2] perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task kan.liang
  2021-05-13 14:42 ` [PATCH V7 1/2] perf: Track per-PMU sched_task() callback users Peter Zijlstra
  0 siblings, 2 replies; 9+ messages in thread
From: kan.liang @ 2021-05-13 14:23 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: robh, ak, acme, mark.rutland, luto, eranian, namhyung, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Current perf only tracks the per-CPU sched_task() callback users, which
doesn't work if a callback user is a task. For example, the dirty
counters have to be cleared to prevent data leakage when a new RDPMC
task is scheduled in. The task may be created on one CPU but running on
another CPU. It cannot be tracked by the per-CPU variable. A global
variable is not going to work either because of the hybrid PMUs.
Add a per-PMU variable to track the callback users.

In theory, the per-PMU variable should be checked everywhere
sched_task() can be called, but the x86 RDPMC is currently the only user
of the per-PMU sched_cb_usage, and a callback for the x86 RDPMC is
required only when a different context is scheduled in. To avoid
unnecessary sched_task() invocations, the per-PMU sched_cb_usage is only
checked there. Should any other architectures require it elsewhere, the
checks can be added later separately.
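
For illustration, an architecture that needs the per-task callback bumps
the new counter when an event gains a task user and drops it when that
user goes away; patch 2/2 below does this for x86 from its mmap/munmap
hooks. A minimal sketch (the surrounding driver code is assumed):

    /* In the driver's event_mapped() hook: a task-bound RDPMC user appears. */
    if (x86_pmu.sched_task && event->hw.target)
            atomic_inc(&event->pmu->sched_cb_usage);

    /* In the driver's event_unmapped() hook: that user goes away. */
    if (x86_pmu.sched_task && event->hw.target)
            atomic_dec(&event->pmu->sched_cb_usage);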

Suggested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

- New patch. Split the V6 to core and x86 parts.

 include/linux/perf_event.h | 3 +++
 kernel/events/core.c       | 9 ++++++++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index c8a3388..c6ee202 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -301,6 +301,9 @@ struct pmu {
 	/* number of address filters this PMU can do */
 	unsigned int			nr_addr_filters;
 
+	/* Track the per PMU sched_task() callback users */
+	atomic_t			sched_cb_usage;
+
 	/*
 	 * Fully disable/enable this PMU, can be used to protect from the PMI
 	 * as well as for lazy/batch writing of the MSRs.
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 1574b70..286b718 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3851,7 +3851,14 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx,
 		cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
 	perf_event_sched_in(cpuctx, ctx, task);
 
-	if (cpuctx->sched_cb_usage && pmu->sched_task)
+	/*
+	 * X86 RDPMC is the only user for the per-PMU sched_cb_usage.
+	 * A callback for the X86 RDPMC is required only when a
+	 * different context is scheduled in.
+	 * To avoid unnecessary sched_task() invoke, the per-PMU
+	 * sched_cb_usage is only checked here.
+	 */
+	if (pmu->sched_task && (cpuctx->sched_cb_usage || atomic_read(&pmu->sched_cb_usage)))
 		pmu->sched_task(cpuctx->task_ctx, true);
 
 	perf_pmu_enable(pmu);
-- 
2.7.4



* [PATCH V7 2/2] perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task
  2021-05-13 14:23 [PATCH V7 1/2] perf: Track per-PMU sched_task() callback users kan.liang
@ 2021-05-13 14:23 ` kan.liang
  2021-05-13 15:02   ` Peter Zijlstra
  2021-05-13 14:42 ` [PATCH V7 1/2] perf: Track per-PMU sched_task() callback users Peter Zijlstra
  1 sibling, 1 reply; 9+ messages in thread
From: kan.liang @ 2021-05-13 14:23 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: robh, ak, acme, mark.rutland, luto, eranian, namhyung, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The counter value of a perf task may leak to another RDPMC task.
For example, a perf stat task as below is running on CPU 0.

    perf stat -e 'branches,cycles' -- taskset -c 0 ./workload

In the meantime, an RDPMC task, which is also running on CPU 0, may read
the GP counters periodically. (The RDPMC task creates a fixed event,
but reads four GP counters.)

    $./rdpmc_read_all_counters
    index 0x0 value 0x8001e5970f99
    index 0x1 value 0x8005d750edb6
    index 0x2 value 0x0
    index 0x3 value 0x0

    index 0x0 value 0x8002358e48a5
    index 0x1 value 0x8006bd1e3bc9
    index 0x2 value 0x0
    index 0x3 value 0x0
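
For reference, a minimal user-space reader along the lines of
rdpmc_read_all_counters could look roughly like the sketch below (the
program is not part of this patch; the fixed loop over indices 0-3 and
the helper are illustrative assumptions):

    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Read a GP counter directly from user space via RDPMC. */
    static inline uint64_t rdpmc(uint32_t idx)
    {
            uint32_t lo, hi;

            asm volatile("rdpmc" : "=a" (lo), "=d" (hi) : "c" (idx));
            return ((uint64_t)hi << 32) | lo;
    }

    int main(void)
    {
            /*
             * Assumes RDPMC is already permitted for this task, i.e. a
             * perf event has been opened and mmap()ed beforehand.
             */
            for (;;) {
                    for (uint32_t i = 0; i < 4; i++)
                            printf("index %#x value %#llx\n", i,
                                   (unsigned long long)rdpmc(i));
                    printf("\n");
                    sleep(1);
            }
    }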

This is a potential security issue. Once an attacker knows what the
other thread is counting, the PerfMon counters can be used as a side
channel to attack cryptosystems.

The counter value of the perf stat task leaks to the RDPMC task because
perf never clears the counter when it's stopped.

Two methods were considered to address the issue.
- Unconditionally reset the counter in x86_pmu_del(). It can bring extra
  overhead even when there is no RDPMC task running.
- Only reset the un-assigned dirty counters when the RDPMC task is
  scheduled in. The method is implemented here.

A dirty counter is a counter on which the assigned event has been
deleted but which has not been reset. To track the dirty counters, add a
'dirty' variable to struct cpu_hw_events.

The current code doesn't reset the counter when the assigned event is
deleted. Set the corresponding bit in the 'dirty' variable in
x86_pmu_del(), if the RDPMC feature is available on the system.

The security issue can only occur with an RDPMC task. The event for an
RDPMC task requires an mmap buffer, which can be used to detect such a
task. Once the event is detected in event_mapped(), enable sched_task(),
which is invoked on each context switch. Add a check in sched_task() to
clear the dirty counters when the RDPMC task is scheduled in. Only the
currently un-assigned dirty counters are reset, because the
RDPMC-assigned dirty counters will be updated soon.

The RDPMC instruction is also supported on older platforms, so add
sched_task() for core_pmu as well. Since core_pmu doesn't support large
PEBS and LBR callstack, intel_pmu_pebs/lbr_sched_task() will be ignored
there.

RDPMC is not an Intel-only feature, so the changes are made in the x86
generic code.

After applying the patch,

        $ ./rdpmc_read_all_counters
        index 0x0 value 0x0
        index 0x1 value 0x0
        index 0x2 value 0x0
        index 0x3 value 0x0

        index 0x0 value 0x0
        index 0x1 value 0x0
        index 0x2 value 0x0
        index 0x3 value 0x0

Performance

Context-switch performance is only impacted when there are two or more
perf users and at least one of them is an RDPMC user. In other cases,
there is no performance impact.

The worst case occurs with two users: the RDPMC user uses only one
counter, while the other user uses all available counters. When the
RDPMC task is scheduled in, every counter other than the one assigned to
the RDPMC task has to be reset.

Here is the test result for the worst case.

The test was run on an Ice Lake platform, which has 8 GP counters and 3
fixed counters (not including the SLOTS counter).

The lat_ctx is used to measure the context switching time.

    lat_ctx -s 128K -N 1000 processes 2

I instrumented lat_ctx to open all 8 GP counters and 3 fixed counters
for one task. The other task opens a fixed counter and enables RDPMC.
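
The instrumented code is not included here, but a rough sketch of the
RDPMC side (event choice, error handling, and the helper name are
assumptions for illustration) might look like:

    #include <linux/perf_event.h>
    #include <sys/mman.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /*
     * Open a fixed counter for the calling task and mmap() the event;
     * the mmap is what reaches x86_pmu_event_mapped() and allows RDPMC.
     */
    static int open_fixed_counter_rdpmc(void)
    {
            struct perf_event_attr attr = {
                    .size           = sizeof(attr),
                    .type           = PERF_TYPE_HARDWARE,
                    .config         = PERF_COUNT_HW_INSTRUCTIONS,
                    .exclude_kernel = 1,
            };
            void *page;
            int fd;

            fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
            if (fd < 0)
                    return -1;

            page = mmap(NULL, sysconf(_SC_PAGESIZE), PROT_READ,
                        MAP_SHARED, fd, 0);
            if (page == MAP_FAILED)
                    return -1;

            return fd;
    }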

Without the patch:
The context switch time is 4.97 us

With the patch:
The context switch time is 5.16 us

There is a ~4% increase in the context switch time in the worst case.

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

Changes since V6:
- Drop the new method check_leakage()
- Update per-PMU sched_cb_usage in mmap/munmap().
- Clear dirty counters in mmap() for the first RDPMC read.

Changes since V5:
- Don't update perf_sched_cb_{inc,dec} in mmap/munmap(), and don't check
  and clear dirty counters in sched_task(), because
  perf_sched_cb_{inc,dec} modify per-CPU state and mmap()/munmap() can be
  invoked on different CPUs.
- Add a new method check_leakage() to check and clear dirty counters
  to prevent potential leakage.

Changes since V4:
- Fix the warning with CONFIG_DEBUG_PREEMPT=y.
  Disable interrupts and preemption around perf_sched_cb_inc/dec() to
  protect the sched_cb_list. I don't think we touch this area in NMI
  context, so disabling interrupts should be good enough to protect the
  cpuctx; we don't need perf_ctx_lock().

Changes since V3:
- Fix warnings reported by kernel test robot <lkp@intel.com>
- Move the bitmap_empty() check after the clearing of the assigned
  counters. cpuc->dirty is very likely non-empty before that, so checking
  afterwards lets the empty case skip the for_each_set_bit() and
  bitmap_zero().

The V2 can be found here.
https://lore.kernel.org/lkml/20200821195754.20159-3-kan.liang@linux.intel.com/

Changes since V2:
- Unconditionally set cpuc->dirty. The worst case for an RDPMC task is
  that we may have to clear all counters for the first time in
  x86_pmu_event_mapped(). After that, sched_task() will clear/update the
  'dirty' bitmap, and only the truly dirty counters are cleared. For a
  non-RDPMC task, it's harmless to unconditionally set cpuc->dirty.
- Remove the !is_sampling_event() check.
- Move the code into the x86 generic file, because RDPMC is not an
  Intel-only feature.

Changes since V1:
- Drop the old method, which unconditionally resets the counter in
  x86_pmu_del(). Only reset the dirty counters when an RDPMC task is
  scheduled in.

 arch/x86/events/core.c       | 52 +++++++++++++++++++++++++++++++++++++++++++-
 arch/x86/events/perf_event.h |  1 +
 2 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index c6fedd2..b752376 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1636,6 +1636,8 @@ static void x86_pmu_del(struct perf_event *event, int flags)
 	if (cpuc->txn_flags & PERF_PMU_TXN_ADD)
 		goto do_del;
 
+	__set_bit(event->hw.idx, cpuc->dirty);
+
 	/*
 	 * Not a TXN, therefore cleanup properly.
 	 */
@@ -2484,12 +2486,50 @@ static int x86_pmu_event_init(struct perf_event *event)
 	return err;
 }
 
+static void x86_pmu_clear_dirty_counters(void)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	int i;
+
+	 /* Don't need to clear the assigned counter. */
+	for (i = 0; i < cpuc->n_events; i++)
+		__clear_bit(cpuc->assign[i], cpuc->dirty);
+
+	if (bitmap_empty(cpuc->dirty, X86_PMC_IDX_MAX))
+		return;
+
+	for_each_set_bit(i, cpuc->dirty, X86_PMC_IDX_MAX) {
+		/* Metrics and fake events don't have corresponding HW counters. */
+		if (is_metric_idx(i) || (i == INTEL_PMC_IDX_FIXED_VLBR))
+			continue;
+		else if (i >= INTEL_PMC_IDX_FIXED)
+			wrmsrl(MSR_ARCH_PERFMON_FIXED_CTR0 + (i - INTEL_PMC_IDX_FIXED), 0);
+		else
+			wrmsrl(x86_pmu_event_addr(i), 0);
+	}
+
+	bitmap_zero(cpuc->dirty, X86_PMC_IDX_MAX);
+}
+
 static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)
 {
+	unsigned long flags;
+
 	if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
 		return;
 
 	/*
+	 * Enable sched_task() for the RDPMC task,
+	 * and clear the existing dirty counters.
+	 */
+	if (x86_pmu.sched_task && event->hw.target) {
+		atomic_inc(&event->pmu->sched_cb_usage);
+		local_irq_save(flags);
+		x86_pmu_clear_dirty_counters();
+		local_irq_restore(flags);
+	}
+
+	/*
 	 * This function relies on not being called concurrently in two
 	 * tasks in the same mm.  Otherwise one task could observe
 	 * perf_rdpmc_allowed > 1 and return all the way back to
@@ -2507,10 +2547,12 @@ static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)
 
 static void x86_pmu_event_unmapped(struct perf_event *event, struct mm_struct *mm)
 {
-
 	if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
 		return;
 
+	if (x86_pmu.sched_task && event->hw.target)
+		atomic_dec(&event->pmu->sched_cb_usage);
+
 	if (atomic_dec_and_test(&mm->context.perf_rdpmc_allowed))
 		on_each_cpu_mask(mm_cpumask(mm), cr4_update_pce, NULL, 1);
 }
@@ -2616,6 +2658,14 @@ static const struct attribute_group *x86_pmu_attr_groups[] = {
 static void x86_pmu_sched_task(struct perf_event_context *ctx, bool sched_in)
 {
 	static_call_cond(x86_pmu_sched_task)(ctx, sched_in);
+
+	/*
+	 * If a new task has the RDPMC enabled, clear the dirty counters
+	 * to prevent the potential leak.
+	 */
+	if (sched_in && ctx && READ_ONCE(x86_pmu.attr_rdpmc) &&
+	    current->mm && atomic_read(&current->mm->context.perf_rdpmc_allowed))
+		x86_pmu_clear_dirty_counters();
 }
 
 static void x86_pmu_swap_task_ctx(struct perf_event_context *prev,
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 10c8171..55bd891 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -229,6 +229,7 @@ struct cpu_hw_events {
 	 */
 	struct perf_event	*events[X86_PMC_IDX_MAX]; /* in counter order */
 	unsigned long		active_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
+	unsigned long		dirty[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
 	int			enabled;
 
 	int			n_events; /* the # of events in the below arrays */
-- 
2.7.4



* Re: [PATCH V7 1/2] perf: Track per-PMU sched_task() callback users
  2021-05-13 14:23 [PATCH V7 1/2] perf: Track per-PMU sched_task() callback users kan.liang
  2021-05-13 14:23 ` [PATCH V7 2/2] perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task kan.liang
@ 2021-05-13 14:42 ` Peter Zijlstra
  1 sibling, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2021-05-13 14:42 UTC (permalink / raw)
  To: kan.liang
  Cc: mingo, linux-kernel, robh, ak, acme, mark.rutland, luto, eranian,
	namhyung

On Thu, May 13, 2021 at 07:23:01AM -0700, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> Current perf only tracks the per-CPU sched_task() callback users, which
> doesn't work if a callback user is a task. For example, the dirty
> counters have to be cleared to prevent data leakage when a new RDPMC
> task is scheduled in. The task may be created on one CPU but running on
> another CPU. It cannot be tracked by the per-CPU variable. A global
> variable is not going to work either because of the hybrid PMUs.
> Add a per-PMU variable to track the callback users.
> 
> In theory, the per-PMU variable should be checked everywhere the
> sched_task() can be called. But the X86 RDPMC is the only user for the
> per-PMU sched_cb_usage. A callback for the X86 RDPMC is required only
> when a different context is scheduled in. To avoid unnecessary
> sched_task() invoke, the per-PMU sched_cb_usage is only checked there.
> Should there be any other ARCHs which require it in the other places,
> it can be added later separately.
> 
> Suggested-by: Rob Herring <robh@kernel.org>
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> ---
> 
> - New patch. Split the V6 to core and x86 parts.
> 
>  include/linux/perf_event.h | 3 +++
>  kernel/events/core.c       | 9 ++++++++-
>  2 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index c8a3388..c6ee202 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -301,6 +301,9 @@ struct pmu {
>  	/* number of address filters this PMU can do */
>  	unsigned int			nr_addr_filters;
>  
> +	/* Track the per PMU sched_task() callback users */
> +	atomic_t			sched_cb_usage;
> +
>  	/*
>  	 * Fully disable/enable this PMU, can be used to protect from the PMI
>  	 * as well as for lazy/batch writing of the MSRs.
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 1574b70..286b718 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3851,7 +3851,14 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx,
>  		cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
>  	perf_event_sched_in(cpuctx, ctx, task);
>  
> -	if (cpuctx->sched_cb_usage && pmu->sched_task)
> +	/*
> +	 * X86 RDPMC is the only user for the per-PMU sched_cb_usage.

I think we can do without this line; since we know ARM64 also
potentially wants this.

> +	 * A callback for the X86 RDPMC is required only when a

Also, I think we spell it: x86.

> +	 * different context is scheduled in.
> +	 * To avoid unnecessary sched_task() invoke, the per-PMU
> +	 * sched_cb_usage is only checked here.
> +	 */
> +	if (pmu->sched_task && (cpuctx->sched_cb_usage || atomic_read(&pmu->sched_cb_usage)))
>  		pmu->sched_task(cpuctx->task_ctx, true);
>  
>  	perf_pmu_enable(pmu);

I'll sit on these patches a wee bit until Rob has provided feedback, but
I'm thinking this should do.


* Re: [PATCH V7 2/2] perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task
  2021-05-13 14:23 ` [PATCH V7 2/2] perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task kan.liang
@ 2021-05-13 15:02   ` Peter Zijlstra
  2021-05-13 22:14     ` Liang, Kan
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2021-05-13 15:02 UTC (permalink / raw)
  To: kan.liang
  Cc: mingo, linux-kernel, robh, ak, acme, mark.rutland, luto, eranian,
	namhyung

On Thu, May 13, 2021 at 07:23:02AM -0700, kan.liang@linux.intel.com wrote:

> +	if (x86_pmu.sched_task && event->hw.target) {
> +		atomic_inc(&event->pmu->sched_cb_usage);
> +		local_irq_save(flags);
> +		x86_pmu_clear_dirty_counters();
> +		local_irq_restore(flags);
> +	}

So what happens if our mmap() happens after we've already created two
(or more) threads in the process, all of who already have a counter (or
more) on?

Shouldn't this be something like?

--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2474,7 +2474,7 @@ static int x86_pmu_event_init(struct per
 	return err;
 }
 
-static void x86_pmu_clear_dirty_counters(void)
+static void x86_pmu_clear_dirty_counters(void *unused)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	int i;
@@ -2512,9 +2512,9 @@ static void x86_pmu_event_mapped(struct
 	 */
 	if (x86_pmu.sched_task && event->hw.target) {
 		atomic_inc(&event->pmu->sched_cb_usage);
-		local_irq_save(flags);
-		x86_pmu_clear_dirty_counters();
-		local_irq_restore(flags);
+		on_each_cpu_mask(mm_cpumask(mm),
+				 x86_pmu_clear_dirty_counters,
+				 NULL, true);
 	}
 
 	/*
@@ -2653,7 +2653,7 @@ static void x86_pmu_sched_task(struct pe
 	 */
 	if (sched_in && ctx && READ_ONCE(x86_pmu.attr_rdpmc) &&
 	    current->mm && atomic_read(&current->mm->context.perf_rdpmc_allowed))
-		x86_pmu_clear_dirty_counters();
+		x86_pmu_clear_dirty_counters(NULL);
 }
 
 static void x86_pmu_swap_task_ctx(struct perf_event_context *prev,


* Re: [PATCH V7 2/2] perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task
  2021-05-13 15:02   ` Peter Zijlstra
@ 2021-05-13 22:14     ` Liang, Kan
  2021-05-14  3:50       ` Rob Herring
  2021-05-14 14:44       ` Peter Zijlstra
  0 siblings, 2 replies; 9+ messages in thread
From: Liang, Kan @ 2021-05-13 22:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, robh, ak, acme, mark.rutland, luto, eranian,
	namhyung



On 5/13/2021 11:02 AM, Peter Zijlstra wrote:
> On Thu, May 13, 2021 at 07:23:02AM -0700, kan.liang@linux.intel.com wrote:
> 
>> +	if (x86_pmu.sched_task && event->hw.target) {
>> +		atomic_inc(&event->pmu->sched_cb_usage);
>> +		local_irq_save(flags);
>> +		x86_pmu_clear_dirty_counters();
>> +		local_irq_restore(flags);
>> +	}
> 
> So what happens if our mmap() happens after we've already created two
> (or more) threads in the process, all of who already have a counter (or
> more) on?
> 
> Shouldn't this be something like?

That's not enough.

I implemented a test case as below (a rough sketch follows the list):
- The main thread A creates a new thread B.
- Bind thread A to CPU 0. Then thread A opens an event, mmaps it,
enables the event, and sleeps.
- Bind thread B to CPU 1. Wait until the event in thread A is enabled;
then RDPMC can read the counters on CPU 1.
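
A rough sketch of this test (thread setup only; names, the event choice,
and the synchronization are illustrative assumptions, and error handling
is omitted):

    #define _GNU_SOURCE
    #include <linux/perf_event.h>
    #include <pthread.h>
    #include <sched.h>
    #include <stdatomic.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    static atomic_int enabled;

    static void bind_cpu(int cpu)
    {
            cpu_set_t set;

            CPU_ZERO(&set);
            CPU_SET(cpu, &set);
            sched_setaffinity(0, sizeof(set), &set);
    }

    /* Thread B: bound to CPU 1, issues RDPMC once A's event is enabled. */
    static void *thread_b(void *arg)
    {
            (void)arg;
            bind_cpu(1);
            while (!atomic_load(&enabled))
                    ;
            /* RDPMC reads of the GP counters would go here. */
            return NULL;
    }

    int main(void)
    {
            struct perf_event_attr attr = {
                    .size     = sizeof(attr),
                    .type     = PERF_TYPE_HARDWARE,
                    .config   = PERF_COUNT_HW_INSTRUCTIONS,
                    .disabled = 1,
            };
            pthread_t b;
            int fd;

            pthread_create(&b, NULL, thread_b, NULL);

            bind_cpu(0);
            fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
            mmap(NULL, sysconf(_SC_PAGESIZE), PROT_READ, MAP_SHARED, fd, 0);
            ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
            atomic_store(&enabled, 1);

            sleep(10);
            return 0;
    }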

In the x86_pmu_event_mapped(), we do on_each_cpu_mask(mm_cpumask(mm), 
cr4_update_pce, NULL, 1);
The RDPMC from thread B on CPU 1 is not forbidden.
Since the counter is not created in thread B, the sched_task() never 
gets a chance to be invoked. The dirty counter is not cleared.

To fix it, I think we have to move the cr4_update_pce() to the context 
switch, and update it only when the RDPMC task is scheduled. But it 
probably brings some overhead.

What do you think?

Thanks,
Kan

> 
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -2474,7 +2474,7 @@ static int x86_pmu_event_init(struct per
>   	return err;
>   }
>   
> -static void x86_pmu_clear_dirty_counters(void)
> +static void x86_pmu_clear_dirty_counters(void *unused)
>   {
>   	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>   	int i;
> @@ -2512,9 +2512,9 @@ static void x86_pmu_event_mapped(struct
>   	 */
>   	if (x86_pmu.sched_task && event->hw.target) {
>   		atomic_inc(&event->pmu->sched_cb_usage);
> -		local_irq_save(flags);
> -		x86_pmu_clear_dirty_counters();
> -		local_irq_restore(flags);
> +		on_each_cpu_mask(mm_cpumask(mm),
> +				 x86_pmu_clear_dirty_counters,
> +				 NULL, true);
>   	}
>   
>   	/*
> @@ -2653,7 +2653,7 @@ static void x86_pmu_sched_task(struct pe
>   	 */
>   	if (sched_in && ctx && READ_ONCE(x86_pmu.attr_rdpmc) &&
>   	    current->mm && atomic_read(&current->mm->context.perf_rdpmc_allowed))
> -		x86_pmu_clear_dirty_counters();
> +		x86_pmu_clear_dirty_counters(NULL);
>   }
>   
>   static void x86_pmu_swap_task_ctx(struct perf_event_context *prev,
> 


* Re: [PATCH V7 2/2] perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task
  2021-05-13 22:14     ` Liang, Kan
@ 2021-05-14  3:50       ` Rob Herring
  2021-05-14 13:48         ` Liang, Kan
  2021-05-14 14:44       ` Peter Zijlstra
  1 sibling, 1 reply; 9+ messages in thread
From: Rob Herring @ 2021-05-14  3:50 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Andi Kleen,
	Arnaldo Carvalho de Melo, Mark Rutland, Andy Lutomirski,
	Stephane Eranian, Namhyung Kim

On Thu, May 13, 2021 at 5:14 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 5/13/2021 11:02 AM, Peter Zijlstra wrote:
> > On Thu, May 13, 2021 at 07:23:02AM -0700, kan.liang@linux.intel.com wrote:
> >
> >> +    if (x86_pmu.sched_task && event->hw.target) {
> >> +            atomic_inc(&event->pmu->sched_cb_usage);
> >> +            local_irq_save(flags);
> >> +            x86_pmu_clear_dirty_counters();
> >> +            local_irq_restore(flags);
> >> +    }
> >
> > So what happens if our mmap() happens after we've already created two
> > (or more) threads in the process, all of who already have a counter (or
> > more) on?
> >
> > Shouldn't this be something like?
>
> That's not enough.
>
> I implemented a test case as below:
> - The main thread A creates a new thread B.
> - Bind the thread A to CPU 0. Then the thread A opens a event, mmap,
> enable the event, and sleep.
> - Bind the thread B to CPU 1. Wait until the event in the thread A is
> enabled. Then RDPMC can read the counters on CPU 1.
>
> In the x86_pmu_event_mapped(), we do on_each_cpu_mask(mm_cpumask(mm),
> cr4_update_pce, NULL, 1);
> The RDPMC from thread B on CPU 1 is not forbidden.

You want RDPMC disabled since the counters are not cleared? If you had
a cpu bound event for CPU1, then you'd want RDPMC enabled, right?

> Since the counter is not created in thread B, the sched_task() never
> gets a chance to be invoked. The dirty counter is not cleared.
>
> To fix it, I think we have to move the cr4_update_pce() to the context
> switch, and update it only when the RDPMC task is scheduled. But it
> probably brings some overhead.

I'm trying to do a similar approach (if I understand what you mean)
using sched_task() without a switch_mm hook or IPIs. The current
branch is here[1]. I have things working for task bound events, but I
don't think cpu bound events are handled for similar reasons as above.
I'm not too sure that enabling user access for cpu bound events is
really all that useful? Maybe for Arm we should just keep user access
for cpu bound events disabled.

Note for now I'm not doing lazy clearing of counters for simplicity.

Rob

[1] git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git
user-perf-event-v8


* Re: [PATCH V7 2/2] perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task
  2021-05-14  3:50       ` Rob Herring
@ 2021-05-14 13:48         ` Liang, Kan
  0 siblings, 0 replies; 9+ messages in thread
From: Liang, Kan @ 2021-05-14 13:48 UTC (permalink / raw)
  To: Rob Herring
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Andi Kleen,
	Arnaldo Carvalho de Melo, Mark Rutland, Andy Lutomirski,
	Stephane Eranian, Namhyung Kim



On 5/13/2021 11:50 PM, Rob Herring wrote:
> On Thu, May 13, 2021 at 5:14 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>
>>
>>
>> On 5/13/2021 11:02 AM, Peter Zijlstra wrote:
>>> On Thu, May 13, 2021 at 07:23:02AM -0700, kan.liang@linux.intel.com wrote:
>>>
>>>> +    if (x86_pmu.sched_task && event->hw.target) {
>>>> +            atomic_inc(&event->pmu->sched_cb_usage);
>>>> +            local_irq_save(flags);
>>>> +            x86_pmu_clear_dirty_counters();
>>>> +            local_irq_restore(flags);
>>>> +    }
>>>
>>> So what happens if our mmap() happens after we've already created two
>>> (or more) threads in the process, all of who already have a counter (or
>>> more) on?
>>>
>>> Shouldn't this be something like?
>>
>> That's not enough.
>>
>> I implemented a test case as below:
>> - The main thread A creates a new thread B.
>> - Bind the thread A to CPU 0. Then the thread A opens a event, mmap,
>> enable the event, and sleep.
>> - Bind the thread B to CPU 1. Wait until the event in the thread A is
>> enabled. Then RDPMC can read the counters on CPU 1.
>>
>> In the x86_pmu_event_mapped(), we do on_each_cpu_mask(mm_cpumask(mm),
>> cr4_update_pce, NULL, 1);
>> The RDPMC from thread B on CPU 1 is not forbidden.
> 
> You want RDPMC disabled since the counters are not cleared? If you had
> a cpu bound event for CPU1, then you'd want RDPMC enabled, right?
>

Since we are trying to clear the counters lazily, I think RDPMC should
be enabled only for a user who owns the counters.

For the above case, we only perf_event_open(pid = 0, cpu = -1) an event
in thread A, so perf should only monitor thread A and RDPMC should be
enabled only while thread A is scheduled in. Thread B doesn't open any
events, so RDPMC should be disabled for thread B; otherwise it could
read any counters on the CPU, including other task-bound events, which
is exactly what this patchset intends to prevent.


>> Since the counter is not created in thread B, the sched_task() never
>> gets a chance to be invoked. The dirty counter is not cleared.
>>
>> To fix it, I think we have to move the cr4_update_pce() to the context
>> switch, and update it only when the RDPMC task is scheduled. But it
>> probably brings some overhead.
> 
> I'm trying to do a similar approach (if I understand what you mean)
> using sched_task() without a switch_mm hook or IPIs. The current
> branch is here[1]. I have things working for task bound events, but I
> don't think cpu bound events are handled for similar reasons as above.
> I'm not too sure that enabling user access for cpu bound events is
> really all that useful? Maybe for Arm we should just keep user access
> for cpu bound events disabled.
> 
> Note for now I'm not doing lazy clearing of counters for simplicity.

If we don't use the lazy way, I think we can clear the counters when a 
task is scheduled out. I don't think we need to worry about the above case.

Thanks,
Kan


* Re: [PATCH V7 2/2] perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task
  2021-05-13 22:14     ` Liang, Kan
  2021-05-14  3:50       ` Rob Herring
@ 2021-05-14 14:44       ` Peter Zijlstra
  2021-05-14 15:30         ` Liang, Kan
  1 sibling, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2021-05-14 14:44 UTC (permalink / raw)
  To: Liang, Kan
  Cc: mingo, linux-kernel, robh, ak, acme, mark.rutland, luto, eranian,
	namhyung

On Thu, May 13, 2021 at 06:14:08PM -0400, Liang, Kan wrote:
> On 5/13/2021 11:02 AM, Peter Zijlstra wrote:
> > On Thu, May 13, 2021 at 07:23:02AM -0700, kan.liang@linux.intel.com wrote:
> > 
> > > +	if (x86_pmu.sched_task && event->hw.target) {
> > > +		atomic_inc(&event->pmu->sched_cb_usage);
> > > +		local_irq_save(flags);
> > > +		x86_pmu_clear_dirty_counters();
> > > +		local_irq_restore(flags);
> > > +	}
> > 
> > So what happens if our mmap() happens after we've already created two
> > (or more) threads in the process, all of who already have a counter (or
> > more) on?
> > 
> > Shouldn't this be something like?
> 
> That's not enough.
> 
> I implemented a test case as below:
> - The main thread A creates a new thread B.
> - Bind the thread A to CPU 0. Then the thread A opens a event, mmap, enable
> the event, and sleep.
> - Bind the thread B to CPU 1. Wait until the event in the thread A is
> enabled. Then RDPMC can read the counters on CPU 1.

This?

	A				B

	clone(CLONE_THREAD) --->	
	set_affine(0)
					set_affine(1)
					while (!event-enabled)
						;
	event = perf_event_open()
	mmap(event)
	ioctl(event, IOC_ENABLE); --->
					RDPMC

	sleep(n)
	  schedule(INTERRUPTIBLE)
	  /* idle */


> In the x86_pmu_event_mapped(), we do on_each_cpu_mask(mm_cpumask(mm),
> cr4_update_pce, NULL, 1);
> The RDPMC from thread B on CPU 1 is not forbidden.
> Since the counter is not created in thread B, the sched_task() never gets a
> chance to be invoked. The dirty counter is not cleared.

Per-task counters from CPU1 that ran before B ran?

> To fix it, I think we have to move the cr4_update_pce() to the context
> switch, and update it only when the RDPMC task is scheduled. But it probably
> brings some overhead.

We have CR4:PCE updates in the context switch path, see
switch_mm_irqs_off() -> cr4_update_pce_mm().

Doing the clear there might actually make sense and avoids this frobbing
of ->sched_task(). When we call cr4_update_pce_mm(), and @mm has rdpmc
on, clear dirty or something like that.

Worth a try.
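
A minimal sketch of that idea, assuming the x86 PMU code exports a
helper for the clearing (the helper name is illustrative, and the
existing rdpmc static-key checks in cr4_update_pce_mm() are elided):

    static void cr4_update_pce_mm(struct mm_struct *mm)
    {
            if (mm && atomic_read(&mm->context.perf_rdpmc_allowed)) {
                    /* Assumed hook exported by the x86 PMU code. */
                    perf_clear_dirty_counters();
                    cr4_set_bits_irqsoff(X86_CR4_PCE);
            } else {
                    cr4_clear_bits_irqsoff(X86_CR4_PCE);
            }
    }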




* Re: [PATCH V7 2/2] perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task
  2021-05-14 14:44       ` Peter Zijlstra
@ 2021-05-14 15:30         ` Liang, Kan
  0 siblings, 0 replies; 9+ messages in thread
From: Liang, Kan @ 2021-05-14 15:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, robh, ak, acme, mark.rutland, luto, eranian,
	namhyung



On 5/14/2021 10:44 AM, Peter Zijlstra wrote:
> On Thu, May 13, 2021 at 06:14:08PM -0400, Liang, Kan wrote:
>> On 5/13/2021 11:02 AM, Peter Zijlstra wrote:
>>> On Thu, May 13, 2021 at 07:23:02AM -0700, kan.liang@linux.intel.com wrote:
>>>
>>>> +	if (x86_pmu.sched_task && event->hw.target) {
>>>> +		atomic_inc(&event->pmu->sched_cb_usage);
>>>> +		local_irq_save(flags);
>>>> +		x86_pmu_clear_dirty_counters();
>>>> +		local_irq_restore(flags);
>>>> +	}
>>>
>>> So what happens if our mmap() happens after we've already created two
>>> (or more) threads in the process, all of who already have a counter (or
>>> more) on?
>>>
>>> Shouldn't this be something like?
>>
>> That's not enough.
>>
>> I implemented a test case as below:
>> - The main thread A creates a new thread B.
>> - Bind the thread A to CPU 0. Then the thread A opens a event, mmap, enable
>> the event, and sleep.
>> - Bind the thread B to CPU 1. Wait until the event in the thread A is
>> enabled. Then RDPMC can read the counters on CPU 1.
> 
> This?

Yes

> 
> 	A				B
> 
> 	clone(CLONE_THREAD) --->	
> 	set_affine(0)
> 					set_affine(1)
> 					while (!event-enabled)
> 						;
> 	event = perf_event_open()
> 	mmap(event)
> 	ioctl(event, IOC_ENABLE); --->
> 					RDPMC
> 
> 	sleep(n)
> 	  schedule(INTERRUPTIBLE)
> 	  /* idle */
> 
> 
>> In the x86_pmu_event_mapped(), we do on_each_cpu_mask(mm_cpumask(mm),
>> cr4_update_pce, NULL, 1);
>> The RDPMC from thread B on CPU 1 is not forbidden.
>> Since the counter is not created in thread B, the sched_task() never gets a
>> chance to be invoked. The dirty counter is not cleared.
> 
> Per-task counters from CPU1 that ran before B ran?

Yes

> 
>> To fix it, I think we have to move the cr4_update_pce() to the context
>> switch, and update it only when the RDPMC task is scheduled. But it probably
>> brings some overhead.
> 
> We have CR4:PCE updates in the context switch path, see
> switch_mm_irqs_off() -> cr4_update_pce_mm().
> 
> Doing the clear there might actually make sense and avoids this frobbing
> of ->sched_task(). When we call cr4_update_pce_mm(), and @mm has rdpmc
> on, clear dirty or something like that.
> 
> Worth a try.
> 
> 

Looks like a good place. Will try.

Thanks,
Kan

