* [PATCH V6] perf: Reset the dirty counter to prevent the leak for an RDPMC task
@ 2021-04-22 18:25 kan.liang
  2021-05-10 19:18 ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: kan.liang @ 2021-04-22 18:25 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: ak, acme, mark.rutland, luto, eranian, namhyung, robh, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The counter value of a perf task may leak to another RDPMC task.
For example, a perf stat task as below is running on CPU 0.

    perf stat -e 'branches,cycles' -- taskset -c 0 ./workload

In the meantime, an RDPMC task, which is also running on CPU 0, may read
the GP counters periodically. (The RDPMC task creates a fixed event,
but reads four GP counters.)

    $./rdpmc_read_all_counters
    index 0x0 value 0x8001e5970f99
    index 0x1 value 0x8005d750edb6
    index 0x2 value 0x0
    index 0x3 value 0x0

    index 0x0 value 0x8002358e48a5
    index 0x1 value 0x8006bd1e3bc9
    index 0x2 value 0x0
    index 0x3 value 0x0
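
For reference, a minimal RDPMC reader along these lines could look like
the sketch below. It is only an illustration (not the actual
rdpmc_read_all_counters source); the event choice, the fixed set of four
GP counter indices, and /sys/devices/cpu/rdpmc being 1 (so that mapping
the event enables user-space RDPMC) are all assumptions.

    #include <stdio.h>
    #include <stdint.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/syscall.h>
    #include <linux/perf_event.h>

    static uint64_t rdpmc(uint32_t counter)
    {
            uint32_t lo, hi;

            /* ECX selects the counter; GP counters start at index 0. */
            asm volatile("rdpmc" : "=a" (lo), "=d" (hi) : "c" (counter));
            return (uint64_t)hi << 32 | lo;
    }

    int main(void)
    {
            struct perf_event_attr attr = {
                    .type = PERF_TYPE_HARDWARE,
                    .size = sizeof(struct perf_event_attr),
                    /* Instructions are typically scheduled on a fixed counter. */
                    .config = PERF_COUNT_HW_INSTRUCTIONS,
                    .exclude_kernel = 1,
            };
            int fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
            /* Mapping the event page enables RDPMC for this mm (rdpmc == 1 mode). */
            void *pg = mmap(NULL, sysconf(_SC_PAGESIZE), PROT_READ,
                            MAP_SHARED, fd, 0);

            if (fd < 0 || pg == MAP_FAILED)
                    return 1;

            for (;;) {
                    for (uint32_t i = 0; i < 4; i++)
                            printf("index 0x%x value 0x%llx\n", i,
                                   (unsigned long long)rdpmc(i));
                    printf("\n");
                    sleep(1);
            }
    }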

This is a potential security issue. Once an attacker knows what the
other thread is counting, the PerfMon counters can be used as a
side channel to attack cryptosystems.

The counter value of the perf stat task leaks to the RDPMC task because
perf never clears the counter when it's stopped.

Two methods were considered to address the issue.
- Unconditionally reset the counter in x86_pmu_del(). It can bring extra
  overhead even when there is no RDPMC task running.
- Only reset the un-assigned dirty counters when an RDPMC task is
  scheduled in. This is the method implemented here.

A dirty counter is a counter whose assigned event has been deleted but
which has not been reset. To track the dirty counters, add a 'dirty'
variable to the struct cpu_hw_events.

The current code doesn't reset the counter when the assigned event is
deleted. Set the corresponding bit in the 'dirty' variable in
x86_pmu_del(), if the RDPMC feature is available on the system.

The security issue can only be triggered by an RDPMC task. Such a task
must update mm->context.perf_rdpmc_allowed, which can therefore be used
to detect an RDPMC task.

Add a new check_leakage() method. When an RDPMC task is scheduled in,
the method is invoked to check and clear the dirty counters. The method
doesn't need to be invoked for identical contexts, which share the same
set of events; no reset is required in that case.

RDPMC is not an Intel-only feature, so add the changes to the x86
generic code. Only the un-assigned dirty counters are reset, because the
RDPMC-assigned dirty counters will be updated soon anyway.

After applying the patch,

        $ ./rdpmc_read_all_counters
        index 0x0 value 0x0
        index 0x1 value 0x0
        index 0x2 value 0x0
        index 0x3 value 0x0

        index 0x0 value 0x0
        index 0x1 value 0x0
        index 0x2 value 0x0
        index 0x3 value 0x0

Performance

Context-switch performance is only impacted when there are two or more
perf users and at least one of them is an RDPMC user. In other cases,
there is no performance impact.

The worst case occurs when there are two users: the RDPMC user only
uses one counter, while the other user uses all available counters.
When the RDPMC task is scheduled in, all the counters other than the
RDPMC-assigned one have to be reset.

Here is the test result for the worst case.

The test was run on an Ice Lake platform, which has 8 GP counters and
3 fixed counters (not including the SLOTS counter).

The lat_ctx is used to measure the context switching time.

    lat_ctx -s 128K -N 1000 processes 2

I instrumented lat_ctx to open all 8 GP counters and 3 fixed counters
for one task. The other task opens a fixed counter and enables RDPMC.
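
A sketch of how the perf-user side of that instrumentation could look is
below. It is hypothetical (the actual lat_ctx modification is not shown
here); the particular generic events used to occupy the GP and fixed
counters are assumptions and would be hardware-specific in practice.

    #include <string.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/perf_event.h>

    /* Open one self-monitoring hardware event; returns an fd or -1. */
    int open_hw_event(unsigned long long config)
    {
            struct perf_event_attr attr;

            memset(&attr, 0, sizeof(attr));
            attr.type = PERF_TYPE_HARDWARE;
            attr.size = sizeof(attr);
            attr.config = config;

            /* pid == 0, cpu == -1: count this task on any CPU. */
            return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
    }

    /* Open a set of events intended to occupy GP and fixed counters. */
    void occupy_counters(void)
    {
            static const unsigned long long configs[] = {
                    PERF_COUNT_HW_CPU_CYCLES,
                    PERF_COUNT_HW_INSTRUCTIONS,
                    PERF_COUNT_HW_REF_CPU_CYCLES,
                    PERF_COUNT_HW_BRANCH_INSTRUCTIONS,
                    PERF_COUNT_HW_BRANCH_MISSES,
                    PERF_COUNT_HW_CACHE_REFERENCES,
                    PERF_COUNT_HW_CACHE_MISSES,
                    PERF_COUNT_HW_BUS_CYCLES,
            };
            unsigned int i;

            for (i = 0; i < sizeof(configs) / sizeof(configs[0]); i++)
                    open_hw_event(configs[i]);
    }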

Without the patch:
The context switch time is 4.97 us

With the patch:
The context switch time is 5.16 us

That is a (5.16 - 4.97) / 4.97 ~= 3.8% increase, i.e. a ~4% drop in
context-switching performance in the worst case.

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
Changes since V5:
- Don't update perf_sched_cb_{inc,dec} in mmap/munmap().
  Don't check and clear dirty counters in sched_task().
  Because perf_sched_cb_{inc,dec} modify per-CPU state, and mmap() and
  munmap() can be invoked on different CPUs.
- Add a new method check_leakage() to check and clear dirty counters
  to prevent potential leakage.

Changes since V4:
- Fix the warning with CONFIG_DEBUG_PREEMPT=y
  Disable the interrupts and preemption around perf_sched_cb_inc/dec()
  to protect the sched_cb_list. I don't think we touch the area in NMI.
  Disabling the interrupts should be good enough to protect the cpuctx.
  We don't need perf_ctx_lock().

Changes since V3:
- Fix warnings reported by kernel test robot <lkp@intel.com>
- Move bitmap_empty() check after clearing assigned counters.
  It should be very likely that cpuc->dirty is non-empty.
  Moving the check after the clearing can skip the for_each_set_bit()
  and bitmap_zero().

The V2 can be found here.
https://lore.kernel.org/lkml/20200821195754.20159-3-kan.liang@linux.intel.com/

Changes since V2:
- Unconditionally set cpuc->dirty. The worst case for an RDPMC task is
  that we may have to clear all counters for the first time in
  x86_pmu_event_mapped. After that, the sched_task() will clear/update
  the 'dirty'. Only the real 'dirty' counters are cleared. For a
  non-RDPMC task, it's harmless to unconditionally set cpuc->dirty.
- Remove the !is_sampling_event() check
- Move the code into the x86 generic file, because RDPMC is not an
  Intel-only feature.

Changes since V1:
- Drop the old method, which unconditionally reset the counter in
  x86_pmu_del().
  Only reset the dirty counters when an RDPMC task is scheduled in.
---
 arch/x86/events/core.c       | 35 +++++++++++++++++++++++++++++++++++
 arch/x86/events/perf_event.h |  1 +
 include/linux/perf_event.h   |  5 +++++
 kernel/events/core.c         |  2 ++
 4 files changed, 43 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index c7fcc8d..08cb779 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1624,6 +1624,8 @@ static void x86_pmu_del(struct perf_event *event, int flags)
 	if (cpuc->txn_flags & PERF_PMU_TXN_ADD)
 		goto do_del;
 
+	__set_bit(event->hw.idx, cpuc->dirty);
+
 	/*
 	 * Not a TXN, therefore cleanup properly.
 	 */
@@ -2631,6 +2633,38 @@ static int x86_pmu_check_period(struct perf_event *event, u64 value)
 	return 0;
 }
 
+static void x86_pmu_clear_dirty_counters(void)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	int i;
+
+	 /* Don't need to clear the assigned counter. */
+	for (i = 0; i < cpuc->n_events; i++)
+		__clear_bit(cpuc->assign[i], cpuc->dirty);
+
+	if (bitmap_empty(cpuc->dirty, X86_PMC_IDX_MAX))
+		return;
+
+	for_each_set_bit(i, cpuc->dirty, X86_PMC_IDX_MAX) {
+		/* Metrics and fake events don't have corresponding HW counters. */
+		if (is_metric_idx(i) || (i == INTEL_PMC_IDX_FIXED_VLBR))
+			continue;
+		else if (i >= INTEL_PMC_IDX_FIXED)
+			wrmsrl(MSR_ARCH_PERFMON_FIXED_CTR0 + (i - INTEL_PMC_IDX_FIXED), 0);
+		else
+			wrmsrl(x86_pmu_event_addr(i), 0);
+	}
+
+	bitmap_zero(cpuc->dirty, X86_PMC_IDX_MAX);
+}
+
+static void x86_pmu_check_leakage(void)
+{
+	if (READ_ONCE(x86_pmu.attr_rdpmc) && current->mm &&
+	    atomic_read(&current->mm->context.perf_rdpmc_allowed))
+		x86_pmu_clear_dirty_counters();
+}
+
 static int x86_pmu_aux_output_match(struct perf_event *event)
 {
 	if (!(pmu.capabilities & PERF_PMU_CAP_AUX_OUTPUT))
@@ -2675,6 +2709,7 @@ static struct pmu pmu = {
 	.sched_task		= x86_pmu_sched_task,
 	.swap_task_ctx		= x86_pmu_swap_task_ctx,
 	.check_period		= x86_pmu_check_period,
+	.check_leakage		= x86_pmu_check_leakage,
 
 	.aux_output_match	= x86_pmu_aux_output_match,
 
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 27fa85e..d6003e0 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -229,6 +229,7 @@ struct cpu_hw_events {
 	 */
 	struct perf_event	*events[X86_PMC_IDX_MAX]; /* in counter order */
 	unsigned long		active_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
+	unsigned long		dirty[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
 	int			enabled;
 
 	int			n_events; /* the # of events in the below arrays */
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index a763928..bcf3964 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -514,6 +514,11 @@ struct pmu {
 	 * Check period value for PERF_EVENT_IOC_PERIOD ioctl.
 	 */
 	int (*check_period)		(struct perf_event *event, u64 value); /* optional */
+
+	/*
+	 * Check and clear dirty counters to prevent potential leakage
+	 */
+	void (*check_leakage)		(void); /* optional */
 };
 
 enum perf_addr_filter_action_t {
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 928b166..692b94e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3858,6 +3858,8 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx,
 
 	if (cpuctx->sched_cb_usage && pmu->sched_task)
 		pmu->sched_task(cpuctx->task_ctx, true);
+	if (pmu->check_leakage)
+		pmu->check_leakage();
 
 	perf_pmu_enable(pmu);
 
-- 
2.7.4



* Re: [PATCH V6] perf: Reset the dirty counter to prevent the leak for an RDPMC task
  2021-04-22 18:25 [PATCH V6] perf: Reset the dirty counter to prevent the leak for an RDPMC task kan.liang
@ 2021-05-10 19:18 ` Peter Zijlstra
  2021-05-10 20:29   ` Rob Herring
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2021-05-10 19:18 UTC (permalink / raw)
  To: kan.liang
  Cc: mingo, linux-kernel, ak, acme, mark.rutland, luto, eranian,
	namhyung, robh

On Thu, Apr 22, 2021 at 11:25:52AM -0700, kan.liang@linux.intel.com wrote:

> - Add a new method check_leakage() to check and clear dirty counters
>   to prevent potential leakage.

I really dislike adding spurious callbacks, also because indirect calls
are teh suck, but also because it pollutes the interface so.

That said, I'm not sure I actually like the below any better :/

---

 arch/x86/events/core.c       | 58 +++++++++++++++++++++++++++++++++++++++++---
 arch/x86/events/perf_event.h |  1 +
 include/linux/perf_event.h   |  2 ++
 kernel/events/core.c         |  7 +++++-
 4 files changed, 63 insertions(+), 5 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 8e509325c2c3..e650c4ab603a 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -740,21 +740,26 @@ void x86_pmu_enable_all(int added)
 	}
 }
 
-static inline int is_x86_event(struct perf_event *event)
+static inline bool is_x86_pmu(struct pmu *_pmu)
 {
 	int i;
 
 	if (!is_hybrid())
-		return event->pmu == &pmu;
+		return _pmu == &pmu;
 
 	for (i = 0; i < x86_pmu.num_hybrid_pmus; i++) {
-		if (event->pmu == &x86_pmu.hybrid_pmu[i].pmu)
+		if (_pmu == &x86_pmu.hybrid_pmu[i].pmu)
 			return true;
 	}
 
 	return false;
 }
 
+static inline int is_x86_event(struct perf_event *event)
+{
+	return is_x86_pmu(event->pmu);
+}
+
 struct pmu *x86_get_pmu(unsigned int cpu)
 {
 	struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
@@ -1624,6 +1629,8 @@ static void x86_pmu_del(struct perf_event *event, int flags)
 	if (cpuc->txn_flags & PERF_PMU_TXN_ADD)
 		goto do_del;
 
+	__set_bit(event->hw.idx, cpuc->dirty);
+
 	/*
 	 * Not a TXN, therefore cleanup properly.
 	 */
@@ -2472,6 +2479,31 @@ static int x86_pmu_event_init(struct perf_event *event)
 	return err;
 }
 
+static void x86_pmu_clear_dirty_counters(void)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	int i;
+
+	 /* Don't need to clear the assigned counter. */
+	for (i = 0; i < cpuc->n_events; i++)
+		__clear_bit(cpuc->assign[i], cpuc->dirty);
+
+	if (bitmap_empty(cpuc->dirty, X86_PMC_IDX_MAX))
+		return;
+
+	for_each_set_bit(i, cpuc->dirty, X86_PMC_IDX_MAX) {
+		/* Metrics and fake events don't have corresponding HW counters. */
+		if (is_metric_idx(i) || (i == INTEL_PMC_IDX_FIXED_VLBR))
+			continue;
+		else if (i >= INTEL_PMC_IDX_FIXED)
+			wrmsrl(MSR_ARCH_PERFMON_FIXED_CTR0 + (i - INTEL_PMC_IDX_FIXED), 0);
+		else
+			wrmsrl(x86_pmu_event_addr(i), 0);
+	}
+
+	bitmap_zero(cpuc->dirty, X86_PMC_IDX_MAX);
+}
+
 static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)
 {
 	if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
@@ -2495,7 +2527,6 @@ static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)
 
 static void x86_pmu_event_unmapped(struct perf_event *event, struct mm_struct *mm)
 {
-
 	if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
 		return;
 
@@ -2604,6 +2635,25 @@ static const struct attribute_group *x86_pmu_attr_groups[] = {
 static void x86_pmu_sched_task(struct perf_event_context *ctx, bool sched_in)
 {
 	static_call_cond(x86_pmu_sched_task)(ctx, sched_in);
+
+	/*
+	 * If a new task has the RDPMC enabled, clear the dirty counters
+	 * to prevent the potential leak.
+	 */
+	if (sched_in && ctx && READ_ONCE(x86_pmu.attr_rdpmc) &&
+	    current->mm && atomic_read(&current->mm->context.perf_rdpmc_allowed))
+		x86_pmu_clear_dirty_counters();
+}
+
+bool arch_perf_needs_sched_in(struct pmu *pmu)
+{
+	if (!READ_ONCE(x86_pmu.attr_rdpmc))
+		return false;
+
+	if (!is_x86_pmu(pmu))
+		return  false;
+
+	return current->mm && atomic_read(&current->mm->context.perf_rdpmc_allowed);
 }
 
 static void x86_pmu_swap_task_ctx(struct perf_event_context *prev,
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 27fa85e7d4fd..d6003e08b055 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -229,6 +229,7 @@ struct cpu_hw_events {
 	 */
 	struct perf_event	*events[X86_PMC_IDX_MAX]; /* in counter order */
 	unsigned long		active_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
+	unsigned long		dirty[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
 	int			enabled;
 
 	int			n_events; /* the # of events in the below arrays */
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index f5a6a2f069ed..8b5a88e93e3c 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1597,6 +1597,8 @@ int perf_event_exit_cpu(unsigned int cpu);
 #define perf_event_exit_cpu	NULL
 #endif
 
+extern bool __weak arch_perf_needs_sched_in(struct pmu *_pmu);
+
 extern void __weak arch_perf_update_userpage(struct perf_event *event,
 					     struct perf_event_mmap_page *userpg,
 					     u64 now);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 8c3abccaa612..9ae292c09cb0 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3817,6 +3817,11 @@ static void cpu_ctx_sched_in(struct perf_cpu_context *cpuctx,
 	ctx_sched_in(ctx, cpuctx, event_type, task);
 }
 
+bool __weak arch_perf_needs_sched_in(struct pmu *pmu)
+{
+	return false;
+}
+
 static void perf_event_context_sched_in(struct perf_event_context *ctx,
 					struct task_struct *task)
 {
@@ -3851,7 +3856,7 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx,
 		cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
 	perf_event_sched_in(cpuctx, ctx, task);
 
-	if (cpuctx->sched_cb_usage && pmu->sched_task)
+	if (pmu->sched_task && (cpuctx->sched_cb_usage || arch_perf_needs_sched_in(pmu)))
 		pmu->sched_task(cpuctx->task_ctx, true);
 
 	perf_pmu_enable(pmu);


* Re: [PATCH V6] perf: Reset the dirty counter to prevent the leak for an RDPMC task
  2021-05-10 19:18 ` Peter Zijlstra
@ 2021-05-10 20:29   ` Rob Herring
  2021-05-11 17:59     ` Liang, Kan
  0 siblings, 1 reply; 10+ messages in thread
From: Rob Herring @ 2021-05-10 20:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Liang, Kan, Ingo Molnar, linux-kernel, Andi Kleen,
	Arnaldo Carvalho de Melo, Mark Rutland, Andy Lutomirski,
	Stephane Eranian, Namhyung Kim

On Mon, May 10, 2021 at 2:18 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Apr 22, 2021 at 11:25:52AM -0700, kan.liang@linux.intel.com wrote:
>
> > - Add a new method check_leakage() to check and clear dirty counters
> >   to prevent potential leakage.
>
> I really dislike adding spurious callbacks, also because indirect calls
> are teh suck, but also because it pollutes the interface so.
>
> That said, I'm not sure I actually like the below any better :/
>
> ---
>
>  arch/x86/events/core.c       | 58 +++++++++++++++++++++++++++++++++++++++++---
>  arch/x86/events/perf_event.h |  1 +
>  include/linux/perf_event.h   |  2 ++
>  kernel/events/core.c         |  7 +++++-
>  4 files changed, 63 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 8e509325c2c3..e650c4ab603a 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -740,21 +740,26 @@ void x86_pmu_enable_all(int added)
>         }
>  }
>
> -static inline int is_x86_event(struct perf_event *event)
> +static inline bool is_x86_pmu(struct pmu *_pmu)
>  {
>         int i;
>
>         if (!is_hybrid())
> -               return event->pmu == &pmu;
> +               return _pmu == &pmu;
>
>         for (i = 0; i < x86_pmu.num_hybrid_pmus; i++) {
> -               if (event->pmu == &x86_pmu.hybrid_pmu[i].pmu)
> +               if (_pmu == &x86_pmu.hybrid_pmu[i].pmu)
>                         return true;
>         }
>
>         return false;
>  }

[...]

> +bool arch_perf_needs_sched_in(struct pmu *pmu)
> +{
> +       if (!READ_ONCE(x86_pmu.attr_rdpmc))
> +               return false;
> +
> +       if (!is_x86_pmu(pmu))
> +               return  false;
> +
> +       return current->mm && atomic_read(&current->mm->context.perf_rdpmc_allowed);
>  }

Why add an arch hook for something that clearly looks to be per PMU?
Couldn't we add another atomic/flag for calling sched_task() that is
per PMU rather than per CPU? With that, I think I can avoid a hook in
switch_mm() and keep everything self-contained in the Arm PMU driver.

Rob


* Re: [PATCH V6] perf: Reset the dirty counter to prevent the leak for an RDPMC task
  2021-05-10 20:29   ` Rob Herring
@ 2021-05-11 17:59     ` Liang, Kan
  2021-05-11 20:39       ` Rob Herring
  0 siblings, 1 reply; 10+ messages in thread
From: Liang, Kan @ 2021-05-11 17:59 UTC (permalink / raw)
  To: Rob Herring, Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, Andi Kleen, Arnaldo Carvalho de Melo,
	Mark Rutland, Andy Lutomirski, Stephane Eranian, Namhyung Kim



On 5/10/2021 4:29 PM, Rob Herring wrote:
> On Mon, May 10, 2021 at 2:18 PM Peter Zijlstra <peterz@infradead.org> wrote:
>>
>> On Thu, Apr 22, 2021 at 11:25:52AM -0700, kan.liang@linux.intel.com wrote:
>>
>>> - Add a new method check_leakage() to check and clear dirty counters
>>>    to prevent potential leakage.
>>
>> I really dislike adding spurious callbacks, also because indirect calls
>> are teh suck, but also because it pollutes the interface so.
>>
>> That said, I'm not sure I actually like the below any better :/
>>

Maybe we can add an atomic variable to track the number of
event_mapped() calls. Only invoke sched_task() when the number > 0.

It looks like only X86 implements the event_mapped(). So it should not 
impact other ARCHs.

What do you think?

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index c6fedd2..ae5b0e7 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1636,6 +1636,8 @@ static void x86_pmu_del(struct perf_event *event, int flags)
  	if (cpuc->txn_flags & PERF_PMU_TXN_ADD)
  		goto do_del;

+	__set_bit(event->hw.idx, cpuc->dirty);
+
  	/*
  	 * Not a TXN, therefore cleanup properly.
  	 */
@@ -2484,6 +2486,31 @@ static int x86_pmu_event_init(struct perf_event *event)
  	return err;
  }

+static void x86_pmu_clear_dirty_counters(void)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	int i;
+
+	 /* Don't need to clear the assigned counter. */
+	for (i = 0; i < cpuc->n_events; i++)
+		__clear_bit(cpuc->assign[i], cpuc->dirty);
+
+	if (bitmap_empty(cpuc->dirty, X86_PMC_IDX_MAX))
+		return;
+
+	for_each_set_bit(i, cpuc->dirty, X86_PMC_IDX_MAX) {
+		/* Metrics and fake events don't have corresponding HW counters. */
+		if (is_metric_idx(i) || (i == INTEL_PMC_IDX_FIXED_VLBR))
+			continue;
+		else if (i >= INTEL_PMC_IDX_FIXED)
+			wrmsrl(MSR_ARCH_PERFMON_FIXED_CTR0 + (i - INTEL_PMC_IDX_FIXED), 0);
+		else
+			wrmsrl(x86_pmu_event_addr(i), 0);
+	}
+
+	bitmap_zero(cpuc->dirty, X86_PMC_IDX_MAX);
+}
+
  static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)
  {
  	if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
@@ -2507,7 +2534,6 @@ static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)

  static void x86_pmu_event_unmapped(struct perf_event *event, struct mm_struct *mm)
  {
-
  	if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
  		return;

@@ -2616,6 +2642,14 @@ static const struct attribute_group *x86_pmu_attr_groups[] = {
  static void x86_pmu_sched_task(struct perf_event_context *ctx, bool sched_in)
  {
  	static_call_cond(x86_pmu_sched_task)(ctx, sched_in);
+
+	/*
+	 * If a new task has the RDPMC enabled, clear the dirty counters
+	 * to prevent the potential leak.
+	 */
+	if (sched_in && ctx && READ_ONCE(x86_pmu.attr_rdpmc) &&
+	    current->mm && atomic_read(&current->mm->context.perf_rdpmc_allowed))
+		x86_pmu_clear_dirty_counters();
  }

  static void x86_pmu_swap_task_ctx(struct perf_event_context *prev,
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 10c8171..55bd891 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -229,6 +229,7 @@ struct cpu_hw_events {
  	 */
  	struct perf_event	*events[X86_PMC_IDX_MAX]; /* in counter order */
  	unsigned long		active_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
+	unsigned long		dirty[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
  	int			enabled;

  	int			n_events; /* the # of events in the below arrays */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 1574b70..ef8f6f4 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -384,6 +384,7 @@ DEFINE_STATIC_KEY_FALSE(perf_sched_events);
  static DECLARE_DELAYED_WORK(perf_sched_work, perf_sched_delayed);
  static DEFINE_MUTEX(perf_sched_mutex);
  static atomic_t perf_sched_count;
+static atomic_t perf_event_mmap_count;

  static DEFINE_PER_CPU(atomic_t, perf_cgroup_events);
  static DEFINE_PER_CPU(int, perf_sched_cb_usages);
@@ -3851,7 +3852,7 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx,
  		cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
  	perf_event_sched_in(cpuctx, ctx, task);

-	if (cpuctx->sched_cb_usage && pmu->sched_task)
+	if (pmu->sched_task && (cpuctx->sched_cb_usage || atomic_read(&perf_event_mmap_count)))
  		pmu->sched_task(cpuctx->task_ctx, true);

  	perf_pmu_enable(pmu);
@@ -5988,8 +5989,10 @@ static void perf_mmap_open(struct vm_area_struct *vma)
  	if (vma->vm_pgoff)
  		atomic_inc(&event->rb->aux_mmap_count);

-	if (event->pmu->event_mapped)
+	if (event->pmu->event_mapped) {
+		atomic_inc(&perf_event_mmap_count);
  		event->pmu->event_mapped(event, vma->vm_mm);
+	}
  }

  static void perf_pmu_output_stop(struct perf_event *event);
@@ -6011,8 +6014,10 @@ static void perf_mmap_close(struct vm_area_struct *vma)
  	unsigned long size = perf_data_size(rb);
  	bool detach_rest = false;

-	if (event->pmu->event_unmapped)
+	if (event->pmu->event_unmapped) {
+		atomic_dec(&perf_event_mmap_count);
  		event->pmu->event_unmapped(event, vma->vm_mm);
+	}

  	/*
  	 * rb->aux_mmap_count will always drop before rb->mmap_count and
@@ -6329,8 +6334,10 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
  	vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP;
  	vma->vm_ops = &perf_mmap_vmops;

-	if (event->pmu->event_mapped)
+	if (event->pmu->event_mapped) {
+		atomic_inc(&perf_event_mmap_count);
  		event->pmu->event_mapped(event, vma->vm_mm);
+	}

  	return ret;
  }


Thanks,
Kan


* Re: [PATCH V6] perf: Reset the dirty counter to prevent the leak for an RDPMC task
  2021-05-11 17:59     ` Liang, Kan
@ 2021-05-11 20:39       ` Rob Herring
  2021-05-11 21:42         ` Liang, Kan
  0 siblings, 1 reply; 10+ messages in thread
From: Rob Herring @ 2021-05-11 20:39 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Andi Kleen,
	Arnaldo Carvalho de Melo, Mark Rutland, Andy Lutomirski,
	Stephane Eranian, Namhyung Kim

On Tue, May 11, 2021 at 12:59 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 5/10/2021 4:29 PM, Rob Herring wrote:
> > On Mon, May 10, 2021 at 2:18 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >>
> >> On Thu, Apr 22, 2021 at 11:25:52AM -0700, kan.liang@linux.intel.com wrote:
> >>
> >>> - Add a new method check_leakage() to check and clear dirty counters
> >>>    to prevent potential leakage.
> >>
> >> I really dislike adding spurious callbacks, also because indirect calls
> >> are teh suck, but also because it pollutes the interface so.
> >>
> >> That said, I'm not sure I actually like the below any better :/
> >>
>
> Maybe we can add a atomic variable to track the number of
> event_mapped(). Only invoke sched_task() when the number > 0.

Except that it is only needed when mapped and user access is allowed/enabled.

>
> It looks like only X86 implements the event_mapped(). So it should not
> impact other ARCHs.

Arm will have one if we ever settle on the implementation.

>
> What do you think?
>
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index c6fedd2..ae5b0e7 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -1636,6 +1636,8 @@ static void x86_pmu_del(struct perf_event *event,
> int flags)
>         if (cpuc->txn_flags & PERF_PMU_TXN_ADD)
>                 goto do_del;
>
> +       __set_bit(event->hw.idx, cpuc->dirty);
> +
>         /*
>          * Not a TXN, therefore cleanup properly.
>          */
> @@ -2484,6 +2486,31 @@ static int x86_pmu_event_init(struct perf_event
> *event)
>         return err;
>   }
>
> +static void x86_pmu_clear_dirty_counters(void)
> +{
> +       struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> +       int i;
> +
> +        /* Don't need to clear the assigned counter. */
> +       for (i = 0; i < cpuc->n_events; i++)
> +               __clear_bit(cpuc->assign[i], cpuc->dirty);
> +
> +       if (bitmap_empty(cpuc->dirty, X86_PMC_IDX_MAX))
> +               return;
> +
> +       for_each_set_bit(i, cpuc->dirty, X86_PMC_IDX_MAX) {
> +               /* Metrics and fake events don't have corresponding HW counters. */
> +               if (is_metric_idx(i) || (i == INTEL_PMC_IDX_FIXED_VLBR))
> +                       continue;
> +               else if (i >= INTEL_PMC_IDX_FIXED)
> +                       wrmsrl(MSR_ARCH_PERFMON_FIXED_CTR0 + (i - INTEL_PMC_IDX_FIXED), 0);
> +               else
> +                       wrmsrl(x86_pmu_event_addr(i), 0);
> +       }
> +
> +       bitmap_zero(cpuc->dirty, X86_PMC_IDX_MAX);
> +}
> +
>   static void x86_pmu_event_mapped(struct perf_event *event, struct
> mm_struct *mm)
>   {
>         if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
> @@ -2507,7 +2534,6 @@ static void x86_pmu_event_mapped(struct perf_event
> *event, struct mm_struct *mm)
>
>   static void x86_pmu_event_unmapped(struct perf_event *event, struct
> mm_struct *mm)
>   {
> -
>         if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
>                 return;
>
> @@ -2616,6 +2642,14 @@ static const struct attribute_group
> *x86_pmu_attr_groups[] = {
>   static void x86_pmu_sched_task(struct perf_event_context *ctx, bool
> sched_in)
>   {
>         static_call_cond(x86_pmu_sched_task)(ctx, sched_in);
> +
> +       /*
> +        * If a new task has the RDPMC enabled, clear the dirty counters
> +        * to prevent the potential leak.
> +        */
> +       if (sched_in && ctx && READ_ONCE(x86_pmu.attr_rdpmc) &&
> +           current->mm && atomic_read(&current->mm->context.perf_rdpmc_allowed))
> +               x86_pmu_clear_dirty_counters();
>   }
>
>   static void x86_pmu_swap_task_ctx(struct perf_event_context *prev,
> diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
> index 10c8171..55bd891 100644
> --- a/arch/x86/events/perf_event.h
> +++ b/arch/x86/events/perf_event.h
> @@ -229,6 +229,7 @@ struct cpu_hw_events {
>          */
>         struct perf_event       *events[X86_PMC_IDX_MAX]; /* in counter order */
>         unsigned long           active_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
> +       unsigned long           dirty[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
>         int                     enabled;
>
>         int                     n_events; /* the # of events in the below arrays */
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 1574b70..ef8f6f4 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -384,6 +384,7 @@ DEFINE_STATIC_KEY_FALSE(perf_sched_events);
>   static DECLARE_DELAYED_WORK(perf_sched_work, perf_sched_delayed);
>   static DEFINE_MUTEX(perf_sched_mutex);
>   static atomic_t perf_sched_count;
> +static atomic_t perf_event_mmap_count;

A global count is not going to work. I think it needs to be per PMU at
least. In the case of Arm big.LITTLE, user access is constrained to
one subset of cores, which is one PMU instance.

>
>   static DEFINE_PER_CPU(atomic_t, perf_cgroup_events);
>   static DEFINE_PER_CPU(int, perf_sched_cb_usages);
> @@ -3851,7 +3852,7 @@ static void perf_event_context_sched_in(struct
> perf_event_context *ctx,
>                 cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
>         perf_event_sched_in(cpuctx, ctx, task);
>
> -       if (cpuctx->sched_cb_usage && pmu->sched_task)
> +       if (pmu->sched_task && (cpuctx->sched_cb_usage ||
> atomic_read(&perf_event_mmap_count)))
>                 pmu->sched_task(cpuctx->task_ctx, true);
>
>         perf_pmu_enable(pmu);
> @@ -5988,8 +5989,10 @@ static void perf_mmap_open(struct vm_area_struct
> *vma)
>         if (vma->vm_pgoff)
>                 atomic_inc(&event->rb->aux_mmap_count);
>
> -       if (event->pmu->event_mapped)
> +       if (event->pmu->event_mapped) {
> +               atomic_inc(&perf_event_mmap_count);
>                 event->pmu->event_mapped(event, vma->vm_mm);
> +       }
>   }
>
>   static void perf_pmu_output_stop(struct perf_event *event);
> @@ -6011,8 +6014,10 @@ static void perf_mmap_close(struct vm_area_struct
> *vma)
>         unsigned long size = perf_data_size(rb);
>         bool detach_rest = false;
>
> -       if (event->pmu->event_unmapped)
> +       if (event->pmu->event_unmapped) {
> +               atomic_dec(&perf_event_mmap_count);
>                 event->pmu->event_unmapped(event, vma->vm_mm);
> +       }
>
>         /*
>          * rb->aux_mmap_count will always drop before rb->mmap_count and
> @@ -6329,8 +6334,10 @@ static int perf_mmap(struct file *file, struct
> vm_area_struct *vma)
>         vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP;
>         vma->vm_ops = &perf_mmap_vmops;
>
> -       if (event->pmu->event_mapped)
> +       if (event->pmu->event_mapped) {
> +               atomic_inc(&perf_event_mmap_count);
>                 event->pmu->event_mapped(event, vma->vm_mm);
> +       }
>
>         return ret;
>   }
>
>
> Thanks,
> Kan


* Re: [PATCH V6] perf: Reset the dirty counter to prevent the leak for an RDPMC task
  2021-05-11 20:39       ` Rob Herring
@ 2021-05-11 21:42         ` Liang, Kan
  2021-05-12  7:35           ` Peter Zijlstra
  2021-05-12 14:54           ` Rob Herring
  0 siblings, 2 replies; 10+ messages in thread
From: Liang, Kan @ 2021-05-11 21:42 UTC (permalink / raw)
  To: Rob Herring, Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, Andi Kleen, Arnaldo Carvalho de Melo,
	Mark Rutland, Andy Lutomirski, Stephane Eranian, Namhyung Kim



On 5/11/2021 4:39 PM, Rob Herring wrote:
> On Tue, May 11, 2021 at 12:59 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>
>>
>>
>> On 5/10/2021 4:29 PM, Rob Herring wrote:
>>> On Mon, May 10, 2021 at 2:18 PM Peter Zijlstra <peterz@infradead.org> wrote:
>>>>
>>>> On Thu, Apr 22, 2021 at 11:25:52AM -0700, kan.liang@linux.intel.com wrote:
>>>>
>>>>> - Add a new method check_leakage() to check and clear dirty counters
>>>>>     to prevent potential leakage.
>>>>
>>>> I really dislike adding spurious callbacks, also because indirect calls
>>>> are teh suck, but also because it pollutes the interface so.
>>>>
>>>> That said, I'm not sure I actually like the below any better :/
>>>>
>>
>> Maybe we can add a atomic variable to track the number of
>> event_mapped(). Only invoke sched_task() when the number > 0.
> 
> Except that it is only needed when mapped and user access is allowed/enabled.
> 
>>
>> It looks like only X86 implements the event_mapped(). So it should not
>> impact other ARCHs.
> 
> Arm will have one if we ever settle on the implementation.
> 
>>
>> What do you think?
>>
>> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
>> index c6fedd2..ae5b0e7 100644
>> --- a/arch/x86/events/core.c
>> +++ b/arch/x86/events/core.c
>> @@ -1636,6 +1636,8 @@ static void x86_pmu_del(struct perf_event *event,
>> int flags)
>>          if (cpuc->txn_flags & PERF_PMU_TXN_ADD)
>>                  goto do_del;
>>
>> +       __set_bit(event->hw.idx, cpuc->dirty);
>> +
>>          /*
>>           * Not a TXN, therefore cleanup properly.
>>           */
>> @@ -2484,6 +2486,31 @@ static int x86_pmu_event_init(struct perf_event
>> *event)
>>          return err;
>>    }
>>
>> +static void x86_pmu_clear_dirty_counters(void)
>> +{
>> +       struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>> +       int i;
>> +
>> +        /* Don't need to clear the assigned counter. */
>> +       for (i = 0; i < cpuc->n_events; i++)
>> +               __clear_bit(cpuc->assign[i], cpuc->dirty);
>> +
>> +       if (bitmap_empty(cpuc->dirty, X86_PMC_IDX_MAX))
>> +               return;
>> +
>> +       for_each_set_bit(i, cpuc->dirty, X86_PMC_IDX_MAX) {
>> +               /* Metrics and fake events don't have corresponding HW counters. */
>> +               if (is_metric_idx(i) || (i == INTEL_PMC_IDX_FIXED_VLBR))
>> +                       continue;
>> +               else if (i >= INTEL_PMC_IDX_FIXED)
>> +                       wrmsrl(MSR_ARCH_PERFMON_FIXED_CTR0 + (i - INTEL_PMC_IDX_FIXED), 0);
>> +               else
>> +                       wrmsrl(x86_pmu_event_addr(i), 0);
>> +       }
>> +
>> +       bitmap_zero(cpuc->dirty, X86_PMC_IDX_MAX);
>> +}
>> +
>>    static void x86_pmu_event_mapped(struct perf_event *event, struct
>> mm_struct *mm)
>>    {
>>          if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
>> @@ -2507,7 +2534,6 @@ static void x86_pmu_event_mapped(struct perf_event
>> *event, struct mm_struct *mm)
>>
>>    static void x86_pmu_event_unmapped(struct perf_event *event, struct
>> mm_struct *mm)
>>    {
>> -
>>          if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
>>                  return;
>>
>> @@ -2616,6 +2642,14 @@ static const struct attribute_group
>> *x86_pmu_attr_groups[] = {
>>    static void x86_pmu_sched_task(struct perf_event_context *ctx, bool
>> sched_in)
>>    {
>>          static_call_cond(x86_pmu_sched_task)(ctx, sched_in);
>> +
>> +       /*
>> +        * If a new task has the RDPMC enabled, clear the dirty counters
>> +        * to prevent the potential leak.
>> +        */
>> +       if (sched_in && ctx && READ_ONCE(x86_pmu.attr_rdpmc) &&
>> +           current->mm && atomic_read(&current->mm->context.perf_rdpmc_allowed))
>> +               x86_pmu_clear_dirty_counters();
>>    }
>>
>>    static void x86_pmu_swap_task_ctx(struct perf_event_context *prev,
>> diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
>> index 10c8171..55bd891 100644
>> --- a/arch/x86/events/perf_event.h
>> +++ b/arch/x86/events/perf_event.h
>> @@ -229,6 +229,7 @@ struct cpu_hw_events {
>>           */
>>          struct perf_event       *events[X86_PMC_IDX_MAX]; /* in counter order */
>>          unsigned long           active_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
>> +       unsigned long           dirty[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
>>          int                     enabled;
>>
>>          int                     n_events; /* the # of events in the below arrays */
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 1574b70..ef8f6f4 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -384,6 +384,7 @@ DEFINE_STATIC_KEY_FALSE(perf_sched_events);
>>    static DECLARE_DELAYED_WORK(perf_sched_work, perf_sched_delayed);
>>    static DEFINE_MUTEX(perf_sched_mutex);
>>    static atomic_t perf_sched_count;
>> +static atomic_t perf_event_mmap_count;
> 
> A global count is not going to work. I think it needs to be per PMU at
> least. In the case of Arm big.LITTLE, user access is constrained to
> one subset of cores which is 1 PMU instance.
> 

How about this one?

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index c6fedd2..9052578 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1636,6 +1636,8 @@ static void x86_pmu_del(struct perf_event *event, int flags)
  	if (cpuc->txn_flags & PERF_PMU_TXN_ADD)
  		goto do_del;

+	__set_bit(event->hw.idx, cpuc->dirty);
+
  	/*
  	 * Not a TXN, therefore cleanup properly.
  	 */
@@ -2484,12 +2486,43 @@ static int x86_pmu_event_init(struct perf_event *event)
  	return err;
  }

+static void x86_pmu_clear_dirty_counters(void)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	int i;
+
+	 /* Don't need to clear the assigned counter. */
+	for (i = 0; i < cpuc->n_events; i++)
+		__clear_bit(cpuc->assign[i], cpuc->dirty);
+
+	if (bitmap_empty(cpuc->dirty, X86_PMC_IDX_MAX))
+		return;
+
+	for_each_set_bit(i, cpuc->dirty, X86_PMC_IDX_MAX) {
+		/* Metrics and fake events don't have corresponding HW counters. */
+		if (is_metric_idx(i) || (i == INTEL_PMC_IDX_FIXED_VLBR))
+			continue;
+		else if (i >= INTEL_PMC_IDX_FIXED)
+			wrmsrl(MSR_ARCH_PERFMON_FIXED_CTR0 + (i - INTEL_PMC_IDX_FIXED), 0);
+		else
+			wrmsrl(x86_pmu_event_addr(i), 0);
+	}
+
+	bitmap_zero(cpuc->dirty, X86_PMC_IDX_MAX);
+}
+
  static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)
  {
  	if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
  		return;

  	/*
+	 * Enable sched_task() for the RDPMC task.
+	 */
+	if (x86_pmu.sched_task && event->hw.target)
+		atomic_inc(&event->pmu->sched_cb_usages);
+
+	/*
  	 * This function relies on not being called concurrently in two
  	 * tasks in the same mm.  Otherwise one task could observe
  	 * perf_rdpmc_allowed > 1 and return all the way back to
@@ -2507,10 +2540,12 @@ static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)

  static void x86_pmu_event_unmapped(struct perf_event *event, struct mm_struct *mm)
  {
-
  	if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
  		return;

+	if (x86_pmu.sched_task && event->hw.target)
+		atomic_dec(&event->pmu->sched_cb_usages);
+
  	if (atomic_dec_and_test(&mm->context.perf_rdpmc_allowed))
  		on_each_cpu_mask(mm_cpumask(mm), cr4_update_pce, NULL, 1);
  }
@@ -2616,6 +2651,14 @@ static const struct attribute_group *x86_pmu_attr_groups[] = {
  static void x86_pmu_sched_task(struct perf_event_context *ctx, bool sched_in)
  {
  	static_call_cond(x86_pmu_sched_task)(ctx, sched_in);
+
+	/*
+	 * If a new task has the RDPMC enabled, clear the dirty counters
+	 * to prevent the potential leak.
+	 */
+	if (sched_in && ctx && READ_ONCE(x86_pmu.attr_rdpmc) &&
+	    current->mm && atomic_read(&current->mm->context.perf_rdpmc_allowed))
+		x86_pmu_clear_dirty_counters();
  }

  static void x86_pmu_swap_task_ctx(struct perf_event_context *prev,
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 10c8171..55bd891 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -229,6 +229,7 @@ struct cpu_hw_events {
  	 */
  	struct perf_event	*events[X86_PMC_IDX_MAX]; /* in counter order */
  	unsigned long		active_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
+	unsigned long		dirty[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
  	int			enabled;

  	int			n_events; /* the # of events in the below arrays */
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index c8a3388..3a85dbe 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -301,6 +301,9 @@ struct pmu {
  	/* number of address filters this PMU can do */
  	unsigned int			nr_addr_filters;

+	/* Track the per PMU sched_task() callback users */
+	atomic_t			sched_cb_usages;
+
  	/*
  	 * Fully disable/enable this PMU, can be used to protect from the PMI
  	 * as well as for lazy/batch writing of the MSRs.
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 1574b70..8216acc 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3851,7 +3851,7 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx,
  		cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
  	perf_event_sched_in(cpuctx, ctx, task);

-	if (cpuctx->sched_cb_usage && pmu->sched_task)
+	if (pmu->sched_task && (cpuctx->sched_cb_usage || atomic_read(&pmu->sched_cb_usages)))
  		pmu->sched_task(cpuctx->task_ctx, true);

  	perf_pmu_enable(pmu);

Thanks,
Kan


* Re: [PATCH V6] perf: Reset the dirty counter to prevent the leak for an RDPMC task
  2021-05-11 21:42         ` Liang, Kan
@ 2021-05-12  7:35           ` Peter Zijlstra
  2021-05-12 14:09             ` Liang, Kan
  2021-05-12 14:54           ` Rob Herring
  1 sibling, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2021-05-12  7:35 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Rob Herring, Ingo Molnar, linux-kernel, Andi Kleen,
	Arnaldo Carvalho de Melo, Mark Rutland, Andy Lutomirski,
	Stephane Eranian, Namhyung Kim

On Tue, May 11, 2021 at 05:42:54PM -0400, Liang, Kan wrote:
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 1574b70..8216acc 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3851,7 +3851,7 @@ static void perf_event_context_sched_in(struct
> perf_event_context *ctx,
>  		cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
>  	perf_event_sched_in(cpuctx, ctx, task);
> 
> -	if (cpuctx->sched_cb_usage && pmu->sched_task)
> +	if (pmu->sched_task && (cpuctx->sched_cb_usage ||
> atomic_read(&pmu->sched_cb_usages)))
>  		pmu->sched_task(cpuctx->task_ctx, true);

Aside from the obvious whitespace issues; I think this should work.


* Re: [PATCH V6] perf: Reset the dirty counter to prevent the leak for an RDPMC task
  2021-05-12  7:35           ` Peter Zijlstra
@ 2021-05-12 14:09             ` Liang, Kan
  0 siblings, 0 replies; 10+ messages in thread
From: Liang, Kan @ 2021-05-12 14:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Rob Herring, Ingo Molnar, linux-kernel, Andi Kleen,
	Arnaldo Carvalho de Melo, Mark Rutland, Andy Lutomirski,
	Stephane Eranian, Namhyung Kim



On 5/12/2021 3:35 AM, Peter Zijlstra wrote:
> On Tue, May 11, 2021 at 05:42:54PM -0400, Liang, Kan wrote:
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 1574b70..8216acc 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -3851,7 +3851,7 @@ static void perf_event_context_sched_in(struct
>> perf_event_context *ctx,
>>   		cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
>>   	perf_event_sched_in(cpuctx, ctx, task);
>>
>> -	if (cpuctx->sched_cb_usage && pmu->sched_task)
>> +	if (pmu->sched_task && (cpuctx->sched_cb_usage ||
>> atomic_read(&pmu->sched_cb_usages)))
>>   		pmu->sched_task(cpuctx->task_ctx, true);
> 
> Aside from the obvious whitespace issues; I think this should work.
> 

Thanks. The whitespace issue was caused by the copy/paste. I will fix
it in V7.

I did more tests. In some cases, I can still observe the dirty counter
on the first RDPMC read. I think we still have to clear the dirty
counters in x86_pmu_event_mapped() for the first RDPMC read.
I have to disable interrupts to prevent preemption.


  static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)
  {
+	unsigned long flags;
+
  	if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
  		return;

  	/*
+	 * Enable sched_task() for the RDPMC task,
+	 * and clear the existing dirty counters.
+	 */
+	if (x86_pmu.sched_task && event->hw.target) {
+		atomic_inc(&event->pmu->sched_cb_usages);
+		local_irq_save(flags);
+		x86_pmu_clear_dirty_counters();
+		local_irq_restore(flags);
+	}
+
+	/*
  	 * This function relies on not being called concurrently in two
  	 * tasks in the same mm.  Otherwise one task could observe
  	 * perf_rdpmc_allowed > 1 and return all the way back to

Thanks,
Kan


* Re: [PATCH V6] perf: Reset the dirty counter to prevent the leak for an RDPMC task
  2021-05-11 21:42         ` Liang, Kan
  2021-05-12  7:35           ` Peter Zijlstra
@ 2021-05-12 14:54           ` Rob Herring
  2021-05-12 15:36             ` Liang, Kan
  1 sibling, 1 reply; 10+ messages in thread
From: Rob Herring @ 2021-05-12 14:54 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Andi Kleen,
	Arnaldo Carvalho de Melo, Mark Rutland, Andy Lutomirski,
	Stephane Eranian, Namhyung Kim

On Tue, May 11, 2021 at 4:43 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 5/11/2021 4:39 PM, Rob Herring wrote:
> > On Tue, May 11, 2021 at 12:59 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
> >>
> >>
> >>
> >> On 5/10/2021 4:29 PM, Rob Herring wrote:
> >>> On Mon, May 10, 2021 at 2:18 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >>>>
> >>>> On Thu, Apr 22, 2021 at 11:25:52AM -0700, kan.liang@linux.intel.com wrote:
> >>>>
> >>>>> - Add a new method check_leakage() to check and clear dirty counters
> >>>>>     to prevent potential leakage.
> >>>>
> >>>> I really dislike adding spurious callbacks, also because indirect calls
> >>>> are teh suck, but also because it pollutes the interface so.
> >>>>
> >>>> That said, I'm not sure I actually like the below any better :/
> >>>>
> >>
> >> Maybe we can add a atomic variable to track the number of
> >> event_mapped(). Only invoke sched_task() when the number > 0.
> >
> > Except that it is only needed when mapped and user access is allowed/enabled.
> >
> >>
> >> It looks like only X86 implements the event_mapped(). So it should not
> >> impact other ARCHs.
> >
> > Arm will have one if we ever settle on the implementation.
> >
> >>
> >> What do you think?
> >>
> >> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> >> index c6fedd2..ae5b0e7 100644
> >> --- a/arch/x86/events/core.c
> >> +++ b/arch/x86/events/core.c
> >> @@ -1636,6 +1636,8 @@ static void x86_pmu_del(struct perf_event *event,
> >> int flags)
> >>          if (cpuc->txn_flags & PERF_PMU_TXN_ADD)
> >>                  goto do_del;
> >>
> >> +       __set_bit(event->hw.idx, cpuc->dirty);
> >> +
> >>          /*
> >>           * Not a TXN, therefore cleanup properly.
> >>           */
> >> @@ -2484,6 +2486,31 @@ static int x86_pmu_event_init(struct perf_event
> >> *event)
> >>          return err;
> >>    }
> >>
> >> +static void x86_pmu_clear_dirty_counters(void)
> >> +{
> >> +       struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> >> +       int i;
> >> +
> >> +        /* Don't need to clear the assigned counter. */
> >> +       for (i = 0; i < cpuc->n_events; i++)
> >> +               __clear_bit(cpuc->assign[i], cpuc->dirty);
> >> +
> >> +       if (bitmap_empty(cpuc->dirty, X86_PMC_IDX_MAX))
> >> +               return;
> >> +
> >> +       for_each_set_bit(i, cpuc->dirty, X86_PMC_IDX_MAX) {
> >> +               /* Metrics and fake events don't have corresponding HW counters. */
> >> +               if (is_metric_idx(i) || (i == INTEL_PMC_IDX_FIXED_VLBR))
> >> +                       continue;
> >> +               else if (i >= INTEL_PMC_IDX_FIXED)
> >> +                       wrmsrl(MSR_ARCH_PERFMON_FIXED_CTR0 + (i - INTEL_PMC_IDX_FIXED), 0);
> >> +               else
> >> +                       wrmsrl(x86_pmu_event_addr(i), 0);
> >> +       }
> >> +
> >> +       bitmap_zero(cpuc->dirty, X86_PMC_IDX_MAX);
> >> +}
> >> +
> >>    static void x86_pmu_event_mapped(struct perf_event *event, struct
> >> mm_struct *mm)
> >>    {
> >>          if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
> >> @@ -2507,7 +2534,6 @@ static void x86_pmu_event_mapped(struct perf_event
> >> *event, struct mm_struct *mm)
> >>
> >>    static void x86_pmu_event_unmapped(struct perf_event *event, struct
> >> mm_struct *mm)
> >>    {
> >> -
> >>          if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
> >>                  return;
> >>
> >> @@ -2616,6 +2642,14 @@ static const struct attribute_group
> >> *x86_pmu_attr_groups[] = {
> >>    static void x86_pmu_sched_task(struct perf_event_context *ctx, bool
> >> sched_in)
> >>    {
> >>          static_call_cond(x86_pmu_sched_task)(ctx, sched_in);
> >> +
> >> +       /*
> >> +        * If a new task has the RDPMC enabled, clear the dirty counters
> >> +        * to prevent the potential leak.
> >> +        */
> >> +       if (sched_in && ctx && READ_ONCE(x86_pmu.attr_rdpmc) &&
> >> +           current->mm && atomic_read(&current->mm->context.perf_rdpmc_allowed))
> >> +               x86_pmu_clear_dirty_counters();
> >>    }
> >>
> >>    static void x86_pmu_swap_task_ctx(struct perf_event_context *prev,
> >> diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
> >> index 10c8171..55bd891 100644
> >> --- a/arch/x86/events/perf_event.h
> >> +++ b/arch/x86/events/perf_event.h
> >> @@ -229,6 +229,7 @@ struct cpu_hw_events {
> >>           */
> >>          struct perf_event       *events[X86_PMC_IDX_MAX]; /* in counter order */
> >>          unsigned long           active_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
> >> +       unsigned long           dirty[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
> >>          int                     enabled;
> >>
> >>          int                     n_events; /* the # of events in the below arrays */
> >> diff --git a/kernel/events/core.c b/kernel/events/core.c
> >> index 1574b70..ef8f6f4 100644
> >> --- a/kernel/events/core.c
> >> +++ b/kernel/events/core.c
> >> @@ -384,6 +384,7 @@ DEFINE_STATIC_KEY_FALSE(perf_sched_events);
> >>    static DECLARE_DELAYED_WORK(perf_sched_work, perf_sched_delayed);
> >>    static DEFINE_MUTEX(perf_sched_mutex);
> >>    static atomic_t perf_sched_count;
> >> +static atomic_t perf_event_mmap_count;
> >
> > A global count is not going to work. I think it needs to be per PMU at
> > least. In the case of Arm big.LITTLE, user access is constrained to
> > one subset of cores which is 1 PMU instance.
> >
>
> How about this one?

Would you mind splitting this to core and x86 parts.

>
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index c6fedd2..9052578 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -1636,6 +1636,8 @@ static void x86_pmu_del(struct perf_event *event,
> int flags)
>         if (cpuc->txn_flags & PERF_PMU_TXN_ADD)
>                 goto do_del;
>
> +       __set_bit(event->hw.idx, cpuc->dirty);
> +
>         /*
>          * Not a TXN, therefore cleanup properly.
>          */
> @@ -2484,12 +2486,43 @@ static int x86_pmu_event_init(struct perf_event
> *event)
>         return err;
>   }
>
> +static void x86_pmu_clear_dirty_counters(void)
> +{
> +       struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> +       int i;
> +
> +        /* Don't need to clear the assigned counter. */
> +       for (i = 0; i < cpuc->n_events; i++)
> +               __clear_bit(cpuc->assign[i], cpuc->dirty);
> +
> +       if (bitmap_empty(cpuc->dirty, X86_PMC_IDX_MAX))
> +               return;
> +
> +       for_each_set_bit(i, cpuc->dirty, X86_PMC_IDX_MAX) {
> +               /* Metrics and fake events don't have corresponding HW counters. */
> +               if (is_metric_idx(i) || (i == INTEL_PMC_IDX_FIXED_VLBR))
> +                       continue;
> +               else if (i >= INTEL_PMC_IDX_FIXED)
> +                       wrmsrl(MSR_ARCH_PERFMON_FIXED_CTR0 + (i - INTEL_PMC_IDX_FIXED), 0);
> +               else
> +                       wrmsrl(x86_pmu_event_addr(i), 0);
> +       }
> +
> +       bitmap_zero(cpuc->dirty, X86_PMC_IDX_MAX);
> +}
> +
>   static void x86_pmu_event_mapped(struct perf_event *event, struct
> mm_struct *mm)
>   {
>         if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
>                 return;
>
>         /*
> +        * Enable sched_task() for the RDPMC task.
> +        */
> +       if (x86_pmu.sched_task && event->hw.target)
> +               atomic_inc(&event->pmu->sched_cb_usages);
> +
> +       /*
>          * This function relies on not being called concurrently in two
>          * tasks in the same mm.  Otherwise one task could observe
>          * perf_rdpmc_allowed > 1 and return all the way back to
> @@ -2507,10 +2540,12 @@ static void x86_pmu_event_mapped(struct
> perf_event *event, struct mm_struct *mm)
>
>   static void x86_pmu_event_unmapped(struct perf_event *event, struct
> mm_struct *mm)
>   {
> -
>         if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
>                 return;
>
> +       if (x86_pmu.sched_task && event->hw.target)
> +               atomic_dec(&event->pmu->sched_cb_usages);
> +
>         if (atomic_dec_and_test(&mm->context.perf_rdpmc_allowed))
>                 on_each_cpu_mask(mm_cpumask(mm), cr4_update_pce, NULL, 1);
>   }
> @@ -2616,6 +2651,14 @@ static const struct attribute_group
> *x86_pmu_attr_groups[] = {
>   static void x86_pmu_sched_task(struct perf_event_context *ctx, bool
> sched_in)
>   {
>         static_call_cond(x86_pmu_sched_task)(ctx, sched_in);
> +
> +       /*
> +        * If a new task has the RDPMC enabled, clear the dirty counters
> +        * to prevent the potential leak.
> +        */
> +       if (sched_in && ctx && READ_ONCE(x86_pmu.attr_rdpmc) &&
> +           current->mm && atomic_read(&current->mm->context.perf_rdpmc_allowed))
> +               x86_pmu_clear_dirty_counters();
>   }
>
>   static void x86_pmu_swap_task_ctx(struct perf_event_context *prev,
> diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
> index 10c8171..55bd891 100644
> --- a/arch/x86/events/perf_event.h
> +++ b/arch/x86/events/perf_event.h
> @@ -229,6 +229,7 @@ struct cpu_hw_events {
>          */
>         struct perf_event       *events[X86_PMC_IDX_MAX]; /* in counter order */
>         unsigned long           active_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
> +       unsigned long           dirty[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
>         int                     enabled;
>
>         int                     n_events; /* the # of events in the below arrays */
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index c8a3388..3a85dbe 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -301,6 +301,9 @@ struct pmu {
>         /* number of address filters this PMU can do */
>         unsigned int                    nr_addr_filters;
>
> +       /* Track the per PMU sched_task() callback users */
> +       atomic_t                        sched_cb_usages;

To align with the per cpu one: s/usages/usage/

I think we should be able to use refcount_t here instead?

> +
>         /*
>          * Fully disable/enable this PMU, can be used to protect from the PMI
>          * as well as for lazy/batch writing of the MSRs.
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 1574b70..8216acc 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3851,7 +3851,7 @@ static void perf_event_context_sched_in(struct
> perf_event_context *ctx,
>                 cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
>         perf_event_sched_in(cpuctx, ctx, task);
>
> -       if (cpuctx->sched_cb_usage && pmu->sched_task)
> +       if (pmu->sched_task && (cpuctx->sched_cb_usage ||
> atomic_read(&pmu->sched_cb_usages)))

For completeness, shouldn't this condition be added everywhere
->sched_task() can be called, perhaps with the exception of
__perf_pmu_sched_task(), which is only called when the task context
doesn't change?

>                 pmu->sched_task(cpuctx->task_ctx, true);
>
>         perf_pmu_enable(pmu);
>
> Thanks,
> Kan


* Re: [PATCH V6] perf: Reset the dirty counter to prevent the leak for an RDPMC task
  2021-05-12 14:54           ` Rob Herring
@ 2021-05-12 15:36             ` Liang, Kan
  0 siblings, 0 replies; 10+ messages in thread
From: Liang, Kan @ 2021-05-12 15:36 UTC (permalink / raw)
  To: Rob Herring
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Andi Kleen,
	Arnaldo Carvalho de Melo, Mark Rutland, Andy Lutomirski,
	Stephane Eranian, Namhyung Kim



On 5/12/2021 10:54 AM, Rob Herring wrote:
>> How about this one?
> Would you mind splitting this into core and x86 parts?
> 

Sure, I will split the patch.

>> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
>> index c6fedd2..9052578 100644
>> --- a/arch/x86/events/core.c
>> +++ b/arch/x86/events/core.c
>> @@ -1636,6 +1636,8 @@ static void x86_pmu_del(struct perf_event *event, int flags)
>>          if (cpuc->txn_flags & PERF_PMU_TXN_ADD)
>>                  goto do_del;
>>
>> +       __set_bit(event->hw.idx, cpuc->dirty);
>> +
>>          /*
>>           * Not a TXN, therefore cleanup properly.
>>           */
>> @@ -2484,12 +2486,43 @@ static int x86_pmu_event_init(struct perf_event *event)
>>          return err;
>>    }
>>
>> +static void x86_pmu_clear_dirty_counters(void)
>> +{
>> +       struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>> +       int i;
>> +
>> +        /* Don't need to clear the assigned counter. */
>> +       for (i = 0; i < cpuc->n_events; i++)
>> +               __clear_bit(cpuc->assign[i], cpuc->dirty);
>> +
>> +       if (bitmap_empty(cpuc->dirty, X86_PMC_IDX_MAX))
>> +               return;
>> +
>> +       for_each_set_bit(i, cpuc->dirty, X86_PMC_IDX_MAX) {
>> +               /* Metrics and fake events don't have corresponding HW counters. */
>> +               if (is_metric_idx(i) || (i == INTEL_PMC_IDX_FIXED_VLBR))
>> +                       continue;
>> +               else if (i >= INTEL_PMC_IDX_FIXED)
>> +                       wrmsrl(MSR_ARCH_PERFMON_FIXED_CTR0 + (i - INTEL_PMC_IDX_FIXED), 0);
>> +               else
>> +                       wrmsrl(x86_pmu_event_addr(i), 0);
>> +       }
>> +
>> +       bitmap_zero(cpuc->dirty, X86_PMC_IDX_MAX);
>> +}
>> +
>>    static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)
>>    {
>>          if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
>>                  return;
>>
>>          /*
>> +        * Enable sched_task() for the RDPMC task.
>> +        */
>> +       if (x86_pmu.sched_task && event->hw.target)
>> +               atomic_inc(&event->pmu->sched_cb_usages);
>> +
>> +       /*
>>           * This function relies on not being called concurrently in two
>>           * tasks in the same mm.  Otherwise one task could observe
>>           * perf_rdpmc_allowed > 1 and return all the way back to
>> @@ -2507,10 +2540,12 @@ static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)
>>
>>    static void x86_pmu_event_unmapped(struct perf_event *event, struct mm_struct *mm)
>>    {
>> -
>>          if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
>>                  return;
>>
>> +       if (x86_pmu.sched_task && event->hw.target)
>> +               atomic_dec(&event->pmu->sched_cb_usages);
>> +
>>          if (atomic_dec_and_test(&mm->context.perf_rdpmc_allowed))
>>                  on_each_cpu_mask(mm_cpumask(mm), cr4_update_pce, NULL, 1);
>>    }
>> @@ -2616,6 +2651,14 @@ static const struct attribute_group *x86_pmu_attr_groups[] = {
>>    static void x86_pmu_sched_task(struct perf_event_context *ctx, bool sched_in)
>>    {
>>          static_call_cond(x86_pmu_sched_task)(ctx, sched_in);
>> +
>> +       /*
>> +        * If a new task has the RDPMC enabled, clear the dirty counters
>> +        * to prevent the potential leak.
>> +        */
>> +       if (sched_in && ctx && READ_ONCE(x86_pmu.attr_rdpmc) &&
>> +           current->mm && atomic_read(&current->mm->context.perf_rdpmc_allowed))
>> +               x86_pmu_clear_dirty_counters();
>>    }
>>
>>    static void x86_pmu_swap_task_ctx(struct perf_event_context *prev,
>> diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
>> index 10c8171..55bd891 100644
>> --- a/arch/x86/events/perf_event.h
>> +++ b/arch/x86/events/perf_event.h
>> @@ -229,6 +229,7 @@ struct cpu_hw_events {
>>           */
>>          struct perf_event       *events[X86_PMC_IDX_MAX]; /* in counter order */
>>          unsigned long           active_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
>> +       unsigned long           dirty[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
>>          int                     enabled;
>>
>>          int                     n_events; /* the # of events in the below arrays */
>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>> index c8a3388..3a85dbe 100644
>> --- a/include/linux/perf_event.h
>> +++ b/include/linux/perf_event.h
>> @@ -301,6 +301,9 @@ struct pmu {
>>          /* number of address filters this PMU can do */
>>          unsigned int                    nr_addr_filters;
>>
>> +       /* Track the per PMU sched_task() callback users */
>> +       atomic_t                        sched_cb_usages;
> To align with the per cpu one: s/usages/usage/
> 

OK

> I think we should be able to use refcount_t here instead?

I think they are the same for this case. Is there a particular reason 
for the change? Are they different on ARM?
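
For reference, what the V6 diff above does with the atomic_t boils down 
to a plain up/down usage counter (sketch only, keeping the V6 names):

        /* x86_pmu_event_mapped(): an RDPMC task event takes a reference */
        if (x86_pmu.sched_task && event->hw.target)
                atomic_inc(&event->pmu->sched_cb_usages);

        /* x86_pmu_event_unmapped(): drop the reference again */
        if (x86_pmu.sched_task && event->hw.target)
                atomic_dec(&event->pmu->sched_cb_usages);

        /* perf_event_context_sched_in(): run ->sched_task() if any RDPMC user exists */
        if (pmu->sched_task &&
            (cpuctx->sched_cb_usage || atomic_read(&pmu->sched_cb_usages)))
                pmu->sched_task(cpuctx->task_ctx, true);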

> 
>> +
>>          /*
>>           * Fully disable/enable this PMU, can be used to protect from the PMI
>>           * as well as for lazy/batch writing of the MSRs.
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 1574b70..8216acc 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -3851,7 +3851,7 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx,
>>                  cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
>>          perf_event_sched_in(cpuctx, ctx, task);
>>
>> -       if (cpuctx->sched_cb_usage && pmu->sched_task)
>> +       if (pmu->sched_task && (cpuctx->sched_cb_usage || atomic_read(&pmu->sched_cb_usages)))
> For completeness, shouldn't this condition be added everywhere
> ->sched_task() can be called, perhaps with the exception of
> __perf_pmu_sched_task(), which is only called when the task context
> doesn't change?

In theory, it's harmless to add it in the other places, because we also 
check it in the X86-specific code. But the extra checks would add some 
overhead, and I'd like to avoid that overhead in the context-switch path.

Since X86 is the only user of sched_task() for now, I prefer to add the 
check only here. I will add a comment to explain the reason.

If ARM needs it at the other call sites later, please feel free to add it.
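
To make the intent concrete, the core part of the split would keep the 
check only in perf_event_context_sched_in(), roughly like below. This is 
an illustrative sketch against the V6 diff, not the final patch:

        perf_event_sched_in(cpuctx, ctx, task);

        /*
         * The per-PMU sched_cb_usages counter is only checked here, not
         * at the other ->sched_task() call sites, to avoid adding
         * overhead to every context switch. The x86 ->sched_task()
         * implementation re-checks the RDPMC conditions itself before
         * clearing the dirty counters.
         */
        if (pmu->sched_task &&
            (cpuctx->sched_cb_usage || atomic_read(&pmu->sched_cb_usages)))
                pmu->sched_task(cpuctx->task_ctx, true);

        perf_pmu_enable(pmu);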

Thanks,
Kan

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-05-12 16:27 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-22 18:25 [PATCH V6] perf: Reset the dirty counter to prevent the leak for an RDPMC task kan.liang
2021-05-10 19:18 ` Peter Zijlstra
2021-05-10 20:29   ` Rob Herring
2021-05-11 17:59     ` Liang, Kan
2021-05-11 20:39       ` Rob Herring
2021-05-11 21:42         ` Liang, Kan
2021-05-12  7:35           ` Peter Zijlstra
2021-05-12 14:09             ` Liang, Kan
2021-05-12 14:54           ` Rob Herring
2021-05-12 15:36             ` Liang, Kan
