From: Rob Herring <robh@kernel.org>
To: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Andi Kleen <ak@linux.intel.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Andy Lutomirski <luto@amacapital.net>,
	Stephane Eranian <eranian@google.com>,
	Namhyung Kim <namhyung@kernel.org>
Subject: Re: [PATCH V6] perf: Reset the dirty counter to prevent the leak for an RDPMC task
Date: Wed, 12 May 2021 09:54:22 -0500	[thread overview]
Message-ID: <CAL_JsqKN4YcCpL9uiOqea4CcqRWgU7Af=V8JNMjypivaVHq4sQ@mail.gmail.com> (raw)
In-Reply-To: <f390aa11-e475-9d9d-9384-959a7ed32fd6@linux.intel.com>

On Tue, May 11, 2021 at 4:43 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 5/11/2021 4:39 PM, Rob Herring wrote:
> > On Tue, May 11, 2021 at 12:59 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
> >>
> >>
> >>
> >> On 5/10/2021 4:29 PM, Rob Herring wrote:
> >>> On Mon, May 10, 2021 at 2:18 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >>>>
> >>>> On Thu, Apr 22, 2021 at 11:25:52AM -0700, kan.liang@linux.intel.com wrote:
> >>>>
> >>>>> - Add a new method check_leakage() to check and clear dirty counters
> >>>>>     to prevent potential leakage.
> >>>>
> >>>> I really dislike adding spurious callbacks, also because indirect calls
> >>>> are the suck, but also because it pollutes the interface so.
> >>>>
> >>>> That said, I'm not sure I actually like the below any better :/
> >>>>
> >>
> >> Maybe we can add an atomic variable to track the number of
> >> event_mapped() calls, and only invoke sched_task() when the number is > 0.
> >
> > Except that it is only needed when mapped and user access is allowed/enabled.
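
For context, "user access" here means the task can execute RDPMC itself,
with no kernel entry on the read path. A minimal, hypothetical sketch of
such a read (illustration only, not from the patch):

    #include <stdint.h>

    /* ECX selects the counter; the value comes back in EDX:EAX. */
    static inline uint64_t read_pmc(uint32_t idx)
    {
            uint32_t lo, hi;

            asm volatile("rdpmc" : "=a" (lo), "=d" (hi) : "c" (idx));
            return ((uint64_t)hi << 32) | lo;
    }

    /* Nothing stops read_pmc() on a counter index this task was never
     * assigned, so a value left behind by a previous task's event stays
     * visible until the kernel clears the dirty counter. */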
> >
> >>
> >> It looks like only x86 implements event_mapped(), so it should not
> >> impact other architectures.
> >
> > Arm will have one if we ever settle on the implementation.
> >
> >>
> >> What do you think?
> >>
> >> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> >> index c6fedd2..ae5b0e7 100644
> >> --- a/arch/x86/events/core.c
> >> +++ b/arch/x86/events/core.c
> >> @@ -1636,6 +1636,8 @@ static void x86_pmu_del(struct perf_event *event, int flags)
> >>          if (cpuc->txn_flags & PERF_PMU_TXN_ADD)
> >>                  goto do_del;
> >>
> >> +       __set_bit(event->hw.idx, cpuc->dirty);
> >> +
> >>          /*
> >>           * Not a TXN, therefore cleanup properly.
> >>           */
> >> @@ -2484,6 +2486,31 @@ static int x86_pmu_event_init(struct perf_event *event)
> >>          return err;
> >>    }
> >>
> >> +static void x86_pmu_clear_dirty_counters(void)
> >> +{
> >> +       struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> >> +       int i;
> >> +
> >> +        /* Don't need to clear the assigned counter. */
> >> +       for (i = 0; i < cpuc->n_events; i++)
> >> +               __clear_bit(cpuc->assign[i], cpuc->dirty);
> >> +
> >> +       if (bitmap_empty(cpuc->dirty, X86_PMC_IDX_MAX))
> >> +               return;
> >> +
> >> +       for_each_set_bit(i, cpuc->dirty, X86_PMC_IDX_MAX) {
> >> +               /* Metrics and fake events don't have corresponding HW counters. */
> >> +               if (is_metric_idx(i) || (i == INTEL_PMC_IDX_FIXED_VLBR))
> >> +                       continue;
> >> +               else if (i >= INTEL_PMC_IDX_FIXED)
> >> +                       wrmsrl(MSR_ARCH_PERFMON_FIXED_CTR0 + (i - INTEL_PMC_IDX_FIXED), 0);
> >> +               else
> >> +                       wrmsrl(x86_pmu_event_addr(i), 0);
> >> +       }
> >> +
> >> +       bitmap_zero(cpuc->dirty, X86_PMC_IDX_MAX);
> >> +}
> >> +
> >>    static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)
> >>    {
> >>          if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
> >> @@ -2507,7 +2534,6 @@ static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)
> >>
> >>    static void x86_pmu_event_unmapped(struct perf_event *event, struct mm_struct *mm)
> >>    {
> >> -
> >>          if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
> >>                  return;
> >>
> >> @@ -2616,6 +2642,14 @@ static const struct attribute_group *x86_pmu_attr_groups[] = {
> >>    static void x86_pmu_sched_task(struct perf_event_context *ctx, bool sched_in)
> >>    {
> >>          static_call_cond(x86_pmu_sched_task)(ctx, sched_in);
> >> +
> >> +       /*
> >> +        * If a new task has the RDPMC enabled, clear the dirty counters
> >> +        * to prevent the potential leak.
> >> +        */
> >> +       if (sched_in && ctx && READ_ONCE(x86_pmu.attr_rdpmc) &&
> >> +           current->mm && atomic_read(&current->mm->context.perf_rdpmc_allowed))
> >> +               x86_pmu_clear_dirty_counters();
> >>    }
> >>
> >>    static void x86_pmu_swap_task_ctx(struct perf_event_context *prev,
> >> diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
> >> index 10c8171..55bd891 100644
> >> --- a/arch/x86/events/perf_event.h
> >> +++ b/arch/x86/events/perf_event.h
> >> @@ -229,6 +229,7 @@ struct cpu_hw_events {
> >>           */
> >>          struct perf_event       *events[X86_PMC_IDX_MAX]; /* in counter order */
> >>          unsigned long           active_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
> >> +       unsigned long           dirty[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
> >>          int                     enabled;
> >>
> >>          int                     n_events; /* the # of events in the below arrays */
> >> diff --git a/kernel/events/core.c b/kernel/events/core.c
> >> index 1574b70..ef8f6f4 100644
> >> --- a/kernel/events/core.c
> >> +++ b/kernel/events/core.c
> >> @@ -384,6 +384,7 @@ DEFINE_STATIC_KEY_FALSE(perf_sched_events);
> >>    static DECLARE_DELAYED_WORK(perf_sched_work, perf_sched_delayed);
> >>    static DEFINE_MUTEX(perf_sched_mutex);
> >>    static atomic_t perf_sched_count;
> >> +static atomic_t perf_event_mmap_count;
> >
> > A global count is not going to work. I think it needs to be per PMU at
> > least. In the case of Arm big.LITTLE, user access is constrained to
> > one subset of cores which is 1 PMU instance.
> >
>
> How about this one?

Would you mind splitting this into core and x86 parts?

>
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index c6fedd2..9052578 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -1636,6 +1636,8 @@ static void x86_pmu_del(struct perf_event *event, int flags)
>         if (cpuc->txn_flags & PERF_PMU_TXN_ADD)
>                 goto do_del;
>
> +       __set_bit(event->hw.idx, cpuc->dirty);
> +
>         /*
>          * Not a TXN, therefore cleanup properly.
>          */
> @@ -2484,12 +2486,43 @@ static int x86_pmu_event_init(struct perf_event *event)
>         return err;
>   }
>
> +static void x86_pmu_clear_dirty_counters(void)
> +{
> +       struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> +       int i;
> +
> +        /* Don't need to clear the assigned counter. */
> +       for (i = 0; i < cpuc->n_events; i++)
> +               __clear_bit(cpuc->assign[i], cpuc->dirty);
> +
> +       if (bitmap_empty(cpuc->dirty, X86_PMC_IDX_MAX))
> +               return;
> +
> +       for_each_set_bit(i, cpuc->dirty, X86_PMC_IDX_MAX) {
> +               /* Metrics and fake events don't have corresponding HW counters. */
> +               if (is_metric_idx(i) || (i == INTEL_PMC_IDX_FIXED_VLBR))
> +                       continue;
> +               else if (i >= INTEL_PMC_IDX_FIXED)
> +                       wrmsrl(MSR_ARCH_PERFMON_FIXED_CTR0 + (i - INTEL_PMC_IDX_FIXED), 0);
> +               else
> +                       wrmsrl(x86_pmu_event_addr(i), 0);
> +       }
> +
> +       bitmap_zero(cpuc->dirty, X86_PMC_IDX_MAX);
> +}
> +
>   static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)
>   {
>         if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
>                 return;
>
>         /*
> +        * Enable sched_task() for the RDPMC task.
> +        */
> +       if (x86_pmu.sched_task && event->hw.target)
> +               atomic_inc(&event->pmu->sched_cb_usages);
> +
> +       /*
>          * This function relies on not being called concurrently in two
>          * tasks in the same mm.  Otherwise one task could observe
>          * perf_rdpmc_allowed > 1 and return all the way back to
> @@ -2507,10 +2540,12 @@ static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)
>
>   static void x86_pmu_event_unmapped(struct perf_event *event, struct mm_struct *mm)
>   {
> -
>         if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
>                 return;
>
> +       if (x86_pmu.sched_task && event->hw.target)
> +               atomic_dec(&event->pmu->sched_cb_usages);
> +
>         if (atomic_dec_and_test(&mm->context.perf_rdpmc_allowed))
>                 on_each_cpu_mask(mm_cpumask(mm), cr4_update_pce, NULL, 1);
>   }
> @@ -2616,6 +2651,14 @@ static const struct attribute_group *x86_pmu_attr_groups[] = {
>   static void x86_pmu_sched_task(struct perf_event_context *ctx, bool sched_in)
>   {
>         static_call_cond(x86_pmu_sched_task)(ctx, sched_in);
> +
> +       /*
> +        * If a new task has the RDPMC enabled, clear the dirty counters
> +        * to prevent the potential leak.
> +        */
> +       if (sched_in && ctx && READ_ONCE(x86_pmu.attr_rdpmc) &&
> +           current->mm && atomic_read(&current->mm->context.perf_rdpmc_allowed))
> +               x86_pmu_clear_dirty_counters();
>   }
>
>   static void x86_pmu_swap_task_ctx(struct perf_event_context *prev,
> diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
> index 10c8171..55bd891 100644
> --- a/arch/x86/events/perf_event.h
> +++ b/arch/x86/events/perf_event.h
> @@ -229,6 +229,7 @@ struct cpu_hw_events {
>          */
>         struct perf_event       *events[X86_PMC_IDX_MAX]; /* in counter order */
>         unsigned long           active_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
> +       unsigned long           dirty[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
>         int                     enabled;
>
>         int                     n_events; /* the # of events in the below arrays */
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index c8a3388..3a85dbe 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -301,6 +301,9 @@ struct pmu {
>         /* number of address filters this PMU can do */
>         unsigned int                    nr_addr_filters;
>
> +       /* Track the per PMU sched_task() callback users */
> +       atomic_t                        sched_cb_usages;

To align with the per-cpu one: s/usages/usage/

I think we should be able to use refcount_t here instead? (A rough
sketch of that variant follows below the quoted hunk.)

> +
>         /*
>          * Fully disable/enable this PMU, can be used to protect from the PMI
>          * as well as for lazy/batch writing of the MSRs.
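
A rough, hypothetical sketch of the refcount_t variant (with the
s/usages/usage/ rename). One caveat, mine rather than from the thread:
refcount_t models object lifetimes that start at 1, and refcount_inc()
WARNs on a 0->1 transition, while this count legitimately idles at 0
between mmaps -- which may end up being an argument for plain atomic_t:

    #include <linux/refcount.h>

    /* in struct pmu: */
            refcount_t              sched_cb_usage;

    /* map side (x86_pmu_event_mapped) would need something like: */
            if (x86_pmu.sched_task && event->hw.target) {
                    /* The 0 -> 1 step must avoid refcount_inc(); this
                     * is racy against a concurrent first mapper
                     * without extra serialization. */
                    if (!refcount_inc_not_zero(&event->pmu->sched_cb_usage))
                            refcount_set(&event->pmu->sched_cb_usage, 1);
            }

    /* unmap side (x86_pmu_event_unmapped): */
            if (x86_pmu.sched_task && event->hw.target)
                    if (refcount_dec_and_test(&event->pmu->sched_cb_usage))
                            ;       /* last RDPMC mapping for this PMU gone */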
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 1574b70..8216acc 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3851,7 +3851,7 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx,
>                 cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
>         perf_event_sched_in(cpuctx, ctx, task);
>
> -       if (cpuctx->sched_cb_usage && pmu->sched_task)
> +       if (pmu->sched_task && (cpuctx->sched_cb_usage || atomic_read(&pmu->sched_cb_usages)))

For completeness, shouldn't this condition be added everywhere
->sched_task() can be called, perhaps with the exception of
__perf_pmu_sched_task(), which is only called when the task context
doesn't change? (A hypothetical sketch of the sched-out site is
appended at the end of this message.)

>                 pmu->sched_task(cpuctx->task_ctx, true);
>
>         perf_pmu_enable(pmu);
>
> Thanks,
> Kan
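
On the sched-out side, a hypothetical sketch (not from the posted
patch) of the analogous guard in perf_event_context_sched_out(),
assuming its context-equivalence path carries the same
cpuctx->sched_cb_usage check as the sched-in hunk quoted above, and
assuming the s/usages/usage/ rename:

    -	if (cpuctx->sched_cb_usage && pmu->sched_task)
    +	if (pmu->sched_task && (cpuctx->sched_cb_usage ||
    +				atomic_read(&pmu->sched_cb_usage)))
     		pmu->sched_task(ctx, false);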
