From: David Carrillo-Cisneros <davidcc@google.com>
To: Mark Rutland <mark.rutland@arm.com>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
	"x86@kernel.org" <x86@kernel.org>, Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Andi Kleen <ak@linux.intel.com>, Kan Liang <kan.liang@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Borislav Petkov <bp@suse.de>,
	Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Vikas Shivappa <vikas.shivappa@linux.intel.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Vince Weaver <vince@deater.net>, Paul Turner <pjt@google.com>,
	Stephane Eranian <eranian@google.com>
Subject: Re: [RFC 1/6] perf/core: create active and inactive event groups
Date: Tue, 10 Jan 2017 12:45:31 -0800	[thread overview]
Message-ID: <CALcN6midaLYDbPBGE6TSz04VGkzumAueVzYPtU4Cg6=Gxkufbg@mail.gmail.com> (raw)
In-Reply-To: <20170110134925.GB19704@leverpostej>

>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>> index 4741ecdb9817..3fa18f05c9b0 100644
>> --- a/include/linux/perf_event.h
>> +++ b/include/linux/perf_event.h
>> @@ -573,6 +573,7 @@ struct perf_event {
>>
>>       struct hlist_node               hlist_entry;
>>       struct list_head                active_entry;
>> +     struct list_head                ctx_active_entry;
>
> I think we should be able to kill off active_entry as part of this
> series; it's there to do the same thing (optimize iteration over active
> events).
>
> If we expose a for_each_ctx_active_event() helper which iterates of the
> pinned and flexible lists, I think we may be able to migrate existing
> users over and kill off perf_event::active_entry, and the redundant list
> manipulation in drivers.

The problem with that would be iterating over all CPU contexts, when most
users of active_entry only install events on one CPU per package/socket.

Maybe we can create yet another list of CPU contexts, holding only those
contexts that have at least one active event.
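
For concreteness, here is a rough sketch of the helper Mark describes
(hypothetical name and shape, not part of this series), walking the two
active lists this patch adds:

static void for_each_ctx_active_event(struct perf_event_context *ctx,
				      void (*fn)(struct perf_event *event))
{
	struct perf_event *event;

	/* Visit active pinned groups first, then active flexible groups. */
	list_for_each_entry(event, &ctx->active_pinned_groups,
			    ctx_active_entry)
		fn(event);
	list_for_each_entry(event, &ctx->active_flexible_groups,
			    ctx_active_entry)
		fn(event);
}

An uncore-style driver with one event per package would still have to run
this against every CPU's context just to reach the handful of events it
actually owns.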

>
> ... there might be some fun and games ordering manipulation against PMI
> handlers, though, so it may turn out that we need both.
>
>>       int                             nr_siblings;
>>
>>       /* Not serialized. Only written during event initialization. */
>> @@ -734,6 +735,11 @@ struct perf_event_context {
>>       struct list_head                active_ctx_list;
>>       struct list_head                pinned_groups;
>>       struct list_head                flexible_groups;
>> +
>> +     struct list_head                active_pinned_groups;
>> +     struct list_head                active_flexible_groups;
>> +     struct list_head                inactive_groups;
>> +
>>       struct list_head                event_list;
>>       int                             nr_events;
>>       int                             nr_active;
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index faf073d0287f..b744b5a8dbd0 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -1462,6 +1462,21 @@ ctx_group_list(struct perf_event *event, struct perf_event_context *ctx)
>>               return &ctx->flexible_groups;
>>  }
>>
>> +static void
>> +ctx_sched_groups_to_inactive(struct perf_event *event,
>> +                          struct perf_event_context *ctx)
>> +{
>> +     WARN_ON(event->state != PERF_EVENT_STATE_INACTIVE);
>> +     list_move_tail(&event->ctx_active_entry, &ctx->inactive_groups);
>> +};
>
>> @@ -1851,6 +1877,11 @@ group_sched_out(struct perf_event *group_event,
>>
>>       if (state == PERF_EVENT_STATE_ACTIVE && group_event->attr.exclusive)
>>               cpuctx->exclusive = 0;
>> +
>> +     if (group_event->state <= PERF_EVENT_STATE_INACTIVE)
>> +             ctx_sched_groups_to_inactive(group_event, ctx);
>
> Was this intended to be '==' ?

It's '<=' so that the event is removed from the rb-tree even if it went to
ERROR state between the last group_sched_in and this group_sched_out.
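
For reference, the state values (enum perf_event_active_state in
include/linux/perf_event.h as of this series) are ordered so that '<='
INACTIVE also covers the failure states:

enum perf_event_active_state {
	PERF_EVENT_STATE_DEAD		= -4,
	PERF_EVENT_STATE_EXIT		= -3,
	PERF_EVENT_STATE_ERROR		= -2,
	PERF_EVENT_STATE_OFF		= -1,
	PERF_EVENT_STATE_INACTIVE	=  0,
	PERF_EVENT_STATE_ACTIVE		=  1,
};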

>
> As-is, this looks inconsistent with the WARN_ON() in
> ctx_sched_groups_to_inactive() ...

Yes, that WARN_ON is likely wrong ...

>
>> +     if (group_event->state < PERF_EVENT_STATE_INACTIVE)
>> +             ctx_sched_groups_del(group_event, ctx);
>
> ... and here we'll subsequently delete most events from the inactive
> list, rather than never adding them to the inactive list in the first
> place.

Yeah, that's not right. I'll review this.
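
One possible shape of the fix (an untested sketch, not necessarily what
the next version will do) is to make the two branches exclusive, so a
group that dropped to ERROR or below is taken off the lists instead of
first being moved to the inactive list:

	/* Hypothetical rework of the two hunks quoted above. */
	if (group_event->state == PERF_EVENT_STATE_INACTIVE)
		ctx_sched_groups_to_inactive(group_event, ctx);
	else
		ctx_sched_groups_del(group_event, ctx);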

>
>>  }
>>
>>  #define DETACH_GROUP 0x01UL
>> @@ -1918,6 +1949,8 @@ static void __perf_event_disable(struct perf_event *event,
>>               group_sched_out(event, cpuctx, ctx);
>>       else
>>               event_sched_out(event, cpuctx, ctx);
>> +     if (event->state == PERF_EVENT_STATE_INACTIVE)
>> +             ctx_sched_groups_del(event, ctx);
>>       event->state = PERF_EVENT_STATE_OFF;
>>  }
>>
>> @@ -2014,6 +2047,17 @@ static void perf_set_shadow_time(struct perf_event *event,
>>  static void perf_log_throttle(struct perf_event *event, int enable);
>>  static void perf_log_itrace_start(struct perf_event *event);
>>
>> +static void
>> +ctx_sched_groups_to_active(struct perf_event *event, struct perf_event_context *ctx)
>> +{
>> +     struct list_head *h = event->attr.pinned ? &ctx->active_pinned_groups :
>> +                                                &ctx->active_flexible_groups;
>
> It would be nicer to split the definition from the initialisation. That way
> the lines can be shorter and more legible, and we can s/h/head/ ...
>

Will do.
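
Roughly (a sketch of the split, with the debug WARN_ONs dropped as
discussed below):

static void
ctx_sched_groups_to_active(struct perf_event *event,
			   struct perf_event_context *ctx)
{
	struct list_head *head;

	/* Pinned and flexible groups live on separate active lists. */
	if (event->attr.pinned)
		head = &ctx->active_pinned_groups;
	else
		head = &ctx->active_flexible_groups;

	list_move_tail(&event->ctx_active_entry, head);
}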

>> +     WARN_ON(!event);
>
> ... and we can move the dereference of event after the check here.
>
> That said, is there ever a risk of this being NULL? Won't the event have
> to be the container of a list element we walked? Or is there a path
> where that is not the case?

No, it should never be NULL. I should've removed all this debug code
that only introduces noise. Will do next time.

>
> We didn't add a similar check to ctx_sched_groups_to_inactive(), so if
> nothing else it seems inconsistent.
>
>> +     WARN_ON(list_empty(&event->ctx_active_entry));
>
> I take it this is because we always expect the event to be in the
> inactive list first?
>
>> +     WARN_ON(event->state != PERF_EVENT_STATE_ACTIVE);
>> +     list_move_tail(&event->ctx_active_entry, h);
>> +}
>> +
>>  static int
>>  event_sched_in(struct perf_event *event,
>>                struct perf_cpu_context *cpuctx,
>> @@ -2091,9 +2135,7 @@ group_sched_in(struct perf_event *group_event,
>>       u64 now = ctx->time;
>>       bool simulate = false;
>>
>> -     if (group_event->state == PERF_EVENT_STATE_OFF)
>> -             return 0;
>> -
>> +     WARN_ON(group_event->state != PERF_EVENT_STATE_INACTIVE);
>>       pmu->start_txn(pmu, PERF_PMU_TXN_ADD);
>>
>>       if (event_sched_in(group_event, cpuctx, ctx)) {
>> @@ -2112,9 +2154,10 @@ group_sched_in(struct perf_event *group_event,
>>               }
>>       }
>>
>> -     if (!pmu->commit_txn(pmu))
>> +     if (!pmu->commit_txn(pmu)) {
>> +             ctx_sched_groups_to_active(group_event, ctx);
>>               return 0;
>
> I think IRQs are disabled in this path (though I'll need to
> double-check), but I don't think the PMU is disabled, so I believe a PMI
> can come in between the commit_txn() and the addition of events to their
> active list.
>
> I'm not immediately sure if that matters -- we'll need to consider what
> list manipulation might happen in a PMI handler.
>
> If it does matter, we could always add the events to an active list
> first, then try the commit, then remove them if the commit failed. It
> means we might see some not-actually-active events in the active lists
> occasionally, but the lists would still be shorter than the full event
> list.

Just checked: the PMU is disabled in perf_event_context_sched_in(), so
it should be fine.
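
Abridged from kernel/events/core.c (this is the path that reaches
group_sched_in(), entirely inside the perf_pmu_disable() section, so no
PMI from this PMU can land between commit_txn() and the list move):

static void perf_event_context_sched_in(struct perf_event_context *ctx,
					struct task_struct *task)
{
	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);

	if (cpuctx->task_ctx == ctx)
		return;

	perf_ctx_lock(cpuctx, ctx);
	perf_pmu_disable(ctx->pmu);
	/*
	 * ... cpu_ctx_sched_out()/perf_event_sched_in() happen here,
	 * eventually reaching group_sched_in() and commit_txn() ...
	 */
	perf_pmu_enable(ctx->pmu);
	perf_ctx_unlock(cpuctx, ctx);
}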

>
>> -
>> +     }
>>  group_error:
>>       /*
>>        * Groups can be scheduled in as one unit only, so undo any
>> @@ -2396,6 +2439,7 @@ static void __perf_event_enable(struct perf_event *event,
>>               ctx_sched_out(ctx, cpuctx, EVENT_TIME);
>>
>>       __perf_event_mark_enabled(event);
>> +     ctx_sched_groups_add(event, ctx);
>>
>>       if (!ctx->is_active)
>>               return;
>> @@ -2611,7 +2655,7 @@ static void ctx_sched_out(struct perf_event_context *ctx,
>>                         enum event_type_t event_type)
>>  {
>>       int is_active = ctx->is_active;
>> -     struct perf_event *event;
>> +     struct perf_event *event, *tmp;
>>
>>       lockdep_assert_held(&ctx->lock);
>>
>> @@ -2658,13 +2702,17 @@ static void ctx_sched_out(struct perf_event_context *ctx,
>>
>>       perf_pmu_disable(ctx->pmu);
>>       if (is_active & EVENT_PINNED) {
>> -             list_for_each_entry(event, &ctx->pinned_groups, group_entry)
>> +             list_for_each_entry_safe(event, tmp, &ctx->active_pinned_groups, ctx_active_entry) {
>> +                     WARN_ON(event->state != PERF_EVENT_STATE_ACTIVE);
>>                       group_sched_out(event, cpuctx, ctx);
>> +             }
>>       }
>>
>>       if (is_active & EVENT_FLEXIBLE) {
>> -             list_for_each_entry(event, &ctx->flexible_groups, group_entry)
>> +             list_for_each_entry_safe(event, tmp, &ctx->active_flexible_groups, ctx_active_entry) {
>> +                     WARN_ON(event->state != PERF_EVENT_STATE_ACTIVE);
>>                       group_sched_out(event, cpuctx, ctx);
>> +             }
>>       }
>>       perf_pmu_enable(ctx->pmu);
>>  }
>> @@ -2962,10 +3010,11 @@ static void
>>  ctx_pinned_sched_in(struct perf_event_context *ctx,
>>                   struct perf_cpu_context *cpuctx)
>>  {
>> -     struct perf_event *event;
>> +     struct perf_event *event = NULL, *tmp;
>
> I don't believe we need to initialise event here;

Oops, it leaked in from the next patch, where we actually need to initialize event.
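
(For reference, list_for_each_entry_safe() initialises the cursor itself;
from include/linux/list.h:

#define list_for_each_entry_safe(pos, n, head, member)			\
	for (pos = list_first_entry(head, typeof(*pos), member),	\
		n = list_next_entry(pos, member);			\
	     &pos->member != (head);					\
	     pos = n, n = list_next_entry(n, member))

so the '= NULL' is indeed redundant in this patch.)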

> list_for_each_entry_safe() should do that as required.
>
>>
>> -     list_for_each_entry(event, &ctx->pinned_groups, group_entry) {
>> -             if (event->state <= PERF_EVENT_STATE_OFF)
>> +     list_for_each_entry_safe(
>> +                     event, tmp, &ctx->inactive_groups, ctx_active_entry) {
>> +             if (WARN_ON(event->state != PERF_EVENT_STATE_INACTIVE)) /* debug only */
>>                       continue;
>
> Given the comment, is this still needed?

No, will remove.
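
(With only INACTIVE events ever on ctx->inactive_groups, the loop head
would presumably reduce to just the filter check, something like:

	list_for_each_entry_safe(event, tmp, &ctx->inactive_groups,
				 ctx_active_entry) {
		if (!event_filter_match(event))
			continue;
		/* ... */

with no state test at the top.)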
>
>>               if (!event_filter_match(event))
>>                       continue;
>> @@ -2983,6 +3032,7 @@ ctx_pinned_sched_in(struct perf_event_context *ctx,
>>                */
>>               if (event->state == PERF_EVENT_STATE_INACTIVE) {
>>                       update_group_times(event);
>> +                     ctx_sched_groups_del(event, ctx);
>>                       event->state = PERF_EVENT_STATE_ERROR;
>>               }
>>       }
>> @@ -2992,12 +3042,12 @@ static void
>>  ctx_flexible_sched_in(struct perf_event_context *ctx,
>>                     struct perf_cpu_context *cpuctx)
>>  {
>> -     struct perf_event *event;
>> +     struct perf_event *event = NULL, *tmp;
>>       int can_add_hw = 1;
>>
>> -     list_for_each_entry(event, &ctx->flexible_groups, group_entry) {
>> -             /* Ignore events in OFF or ERROR state */
>> -             if (event->state <= PERF_EVENT_STATE_OFF)
>> +     list_for_each_entry_safe(
>> +                     event, tmp, &ctx->inactive_groups, ctx_active_entry) {
>> +             if (WARN_ON(event->state != PERF_EVENT_STATE_INACTIVE)) /* debug only */
>>                       continue;
>
> Likewise, is this still needed?

No, will remove.

>
> Thanks,
> Mark.
