From: Mark Rutland <mark.rutland@arm.com>
To: David Carrillo-Cisneros <davidcc@google.com>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
	"x86@kernel.org" <x86@kernel.org>, Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Andi Kleen <ak@linux.intel.com>, Kan Liang <kan.liang@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Borislav Petkov <bp@suse.de>,
	Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Vikas Shivappa <vikas.shivappa@linux.intel.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Vince Weaver <vince@deater.net>, Paul Turner <pjt@google.com>,
	Stephane Eranian <eranian@google.com>
Subject: Re: [RFC 1/6] perf/core: create active and inactive event groups
Date: Fri, 13 Jan 2017 11:01:19 +0000	[thread overview]
Message-ID: <20170113110118.GB26120@leverpostej> (raw)
In-Reply-To: <CALcN6mhPmpSqKhE3Ua+j-xROLzeAyrgdCk4AGGtfF9kExXRTJg@mail.gmail.com>

On Fri, Jan 13, 2017 at 07:20:48AM +0000, David Carrillo-Cisneros wrote:
> On Thu, Jan 12, 2017 at 3:05 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> > On Tue, Jan 10, 2017 at 12:45:31PM -0800, David Carrillo-Cisneros wrote:
> >> >> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> >> >> index 4741ecdb9817..3fa18f05c9b0 100644
> >> >> --- a/include/linux/perf_event.h
> >> >> +++ b/include/linux/perf_event.h
> >> >> @@ -573,6 +573,7 @@ struct perf_event {
> >> >>
> >> >>       struct hlist_node               hlist_entry;
> >> >>       struct list_head                active_entry;
> >> >> +     struct list_head                ctx_active_entry;
> >> >
> >> > I think we should be able to kill off active_entry as part of this
> >> > series; it's there to do the same thing (optimize iteration over active
> >> > events).
> >> >
> >> > If we expose a for_each_ctx_active_event() helper which iterates over
> >> > the pinned and flexible lists, I think we may be able to migrate
> >> > existing users over and kill off perf_event::active_entry, and the
> >> > redundant list manipulation in drivers.
> >>
> >> The problem with that would be iterating over all CPU contexts, when
> >> most users of active_entry only install events on one CPU per
> >> package/socket.
> >>
> >> Maybe we can create yet another list of cpu contexts to track the
> >> contexts with at least one active event.
> >
> > Ah. I thought these were already per-context rather than per-{pmu,box}.
> > My bad. I guess having both isn't really a problem.
> >
> > In all cases they're used by a hrtimer that iterates over events to
> > update them (and they're only used by uncore PMUs).
> >
> > We could instead register a hrtimer affine to a cpu when the first event
> > was created, which would iterate over that CPU's events alone. When you
> > only have events on one CPU, that behaves the same. When you have events
> > on multiple CPUs, you avoid repeated cross-calls to update the events.
> 
> That sounds good, although I wonder if the overhead is currently a problem.

Sure, that's not clear to me either.

> AFAIK, only system-wide uncore events use this. Since they are
> system-wide, there is not really a reason to have more than one per
> socket/package per event type.

Even with that, there's the risk that one CPU is blocked for a period
waiting for those few others to respond to IPIs. But as you say, it's
not obviously a problem in practice today.
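
For reference, the sort of thing I had in mind is roughly the below
(untested sketch; the 'box' structure and field names are made up). The
handler walks only the local CPU's active events, so no cross-calls are
needed; the list walked could equally be the per-context active list
this series proposes:

/* Hypothetical per-box state; names invented for illustration. */
struct my_box {
	struct hrtimer		hrtimer;
	ktime_t			hrtimer_duration;
	struct list_head	active_list;	/* this CPU's active events */
};

static enum hrtimer_restart my_box_hrtimer_handle(struct hrtimer *hrtimer)
{
	struct my_box *box = container_of(hrtimer, struct my_box, hrtimer);
	struct perf_event *event;

	/* Walk only this CPU's active events: no IPIs to other CPUs. */
	list_for_each_entry(event, &box->active_list, active_entry)
		event->pmu->read(event);

	hrtimer_forward_now(hrtimer, box->hrtimer_duration);
	return HRTIMER_RESTART;
}

/* Called on the target CPU when its first event is added. */
static void my_box_start_hrtimer(struct my_box *box)
{
	hrtimer_init(&box->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED);
	box->hrtimer.function = my_box_hrtimer_handle;
	hrtimer_start(&box->hrtimer, box->hrtimer_duration,
		      HRTIMER_MODE_REL_PINNED);
}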

> The one advantage of converting them to per cpu hrtimers is that we could
> remove event->active_entry and just use the ctx->active_groups list in this
> series.

Given it's of minor/unknown benefit, and would complicate this series,
feel free to forget about it. I'll add it to the list of things to clean
up at some point in future. ;)

[...]

> > In some cases, several PMUs share the same task context, so events in a
> > context might be for a HW PMU other than ctx->pmu. Thus, the
> > perf_pmu_disable(ctx->pmu) in perf_event_context_sched_in() isn't
> > necessarily sufficient to disable the PMU that group_sched_in() is
> > operating on. AFAICT, the PMU is only necessarily disabled in the block
> > from start_txn() to {stop,cancel}_txn().
> 
> I see the problem now. Thank you for explaining the details. It seems
> like this only matters in contexts with more than one pmu, and with pmus
> that can fail commit_txn.

For the *sched_in path, yes. 
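
To spell out the window I mean, group_sched_in() is roughly the below
(heavily simplified from kernel/events/core.c of the time; error
unwinding elided). Only the txn block is necessarily run with the
group's PMU disabled, and note that the pmu used is ctx->pmu, which may
not be the group's own PMU when contexts are shared:

static int group_sched_in(struct perf_event *group_event,
			  struct perf_cpu_context *cpuctx,
			  struct perf_event_context *ctx)
{
	struct perf_event *event;
	struct pmu *pmu = ctx->pmu;

	pmu->start_txn(pmu, PERF_PMU_TXN_ADD);	/* PMU disabled from here... */

	if (event_sched_in(group_event, cpuctx, ctx)) {
		pmu->cancel_txn(pmu);		/* ...to here on failure... */
		return -EAGAIN;
	}

	/* Schedule in siblings as one group (if any). */
	list_for_each_entry(event, &group_event->sibling_list, group_entry) {
		if (event_sched_in(event, cpuctx, ctx))
			goto group_error;
	}

	if (!pmu->commit_txn(pmu))		/* ...or here on success. */
		return 0;

group_error:
	/* unwind the partially scheduled group (elided) */
	pmu->cancel_txn(pmu);
	return -EAGAIN;
}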

> This makes me like even more the idea you describe below of removing
> context sharing for non-sw pmus (and renaming sw contexts to
> "cannot-fail pmu->add" contexts).

If we could handle dynamic contexts, I'd try to get rid of the context
type entirely, and instead have a flag on the PMU to mark it as a pure
SW PMU.
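
Something like the below is what I have in mind (the flag name and value
are hypothetical, just to illustrate):

/* Hypothetical capability: this PMU's ->add() can never fail. */
#define PERF_PMU_CAP_NO_FAIL_ADD	0x100	/* value made up */

static inline bool pmu_is_pure_sw(struct pmu *pmu)
{
	/* would replace checks on pmu->task_ctx_nr == perf_sw_context */
	return pmu->capabilities & PERF_PMU_CAP_NO_FAIL_ADD;
}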

> > If any of those PMUs sharing a context can raise an NMI, then I believe
> > the problem I raised above stands. Currently the ARM PMU doesn't, but
> > there's ongoing work to implement NMIs for GICv3. I'm not sure about the
> > other PMUs.
> 
> Agree with this being problematic. Also, the Intel CMT PMU uses the sw
> context even though it can fail to add events (although it does not
> raise NMIs).

I hadn't realised that was the case. That'll violate one of the main
invariants regarding SW events, so we've probably got additional subtle
breakage there (e.g. SW-eligible events not being scheduled).

> > This context sharing is problematic for a number of reasons outside of
> > scheduling. I had hoped to address this by reworking the way we handle
> > task contexts to avoid sharing. It turns out that there are many subtle
> > locking and ordering issues to consider for that, so I hadn't made much
> > headway with that approach.
> 
> Do systems with heterogeneous CPUs have a large number of distinct
> CPU PMUs?

In general, I would expect a small number. 

One or two CPU PMUs is the likely case today, but we could see more in
future, and tracing PMUs will add to that (those can differ across
microarchitectures, too). I can imagine seeing at least four in the near
term.

> Could this be addressed short-term by adding extra types of task contexts
> (e.g. hw_arch_context)? So architectures could choose this as an extra
> context to accommodate their special behaviors?

Maybe. Bumping perf_nr_task_contexts, and dynamically allocating the
pmu::task_ctx_nr for HW PMUs would give some relief.
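
i.e. something along these lines in include/linux/perf_event.h (the new
entry is hypothetical, just to show where it would slot in):

enum perf_event_task_context {
	perf_invalid_context = -1,
	perf_hw_context = 0,
	perf_sw_context,
	perf_hw_arch_context,	/* hypothetical extra per-task context */
	perf_nr_task_contexts,	/* bumped implicitly by the new entry */
};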

Thanks,
Mark.

Thread overview: 33+ messages
2017-01-10 10:24 [RFC 0/6] optimize ctx switch with rb-tree David Carrillo-Cisneros
2017-01-10 10:24 ` [RFC 1/6] perf/core: create active and inactive event groups David Carrillo-Cisneros
2017-01-10 13:49   ` Mark Rutland
2017-01-10 20:45     ` David Carrillo-Cisneros
2017-01-12 11:05       ` Mark Rutland
     [not found]         ` <CALcN6mhPmpSqKhE3Ua+j-xROLzeAyrgdCk4AGGtfF9kExXRTJg@mail.gmail.com>
2017-01-13 11:01           ` Mark Rutland [this message]
2017-01-10 10:24 ` [RFC 2/6] perf/core: add a rb-tree index to inactive_groups David Carrillo-Cisneros
2017-01-10 14:14   ` Mark Rutland
2017-01-10 20:20     ` David Carrillo-Cisneros
2017-01-12 11:47       ` Mark Rutland
2017-01-13  7:34         ` David Carrillo-Cisneros
2017-01-16  2:03   ` [lkp-developer] [perf/core] 33da94bd89: BUG:unable_to_handle_kernel kernel test robot
2017-01-10 10:24 ` [RFC 3/6] perf/core: use rb-tree to sched in event groups David Carrillo-Cisneros
2017-01-10 16:38   ` Mark Rutland
2017-01-10 20:51     ` David Carrillo-Cisneros
2017-01-12 12:14       ` Mark Rutland
2017-01-13  8:01         ` David Carrillo-Cisneros
2017-01-13 10:24           ` Mark Rutland
2017-01-11 20:31     ` Liang, Kan
2017-01-12 10:11       ` Mark Rutland
2017-01-12 13:28         ` Liang, Kan
2017-01-13  8:05           ` David Carrillo-Cisneros
2017-01-10 10:25 ` [RFC 4/6] perf/core: avoid rb-tree traversal when no inactive events David Carrillo-Cisneros
2017-01-10 10:25 ` [RFC 5/6] perf/core: rotation no longer necessary. Behavior has changed. Beware David Carrillo-Cisneros
2017-01-10 10:25 ` [RFC 6/6] perf/core: use rb-tree index to optimize filtered perf_iterate_ctx David Carrillo-Cisneros
2017-01-16  2:05   ` [lkp-developer] [perf/core] 49c04ee1a7: WARNING:at_kernel/events/core.c:#perf_iterate_ctx_matching kernel test robot
2017-04-25 17:27 ` [RFC 0/6] optimize ctx switch with rb-tree Liang, Kan
2017-04-25 17:49   ` David Carrillo-Cisneros
2017-04-25 18:11     ` Budankov, Alexey
2017-04-25 18:54       ` David Carrillo-Cisneros
2017-04-26 10:34         ` Budankov, Alexey
2017-04-26 19:40           ` David Carrillo-Cisneros
2017-04-26 10:52         ` Mark Rutland
