From: Song Liu <songliubraving@fb.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>,
	Ingo Molnar <mingo@kernel.org>,
	lkml <linux-kernel@vger.kernel.org>,
	"acme@kernel.org" <acme@kernel.org>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@redhat.com>,
	Stephane Eranian <eranian@google.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"mark.rutland@arm.com" <mark.rutland@arm.com>,
	"megha.dey@intel.com" <megha.dey@intel.com>,
	"frederic@kernel.org" <frederic@kernel.org>
Subject: Re: [RFC][PATCH] perf: Rewrite core context handling
Date: Tue, 16 Oct 2018 18:28:10 +0000
Message-ID: <FE66AE20-9785-4CF7-8D8D-CF2A9C696923@fb.com>
In-Reply-To: <AF0FCD08-E014-4D29-B023-EA780056DBB5@fb.com>

Hi Peter,

> On Oct 15, 2018, at 3:09 PM, Song Liu <songliubraving@fb.com> wrote:
> 
> 
> 
>> On Oct 15, 2018, at 1:34 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> 
>> On Mon, Oct 15, 2018 at 10:26:06AM +0300, Alexey Budankov wrote:
>>> Hi,
>>> 
>>> On 10.10.2018 13:45, Peter Zijlstra wrote:
>>>> Hi all,
>>>> 
>>>> There have been various issues and limitations with the way perf uses
>>>> (task) contexts to track events. Most notable is the single hardware PMU
>>>> task context, which has resulted in a number of yucky things (both
>>>> proposed and merged).
>>>> 
>>>> Notably:
>>>> 
>>>> - HW breakpoint PMU
>>>> - ARM big.little PMU
>>>> - Intel Branch Monitoring PMU
>>>> 
>>>> Since we now track the events in RB trees, we can 'simply' add a pmu
>>>> order to them and have them grouped that way, reducing to a single
>>>> context. Of course, reality never quite works out that simple, and below
>>>> ends up adding an intermediate data structure to bridge the context ->
>>>> pmu mapping.
>>>> 
>>>> Something a little like:
>>>> 
>>>>             ,------------------------[1:n]---------------------.
>>>>             V                                                  V
>>>>   perf_event_context <-[1:n]-> perf_event_pmu_context <--- perf_event
>>>>             ^                      ^     |                     |
>>>>             `--------[1:n]---------'     `-[n:1]-> pmu <-[1:n]-'
>>>> 
>>>> This patch builds (provided you disable CGROUP_PERF), boots and survives
>>>> perf-top without the machine catching fire.
>>>> 
>>>> There's still a fair bit of loose ends (look for XXX), but I think this
>>>> is the direction we should be going.
>>>> 
>>>> Comments?
>>>> 
>>>> Not-Quite-Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>>>> ---
>>>> arch/powerpc/perf/core-book3s.c |    4 
>>>> arch/x86/events/core.c          |    4 
>>>> arch/x86/events/intel/core.c    |    6 
>>>> arch/x86/events/intel/ds.c      |    6 
>>>> arch/x86/events/intel/lbr.c     |   16 
>>>> arch/x86/events/perf_event.h    |    6 
>>>> include/linux/perf_event.h      |   80 +-
>>>> include/linux/sched.h           |    2 
>>>> kernel/events/core.c            | 1412 ++++++++++++++++++++--------------------
>>>> 9 files changed, 815 insertions(+), 721 deletions(-)
>>> 
>>> The rewrite is impressive; however, it doesn't reduce the code base as it stands.
>> 
>> Yeah.. that seems to be nature of these things ..
>> 
>>> Nonetheless, there is clear demand for per-PMU event group tracking and rotation
>>> in a single CPU context (HW breakpoints, ARM big.little, Intel LBRs), and the
>>> group ordering on the RB tree can supply it.
>>> 
>>> This could be brought into the kernel either by new perf features built on
>>> that RB-tree group ordering, or by refactoring existing code in a way that
>>> reduces the overall code base and thus lowers the support cost.
>> 
>> If you have a concrete suggestion on how to reduce complexity? I tried,
>> but couldn't find any (without breaking something).
>> 
>> The active lists and pmu_ctx_list could arguably be replaced with
>> (slower) iterations over the RB tree, but you'll still need the per pmu
>> nr_events/nr_active counts to determine if rotation is required at all.
>> 
>> And like you know, performance is quite important here too. I'd love to
>> reduce complexity while maintaining or improving performance, but that
>> rarely if ever happens :/
> 
> How about this: 
> 
> 1. Keep multiple perf_cpu_context per CPU, just like before this patch. 
> 
> 2. For perf_event_context, add PMU as an order for the RB tree. 
> 
> 3. (hw) pmu->perf_cpu_context->ctx only has events for this PMU (and sw 
>   events moved to this context).
> 
> 4. task->perf_event_ctxp has events for all PMUs. 
> 
> With this path, we keep the existing perf_cpu_context/perf_event_context
> logic as-is, which I think is simpler than the new logic (with extra
> *_pmu_context). And it should also solve the problem. 
> 
> Does this make sense? If this doesn't look too broken, I am happy to
> draft an RFC for it. 
> 

I am not sure whether you missed this one, or found it totally insane. 
Could you please share your comments on it? My gut feeling is that this 
would be a simpler patch to solve the problem (two hw PMUs). (It might 
be less efficient though). 
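
To make step 2 a bit more concrete, here is a rough user-space sketch (not
kernel code) of what "PMU as an order" could look like. The struct, the
pmu_id field and the helper names are all hypothetical and exist only for
illustration. A sorted array stands in for the RB tree: with the PMU as the
leading sort key, all events of one PMU end up in one contiguous run, so a
per-PMU cpu context or a task context could walk just the run it cares about.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an event's sort key; not the kernel's struct. */
struct ev_key {
	int pmu_id;                 /* assumed small integer identifying the PMU */
	int cpu;                    /* -1 means "any CPU", as for task events */
	unsigned long group_index;  /* insertion order within (pmu, cpu) */
};

/* Order by PMU first, then by CPU, then by insertion order. */
static int ev_key_cmp(const void *a, const void *b)
{
	const struct ev_key *l = a, *r = b;

	if (l->pmu_id != r->pmu_id)
		return l->pmu_id < r->pmu_id ? -1 : 1;
	if (l->cpu != r->cpu)
		return l->cpu < r->cpu ? -1 : 1;
	if (l->group_index != r->group_index)
		return l->group_index < r->group_index ? -1 : 1;
	return 0;
}

/* Sort the keys into the order a tree using ev_key_cmp() would iterate them. */
static void ev_sort(struct ev_key *ev, size_t n)
{
	qsort(ev, n, sizeof(*ev), ev_key_cmp);
}

/*
 * Find the contiguous run of events that belong to one PMU in a sorted
 * array. Returns the length of the run and stores its start index in *first.
 */
static size_t ev_pmu_range(const struct ev_key *ev, size_t n, int pmu_id,
			   size_t *first)
{
	size_t i = 0, len = 0;

	while (i < n && ev[i].pmu_id < pmu_id)
		i++;
	*first = i;
	while (i < n && ev[i].pmu_id == pmu_id) {
		i++;
		len++;
	}
	return len;
}
```

The linear scan in ev_pmu_range() is only there to keep the sketch short; a
real tree would descend to the first matching node instead. The per-PMU
nr_events/nr_active counts Peter mentioned are not modeled here.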

Thanks,
Song 


