From: Song Liu <songliubraving@fb.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>,
Ingo Molnar <mingo@kernel.org>,
lkml <linux-kernel@vger.kernel.org>,
"acme@kernel.org" <acme@kernel.org>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Jiri Olsa <jolsa@redhat.com>,
Stephane Eranian <eranian@google.com>,
"Thomas Gleixner" <tglx@linutronix.de>,
"mark.rutland@arm.com" <mark.rutland@arm.com>,
"megha.dey@intel.com" <megha.dey@intel.com>,
"frederic@kernel.org" <frederic@kernel.org>
Subject: Re: [RFC][PATCH] perf: Rewrite core context handling
Date: Tue, 16 Oct 2018 18:28:10 +0000
Message-ID: <FE66AE20-9785-4CF7-8D8D-CF2A9C696923@fb.com>
In-Reply-To: <AF0FCD08-E014-4D29-B023-EA780056DBB5@fb.com>
Hi Peter,
> On Oct 15, 2018, at 3:09 PM, Song Liu <songliubraving@fb.com> wrote:
>
>
>
>> On Oct 15, 2018, at 1:34 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>
>> On Mon, Oct 15, 2018 at 10:26:06AM +0300, Alexey Budankov wrote:
>>> Hi,
>>>
>>> On 10.10.2018 13:45, Peter Zijlstra wrote:
>>>> Hi all,
>>>>
>>>> There have been various issues and limitations with the way perf uses
>>>> (task) contexts to track events. Most notable is the single hardware PMU
>>>> task context, which has resulted in a number of yucky things (both
>>>> proposed and merged).
>>>>
>>>> Notably:
>>>>
>>>> - HW breakpoint PMU
>>>> - ARM big.little PMU
>>>> - Intel Branch Monitoring PMU
>>>>
>>>> Since we now track the events in RB trees, we can 'simply' add a pmu
>>>> order to them and have them grouped that way, reducing to a single
>>>> context. Of course, reality never quite works out that simple, and below
>>>> ends up adding an intermediate data structure to bridge the context ->
>>>> pmu mapping.
>>>>
>>>> Something a little like:
>>>>
>>>>          ,------------------------[1:n]---------------------.
>>>>          V                                                  V
>>>> perf_event_context <-[1:n]-> perf_event_pmu_context <--- perf_event
>>>>          ^                      ^     |                     |
>>>>          `--------[1:n]---------'     `-[n:1]-> pmu <-[1:n]-'
>>>>
>>>> This patch builds (provided you disable CGROUP_PERF), boots and survives
>>>> perf-top without the machine catching fire.
>>>>
>>>> There's still a fair bit of loose ends (look for XXX), but I think this
>>>> is the direction we should be going.
>>>>
>>>> Comments?
>>>>
>>>> Not-Quite-Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>>>> ---
>>>> arch/powerpc/perf/core-book3s.c | 4
>>>> arch/x86/events/core.c | 4
>>>> arch/x86/events/intel/core.c | 6
>>>> arch/x86/events/intel/ds.c | 6
>>>> arch/x86/events/intel/lbr.c | 16
>>>> arch/x86/events/perf_event.h | 6
>>>> include/linux/perf_event.h | 80 +-
>>>> include/linux/sched.h | 2
>>>> kernel/events/core.c | 1412 ++++++++++++++++++++--------------------
>>>> 9 files changed, 815 insertions(+), 721 deletions(-)
>>>
>>> Rewrite is impressive however it doesn't result in code base reduction as it is.
>>
>> Yeah.. that seems to be nature of these things ..
>>
>>> Nonetheless there is a clear demand for per pmu events groups tracking and rotation
>>> in single cpu context (HW breakpoints, ARM big.little, Intel LBRs) and there is
>>> a supply thru groups ordering on RB-tree.
>>>
>>> This might be driven into the kernel by some new Perf features that would base on
>>> that RB-tree groups ordering or by refactoring of existing code but in the way it
>>> would result in overall code base reduction thus lowering support cost.
>>
>> If you have a concrete suggestion on how to reduce complexity? I tried,
>> but couldn't find any (without breaking something).
>>
>> The active lists and pmu_ctx_list could arguably be replaced with
>> (slower) iterations over the RB tree, but you'll still need the per pmu
>> nr_events/nr_active counts to determine if rotation is required at all.
>>
>> And like you know, performance is quite important here too. I'd love to
>> reduce complexity while maintaining or improving performance, but that
>> rarely if ever happens :/
>
> How about this:
>
> 1. Keep multiple perf_cpu_context per CPU, just like before this patch.
>
> 2. For perf_event_context, add PMU as an order for the RB tree.
>
> 3. (hw) pmu->perf_cpu_context->ctx only has events for this PMU (and sw
> events moved to this context).
>
> 4. task->perf_event_ctxp has events for all PMUs.
>
> With this path, we keep the existing perf_cpu_context/perf_event_context
> logic as-is, which I think is simpler than the new logic (with extra
> *_pmu_context). And it should also solve the problem.
>
> Does this make sense? If this doesn't look too broken, I am happy to
> draft RFC for it.
>
I am not sure whether you missed this one, or found it totally insane.
Could you please share your comments on it? My gut feeling is that this
would be a simpler patch to solve the problem (two hw PMUs). (It might
be less efficient though).
Thanks,
Song
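
To make point 2 above (PMU as an extra order in the tree) and Peter's remark
about the per-PMU nr_events/nr_active counts concrete, here is a minimal
userspace sketch in plain C. It is an illustration only, not code from the
patch: the struct and every sketch_* name are hypothetical, a sorted array
stands in for the RB tree, and the key is assumed to be (cpu, pmu, insertion
order).

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for an event group; not the kernel struct. */
struct sketch_event {
	int cpu;                   /* -1 means "any CPU" in this sketch */
	int pmu_id;                /* assumed small integer identifying the PMU */
	unsigned long group_index; /* insertion order, used to break ties */
};

/* Compare by cpu first, then by PMU, then by insertion order. */
static int sketch_event_cmp(const void *a, const void *b)
{
	const struct sketch_event *l = a, *r = b;

	if (l->cpu != r->cpu)
		return l->cpu < r->cpu ? -1 : 1;
	if (l->pmu_id != r->pmu_id)
		return l->pmu_id < r->pmu_id ? -1 : 1;
	if (l->group_index != r->group_index)
		return l->group_index < r->group_index ? -1 : 1;
	return 0;
}

/* Sort in place; a real tree would maintain this order on every insert. */
static void sketch_sort(struct sketch_event *ev, size_t n)
{
	qsort(ev, n, sizeof(*ev), sketch_event_cmp);
}

/*
 * Count the events of one (cpu, pmu) pair in a sorted array. With the PMU in
 * the key these events are contiguous, so the second loop stops at the first
 * entry past the run.
 */
static size_t sketch_count(const struct sketch_event *ev, size_t n,
			   int cpu, int pmu_id)
{
	size_t i = 0, count = 0;

	while (i < n && (ev[i].cpu != cpu || ev[i].pmu_id != pmu_id))
		i++;
	for (; i < n && ev[i].cpu == cpu && ev[i].pmu_id == pmu_id; i++)
		count++;
	return count;
}

/* Rotation only helps when a PMU has more events than are currently active. */
static int sketch_needs_rotation(size_t nr_events, size_t nr_active)
{
	return nr_events > nr_active;
}
```

The linear search in sketch_count() is only there to keep the sketch short; a
tree lookup would find the first matching node directly. Keeping nr_events
and nr_active as per-PMU counters avoids having to walk the run at all just
to decide whether rotation is needed.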