From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.lore.kernel.org (Postfix) with ESMTPS id BCAEEC433EF
	for <linux-arm-kernel@archiver.kernel.org>; Tue, 19 Apr 2022 13:57:51 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
	d=lists.infradead.org; s=bombadil.20210309; h=Sender:
	Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post:
	List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To:
	Subject:Cc:To:From:Message-ID:Date:Reply-To:Content-ID:Content-Description:
	Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:
	List-Owner; bh=Xqzge3mP7vnb9H5njCnVrpeZ3qaVKU6jd1l5dz2qZEQ=; b=JAfg/KNvrQaosZ
	nfck7i7qCIRrEjMpRoHAxgp2R8y4vJEHTPdOPAs+At9BvHZBJDFfFWDemw+BmiaimABj5ORKERIyE
	SFT5NS8lFkn3wibc7UnGNhy1O1m0+46ujWJlphpF/sSNFr4eu/6r20dTrRgl0Igf/pKZD4+sIiAFp
	ri7gWUpdRR3XvxNe5e2uN5FbnsB3N5yhyEirReg0QnhZQPJugUhCRDy5C/LhlCg2eYkMChNXeIvjA
	tq1K60ArheO2y3qV4blUq+Xl6MDU5iKXHWB/seIbWsnkkJVf5MYiT36tnprjYW+lssBlqadTb0rjj
	6UTRYnqQJvaoG57fvEmg==;
Received: from localhost ([::1] helo=bombadil.infradead.org)
	by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux))
	id 1ngoK0-0042qj-VN; Tue, 19 Apr 2022 13:55:13 +0000
Received: from ams.source.kernel.org ([145.40.68.75])
 by bombadil.infradead.org with esmtps (Exim 4.94.2 #2 (Red Hat Linux))
 id 1ngnbY-003ixk-Cm
 for linux-arm-kernel@lists.infradead.org; Tue, 19 Apr 2022 13:09:19 +0000
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by ams.source.kernel.org (Postfix) with ESMTPS id 31518B818EE;
 Tue, 19 Apr 2022 13:09:14 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id AAA7FC385A5;
 Tue, 19 Apr 2022 13:09:12 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
 s=k20201202; t=1650373752;
 bh=AFWQMdgF5I9NNEFVTwY0PE8ofYiwAgY9cIFd9d2iTD8=;
 h=Date:From:To:Cc:Subject:In-Reply-To:References:From;
 b=Ex4IebPN2WuSVEL/qHH6BR2aWxmTekDFglw6KBq+CYpJvLJm2TaG7yq9JH1ok//wn
 v9sTYsP600ec2iTRKORe/DuR5CEs5DDX0U6rElYXFiSc8eE2LFStLz+glA+he9otLR
 LhyQkwM/yXdqfL4JmJxm3SsypLUi4zaiDTdR52c22JXk1QAZsfO/FBjBe+tVV1G2Gh
 Lxganh7zQ/thEzcULj9r8xUCft0DjHJBwlLWGGLxi9mEFDw9BYPzTCJpR/shVfErQU
 Xzr1rZXxB9w0Re15AtSVXF5Tgd31MbwJfHeVk6M7sSyQu8zsJuUIHy7HmC6/vG3xU9
 aSdxEK8JTDhvA==
Received: from sofa.misterjones.org ([185.219.108.64] helo=why.misterjones.org)
 by disco-boy.misterjones.org with esmtpsa (TLS1.3) tls
 TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2)
 (envelope-from <maz@kernel.org>)
 id 1ngnbS-005Kkz-6g; Tue, 19 Apr 2022 14:09:10 +0100
Date: Tue, 19 Apr 2022 14:09:09 +0100
Message-ID: <87pmld9q4q.wl-maz@kernel.org>
From: Marc Zyngier <maz@kernel.org>
To: Yichao Yu <yyc1992@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org, khuong@os.amperecomputing.com,
 will@kernel.org, mark.rutland@arm.com, Frank.li@nxp.com,
 zhangshaokun@hisilicon.com, liuqi115@huawei.com, john.garry@huawei.com,
 Mathieu Poirier <mathieu.poirier@linaro.org>, Leo Yan <leo.yan@linaro.org>
Subject: Re: Kernel perf counter support (for apple M1 and others)
In-Reply-To: <CAMvDr+R5TE4C++wNYAre-GJQMBXcjmmp=WcXx_KONAky8amE+Q@mail.gmail.com>
References: <CAMvDr+Q-4gzYozLV8f8R8DgTEDYrkvYAOC+ND6bm11L+4mQDWw@mail.gmail.com>
 <87o80yior2.wl-maz@kernel.org>
 <CAMvDr+R5TE4C++wNYAre-GJQMBXcjmmp=WcXx_KONAky8amE+Q@mail.gmail.com>
User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue)
 FLIM-LB/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/27.1
 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO)
MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue")
X-SA-Exim-Connect-IP: 185.219.108.64
X-SA-Exim-Rcpt-To: yyc1992@gmail.com, linux-arm-kernel@lists.infradead.org,
 khuong@os.amperecomputing.com, will@kernel.org, mark.rutland@arm.com,
 Frank.li@nxp.com, zhangshaokun@hisilicon.com, liuqi115@huawei.com,
 john.garry@huawei.com, mathieu.poirier@linaro.org, leo.yan@linaro.org
X-SA-Exim-Mail-From: maz@kernel.org
X-SA-Exim-Scanned: No (on disco-boy.misterjones.org);
 SAEximRunCond expanded to false
X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 
X-CRM114-CacheID: sfid-20220419_060916_779778_9DAE0699 
X-CRM114-Status: GOOD (  80.17  )
X-BeenThere: linux-arm-kernel@lists.infradead.org
X-Mailman-Version: 2.1.34
Precedence: list
List-Id: <linux-arm-kernel.lists.infradead.org>
List-Unsubscribe: <http://lists.infradead.org/mailman/options/linux-arm-kernel>, 
 <mailto:linux-arm-kernel-request@lists.infradead.org?subject=unsubscribe>
List-Archive: <http://lists.infradead.org/pipermail/linux-arm-kernel/>
List-Post: <mailto:linux-arm-kernel@lists.infradead.org>
List-Help: <mailto:linux-arm-kernel-request@lists.infradead.org?subject=help>
List-Subscribe: <http://lists.infradead.org/mailman/listinfo/linux-arm-kernel>, 
 <mailto:linux-arm-kernel-request@lists.infradead.org?subject=subscribe>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org>
Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org

On Tue, 19 Apr 2022 13:06:37 +0100,
Yichao Yu <yyc1992@gmail.com> wrote:
> 
> > - I don't think there is any value in stashing any of these HW events
> >   in the kernel. In most cases, the kernel definition only matches the
> >   x86 definition, and doesn't accurately describe the vast majority of
> >   the events implemented on an ARM CPU. The ARM architecture mentions
> >   a handful of architectural events that actually match the kernel
> >   definition, and for these CPUs the kernel carries the in-kernel
> >   description.
> >
> > - For the M1, none of the above applies, because there is *NO*
> >   architectural description for the events reported by the (non
> >   architectural) PMU, and there is no guarantee that they actually
> >   match the common understanding we have of these events.
> 
> You mentioned documents from Apple on IRC and below. Why is that the
> only acceptable source?

Because that would be the only one giving an exact definition to what
you are counting. Anything else is guess-work. Very good guess-work,
I'm sure, but still very much a wet finger in the air.

> The entire support for M1 is based on reverse engineering/testing of
> the hardware so why would those not be acceptable sources here as
> well?

Because there is a difference between getting something to work (the
PMU driver itself) and interpreting the results it gives. All we know
is that it is counting something. You can sort of guess what, but you
don't know for sure.

> My understanding is that the current cycles and instructions counters
> were figured out this way so I don't see why you want them to be
> removed.

Because you use them as an argument to pile more crap in the
kernel. Gee, at this stage, it is the driver itself I am going to
remove.

> There are also other counters that I believe are matching the
> canonical definitions and I don't see why those should be left out
> either.
>
> > - The correct place for these non-architectural events is in a JSON
> >   description that would be built into perf, which would give you
> >   symbolic events. Bloating the kernel for something we're not sure
> >   about seems counterproductive.
> 
> As I've mentioned before, perf isn't the only user of performance
> counters. If there is a shared place, or even good document, for this,
> it might have been better.

Feel free to write a document or something else. The only thing I care
about is in the kernel tree.

> Currently, just by reading the document of the hardware event type, it
> seems that it should work if the hardware supports such counters.

Such document would be the JSON file I mentioned. But since you have
stated that you don't intend to write anything that ends up in the
kernel, I guess that's a moot point.

> > > 2. Should the kernel provide names for hardware events? Here I'm
> > > talking about things under
> > > `/sys/bus/event_source/devices/<pmu>/events` which I assume is
> > > provided by the kernel (that or my understanding of sysfs has been
> > > fundamentally wrong/out-of-date...). Based on the fact that the
> > > current pmu kernel driver for the M1 does provide this and this
> > > comment https://github.com/torvalds/linux/blob/e8b767f5e04097aaedcd6e06e2270f9fe5282696/drivers/perf/apple_m1_cpu_pmu.c#L31
> > > I assume it's desired. This would also agree with what I've observed
> > > on other (including non-x86) systems. If this is the case, I assume
> > > the kernel driver for the M1 PMU isn't fully "done" yet.
> >
> > See my reply above: there are no architectural descriptions for these
> > events, and we don't know how closely they match the definition Linux
> > has. If one day Apple shows up and tells us how close these events are
> > from their Linux (and thus x86) definition, we can expand this. Until
> > then, the interpretation belongs, IMHO, to userspace.
> >
> > I'd rather *remove* CYCLES and INSTRUCTIONS definitions from the
> > kernel than add any other.
> 
> Replied above.
> 
> > > 3. For counting events on a system with asymmetric cores.
> > >     I understand that if the system contains multiple processors of
> > > different characteristics, it may not make sense to provide a counter
> > > that counts events on both (or all) types of cores. However, there are
> > > events (PERF_COUNT_HW_INSTRUCTIONS and
> > > PERF_COUNT_HW_BRANCH_INSTRUCTIONS at the least) that shouldn't really
> > > be affected by this (and in fact, any counters that counts events
> > > visible directly to the software/userspace). I want to even say that
> > > branch misses/cache reference/misses might be in this category as well
> > > although certainly not as clear cut.
> >
> > That boat has sailed a long time ago, when the BL PMU support was
> > introduced, and all counters are treated equally: they are *NOT*
> > counted globally. Changing this would be an ABI break, and I seriously
> > doubt we want to go there.
> 
> Sorry I'm not familiar with the names here. What's the "BL PMU"
> support? And what are the counters that are not counted globally?

BL stands for Big-Little. Asymmetric support, if you want. None of the
counters are counted globally, only per PMU type. And this is an ABI
we cannot break.

> 
> > It would also mean that the kernel would need to know which counters
> > it can accumulate over the various CPU types (which is often more than
> > 2, these days). All of that to save userspace adding things? I doubt
> > this is worth it.
> >
> > > 4. There are other events that may not make as much sense to combine
> > > (cycles for example). However, I feel like a combined cycle count
> > > isn't going to be much tricker to use given that the cycle count on a
> > > single core is still affected by frequency scaling and it can still be
> > > used correctly by pinning the thread.
> >
> > I don't understand what frequency scaling has anything to do with this
> > (a cycle is still a cycle at any frequency).
> 
> Exactly, a cycle is still a cycle, so I don't see why it's that big a
> problem to count it globally.

Because you are going to walk the list of events generated during a
time slice, work out which ones are to be merged and which ones
aren't, and accumulate them into global, userspace visible counters? I
dread to imagine the effect on scheduling latency. All that to avoid
adding two values into userspace. Great.

> What I meant exactly was that if a code runs for 100 cycles at 1 GHz,
> it doesn't mean it'll also run (close to) 100 cycles at 3 GHz.
> Similarly, if it runs for 100 cycles on the E core, it doesn't mean
> it'll run for 100 cycles on the P core.

And? What do you derive from this set of statements?

> We already allow the former case to count using the same counter
> everywhere, I don't see why the latter can't be allowed. (ABI change
> issue aside)

*blink*. If you don't see a problem with changing the ABI, I'm at a
loss.

> I don't have hardware to test this but it also seems that on the new
> intel chips, the E core and the P core are counted together. (this is
> purely based on the lack of multiple counter support in rr to support
> the new chip...)

Colour me uninterested on both count. x86 can do whatever they want.

> 
> > > The main reasons I'm asking about 3 and 4 is that
> > > 1. Right now, even to just count instructions without pinning the
> > > thread, I need to create two counters.
> >
> > How bad is that? I mean, the counters are per-CPU anyway, so there
> > *are* N counters (N being the number of CPUs). You only have to create
> > a counter per PMU.
> >
> > > 2. Even if the number isn't exactly accurate, it can still be useful
> > > as a general guideline. Right now, even if I just want to do a quick
> > > check, I still need to manually specify a dozen of events in `perf
> > > stat -e` rather than simply using `perf stat` (to make it worse, perf
> > > doesn't even provide any useful warning about it). It is also much
> > > harder to do things generically (which is at least partially because
> > > of the lack of documentation....).
> >
> > I see this as a potential perf-tool improvement. Being able to say
> > 'Count this event on all CPU PMUs'  would certainly be valuable to all
> > asymmetric systems.
> 
> Short answer is not that bad if and only if there's a standard and
> documented way to do this, userspace or kernel.

Feel free to improve the kernel documentation[1], which is admittedly
pretty sparse on the subject.

> (A userspace solution that automatically sums counters value together
> would also need to handle the grouping as well).
> However, as I mentioned above, based on the document I can find, there
> isn't a standard interface for a userspace program to figure out how
> to use these counters correctly and the document for perf_event_open
> also doesn't mention these kinds of limitations. These are what my
> expectation of the kernel interface come from.

The kernel gives you the tools to match PMUs and CPUs (just rummage in
sysfs). If userspace knows which counter is what, you're in business.
Do document your findings, by any mean.

	M.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/arm64/perf.rst#n136

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel