Re: [PATCH v2 2/2] perf mem: Support HITM for when mem_lvl_num is used

From: Ali Saidi <alisaidi@amazon.com>
To: <german.gomez@arm.com>
Cc: <Nick.Forrington@arm.com>, <acme@kernel.org>,
	<alexander.shishkin@linux.intel.com>, <alisaidi@amazon.com>,
	<andrew.kilroy@arm.com>, <benh@kernel.crashing.org>,
	<james.clark@arm.com>, <john.garry@huawei.com>,
	<jolsa@kernel.org>, <kjain@linux.ibm.com>, <leo.yan@linaro.org>,
	<lihuafei1@huawei.com>, <linux-arm-kernel@lists.infradead.org>,
	<linux-kernel@vger.kernel.org>,
	<linux-perf-users@vger.kernel.org>, <mark.rutland@arm.com>,
	<mathieu.poirier@linaro.org>, <mingo@redhat.com>,
	<namhyung@kernel.org>, <peterz@infradead.org>, <will@kernel.org>,
	<yao.jin@linux.intel.com>
Subject: Re: [PATCH v2 2/2] perf mem: Support HITM for when mem_lvl_num is used
Date: Mon, 14 Mar 2022 18:37:21 +0000
Message-ID: <20220314183721.3198-1-alisaidi@amazon.com>
In-Reply-To: <df73fb93-3892-6713-8ebe-bc57a861ec5d@arm.com>

Hi German and Leo,

On Mon, 14 Mar 2022 18:00:13 +0000, German Gomez wrote:
> Hi Leo, Ali
> 
> On 14/03/2022 06:33, Leo Yan wrote:
> > On Sun, Mar 13, 2022 at 07:19:33PM +0000, Ali Saidi wrote:
> >
> > [...]
> >
> >>>>> +			if (lvl & P(LVL, L3) || lnum == P(LVLNUM, L4)) {
> >>>> According to a comment in the previous patch, using L4 is specific to Neoverse, right?
> >>>>
> >>>> Maybe we need to distinguish the Neoverse case from the generic one here as well
> >>>>
> >>>> if (is_neoverse)
> >>>> // treat L4 as llc
> >>>> else
> >>>> // treat L3 as llc
> >>> I personally think it's not good idea to distinguish platforms in the decoding code.
> >> I agree here. The more we talk about this, the more I'm wondering if we're
> >> spending too much code solving a problem that doesn't exist. I know of no
> >> Neoverse systems that actually have 4 cache levels, they all actually have three
> >> even though it's technically possible to have four.  I have some doubts anyone
> >> will actually build four levels of cache and perhaps the most prudent path here
> >> is to assume only three levels (and adjust the previous patch) until someone 
> >> actually produces a system with four levels instead of a lot of code that is
> >> never actually exercised?
> > I am not right person to say L4 cache is not implemented in Neoverse
> > platforms; my guess for a "System cache" data source might be L3 or
> > L4 and it is a implementation dependent.  Maybe German or Arm mates
> > could confirm for this.
> 
> I had a look at the TRMs for the N1[1], V1[2] and N2[3] Neoverse cores
> (specifically the LL_CACHE_RD pmu events). If we were to assign a number
> to the system cache (assuming all caches are implemented):
> 
> *For N1*, if L2 and L3 are implemented, system cache would follow at *L4*
To date, though, no one has built four levels. Everyone has built only three.

> *For V1 and N2*, if L2 is implemented, system cache would follow at *L3*
> (these don't seem to have the same/similar per-cluster L3 cache from the N1)

And going forward they won't be able to build more than three. German and Leo,
if there are no strong objections, I think the best path forward is for me to
respin these assuming only three levels. If someone builds four in the far-off
future, we can always change the implementation then. Agreed?
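
As a rough sketch of what "assuming only three levels" could mean for the
check quoted above: the L4 arm is dropped and L3 alone is treated as the last
level. Everything named sketch_* or SKETCH_* below is hypothetical and stands
in for the real PERF_MEM_* encodings and the perf decode code; it is not the
actual patch.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the PERF_MEM_LVLNUM_* and PERF_MEM_SNOOP_*
 * encodings. Assumption: the values are chosen for illustration only and
 * this does not use the real uapi header.
 */
enum sketch_lvlnum {
	SKETCH_LVLNUM_L1 = 1,
	SKETCH_LVLNUM_L2 = 2,
	SKETCH_LVLNUM_L3 = 3,
	SKETCH_LVLNUM_L4 = 4,
};

#define SKETCH_SNOOP_HITM 0x10u

enum sketch_hitm {
	SKETCH_HITM_NONE,
	SKETCH_HITM_LOCAL,
};

/* With at most three levels assumed, only L3 counts as the last-level cache. */
static bool sketch_is_llc(unsigned int lnum)
{
	return lnum == SKETCH_LVLNUM_L3;
}

/* Count a local HITM only for a last-level cache access that snooped HITM. */
static enum sketch_hitm sketch_classify(unsigned int lnum, unsigned int snoop)
{
	if (sketch_is_llc(lnum) && (snoop & SKETCH_SNOOP_HITM))
		return SKETCH_HITM_LOCAL;
	return SKETCH_HITM_NONE;
}
```

If a four-level system ever appears, only sketch_is_llc() would need to
change in this sketch.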

Thanks,
Ali