Re: [PATCH v2 2/2] perf mem: Support HITM for when mem_lvl_num is used

From: German Gomez <german.gomez@arm.com>
To: Leo Yan <leo.yan@linaro.org>, Ali Saidi <alisaidi@amazon.com>
Cc: acme@kernel.org, alexander.shishkin@linux.intel.com,
	andrew.kilroy@arm.com, benh@kernel.crashing.org,
	james.clark@arm.com, john.garry@huawei.com, jolsa@kernel.org,
	kjain@linux.ibm.com, lihuafei1@huawei.com,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	mark.rutland@arm.com, mathieu.poirier@linaro.org,
	mingo@redhat.com, namhyung@kernel.org, peterz@infradead.org,
	will@kernel.org, yao.jin@linux.intel.com,
	Nick.Forrington@arm.com
Subject: Re: [PATCH v2 2/2] perf mem: Support HITM for when mem_lvl_num is used
Date: Wed, 16 Mar 2022 15:10:55 +0000
Message-ID: <1637567b-42df-57d5-2987-939ffbf451ef@arm.com>
In-Reply-To: <20220316124208.GA310478@leoy-ThinkPad-X240s>

On 16/03/2022 12:42, Leo Yan wrote:
> On Wed, Mar 16, 2022 at 11:43:52AM +0000, German Gomez wrote:
>
> [...]
>
>>>>> I had a look at the TRMs for the N1[1], V1[2] and N2[3] Neoverse cores
>>>>> (specifically the LL_CACHE_RD pmu events). If we were to assign a number
>>>>> to the system cache (assuming all caches are implemented):
>>>>>
>>>>> *For N1*, if L2 and L3 are implemented, system cache would follow at *L4*
>>>> To date no one has built 4 levels though. Everyone has only built three.
>>> The N1SDP board advertises 4 levels (we use it regularly for testing perf patches)
>> That said, it's probably the odd one out.
>>
>> I'm not against assuming 3 levels. Later, if there is a strong need for L4, we can indeed go back and change it.
> Thanks for the info.
>
> Exploring the cache hierarchy via sysfs is a good idea; my only
> concern is: can we simply treat the system cache as the same
> thing as the highest level cache?  If so, I think another option is to

For Neoverse, it should be. The LL_CACHE_RD PMU event description says (if a system cache is implemented):

* If CPUECTLR.EXTLLC is set: This event counts any cacheable read transaction which returns a data source of 'interconnect cache'.

> define a cache level as "PERF_MEM_LVLNUM_SYSTEM_CACHE" and extend the
> decoding code to support it.
>
> With PERF_MEM_LVLNUM_SYSTEM_CACHE, it can tell users clearly that the
> data source is the system cache, and users can easily map this info to
> the cache media on the working platform.
>
> In practice, I don't object to using cache level 3 as a first step.  At
> least this can meet the requirement at the current stage.

Ok, I agree. I think for now it is a good compromise.
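
To make the two options concrete, here is a rough sketch (not the actual
perf decoding code). All SKETCH_* names are made up for illustration. Only
the L1..L4 numbering mirrors the existing PERF_MEM_LVLNUM_* values; the
number given to SKETCH_LVLNUM_SYSTEM_CACHE is a placeholder and deliberately
not a value from the uapi header.

```c
#include <stdbool.h>

/* Illustrative only; L1..L4 follow the PERF_MEM_LVLNUM_* numbering. */
enum sketch_lvlnum {
	SKETCH_LVLNUM_L1 = 0x01,
	SKETCH_LVLNUM_L2 = 0x02,
	SKETCH_LVLNUM_L3 = 0x03,
	SKETCH_LVLNUM_L4 = 0x04,
	/* Hypothetical placeholder, not a real ABI value. */
	SKETCH_LVLNUM_SYSTEM_CACHE = 0x100,
};

/*
 * Level to report for a sample whose data source is the system cache.
 * With have_dedicated_level == false this is the compromise discussed
 * here: the system cache is simply reported as level 3.
 */
static enum sketch_lvlnum sketch_system_cache_level(bool have_dedicated_level)
{
	return have_dedicated_level ? SKETCH_LVLNUM_SYSTEM_CACHE
				    : SKETCH_LVLNUM_L3;
}

/* Human-readable name, extended with the hypothetical system cache level. */
static const char *sketch_lvlnum_name(enum sketch_lvlnum lvl)
{
	switch (lvl) {
	case SKETCH_LVLNUM_L1:
		return "L1";
	case SKETCH_LVLNUM_L2:
		return "L2";
	case SKETCH_LVLNUM_L3:
		return "L3";
	case SKETCH_LVLNUM_L4:
		return "L4";
	case SKETCH_LVLNUM_SYSTEM_CACHE:
		return "System cache";
	}
	return "N/A";
}
```

Moving from the compromise to a dedicated level later would then only mean
changing what the first helper returns, plus the extra case in the decoder.
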
Detecting the caches seems like an additional/separate perf feature.
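
For reference, a minimal sketch of what such detection could look like from
userspace, assuming the usual cacheinfo layout (index<N>/level files under
/sys/devices/system/cpu/cpu<N>/cache). The helper name is hypothetical and
this is not existing perf code; it also assumes the index directories are
numbered contiguously from 0.

```c
#include <stdio.h>

/*
 * Return the highest cache level listed under a cacheinfo directory such
 * as /sys/devices/system/cpu/cpu0/cache, or -1 if none can be read.
 */
static int sketch_highest_cache_level(const char *cache_dir)
{
	int highest = -1;

	/* index0, index1, ... assumed contiguous; stop at the first gap. */
	for (int idx = 0; ; idx++) {
		char path[4096];
		int level;
		int len;
		FILE *f;

		len = snprintf(path, sizeof(path), "%s/index%d/level",
			       cache_dir, idx);
		if (len < 0 || len >= (int)sizeof(path))
			break;

		f = fopen(path, "r");
		if (!f)
			break;

		/* Each "level" file holds a single decimal number. */
		if (fscanf(f, "%d", &level) == 1 && level > highest)
			highest = level;
		fclose(f);
	}

	return highest;
}
```

Calling it with "/sys/devices/system/cpu/cpu0/cache" would give the deepest
level the kernel reports for CPU 0. Whether that level really is the system
cache is still a separate question.
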

Thanks,
German

> Thanks,
> Leo