All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ali Saidi <alisaidi@amazon.com>
To: <leo.yan@linaro.org>
Cc: <Nick.Forrington@arm.com>, <acme@kernel.org>,
	<alexander.shishkin@linux.intel.com>, <alisaidi@amazon.com>,
	<andrew.kilroy@arm.com>, <benh@kernel.crashing.org>,
	<german.gomez@arm.com>, <james.clark@arm.com>,
	<john.garry@huawei.com>, <jolsa@kernel.org>,
	<kjain@linux.ibm.com>, <lihuafei1@huawei.com>,
	<linux-arm-kernel@lists.infradead.org>,
	<linux-kernel@vger.kernel.org>,
	<linux-perf-users@vger.kernel.org>, <mark.rutland@arm.com>,
	<mathieu.poirier@linaro.org>, <mingo@redhat.com>,
	<namhyung@kernel.org>, <peterz@infradead.org>, <will@kernel.org>
Subject: Re: [PATCH v4 2/4] perf arm-spe: Use SPE data source for neoverse cores
Date: Wed, 6 Apr 2022 21:00:17 +0000	[thread overview]
Message-ID: <20220406210017.11887-1-alisaidi@amazon.com> (raw)
In-Reply-To: <20220404151218.GA898573@leoy-ThinkPad-X240s>

On Mon, 4 Apr 2022 15:12:18  +0000, Leo Yan wrote:
> On Sun, Apr 03, 2022 at 08:33:37PM +0000, Ali Saidi wrote:

[...]

> > The latter logic is why I think it's perfectly acceptable to use HITM to
> > indicate a peer cache-to-cache transfer, however since others don't feel that way
> > let me propose a single additional snooping type PERF_MEM_SNOOP_PEER that
> > indicates some peer of the hierarchy below the originating core sourced the
> > data.  This clears up the definition that line came from from a peer and may or
> > may not have been modified, but it doesn't add a lot of implementation dependant
> > functionality into the SNOOP API. 
> > 
> > We could use the mem-level to indicate the level of the cache hierarchy we had
> > to get to before the snoop traveled upward, which seems like what x86 is doing
> > here.
> 
> It makes sense to me that to use the highest cache level as mem-level.
> Please add comments in the code for this, this would be useful for
> understanding the code.

Ok.

> > PEER_CORE -> MEM_SNOOP_PEER + L2
> > PEER_CLSTR -> MEM_SNOOP_PEER + L3
> > PEER_LCL_CLSTR -> MEM_SNOOP_PEER + L3 (since newer neoverse cores don't support
> > the clusters and the existing commercial implementations don't have them).
> 
> Generally, this idea is fine for me.

Great.  

Now the next tricky thing. Since we're not using HITM for recording the memory
events, the question becomes for the c2c output should we output the SNOOP_PEER
events as if they are HITM events with a clarification in the perf-c2c man page
or effectively duplicate all the lcl_hitm logic, which is a fair amount,  in
perf c2c to add a column and sort option? 

> Following your suggestion, if we connect the concepts PoC and PoU in Arm
> reference manual, we can extend the snooping mode with MEM_SNOOP_POU
> (for PoU) and MEM_SNOOP_POC (for PoC), so:
> 
> PEER_CORE -> MEM_SNOOP_POU + L2
> PEER_LCL_CLSTR -> MEM_SNOOP_POU + L3
> PEER_CLSTR -> MEM_SNOOP_POC + L3
> 
> Seems to me, we could consider for this.  If this is over complexity or
> even I said any wrong concepts for this, please use your method.

I think this adds a lot of complexity and reduces clarity. Some systems
implement coherent icaches and the PoU would be the L1 cache, others don't so
that would be the L2 (or wherever there is a unified cache). Similarly, with the
point of coherency, some systems would consider that dram, but other systems
have transparent LLCs and it would be the LLC. 

Thanks,
Ali


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

WARNING: multiple messages have this Message-ID (diff)
From: Ali Saidi <alisaidi@amazon.com>
To: <leo.yan@linaro.org>
Cc: <Nick.Forrington@arm.com>, <acme@kernel.org>,
	<alexander.shishkin@linux.intel.com>, <alisaidi@amazon.com>,
	<andrew.kilroy@arm.com>, <benh@kernel.crashing.org>,
	<german.gomez@arm.com>, <james.clark@arm.com>,
	<john.garry@huawei.com>, <jolsa@kernel.org>,
	<kjain@linux.ibm.com>, <lihuafei1@huawei.com>,
	<linux-arm-kernel@lists.infradead.org>,
	<linux-kernel@vger.kernel.org>,
	<linux-perf-users@vger.kernel.org>, <mark.rutland@arm.com>,
	<mathieu.poirier@linaro.org>, <mingo@redhat.com>,
	<namhyung@kernel.org>, <peterz@infradead.org>, <will@kernel.org>
Subject: Re: [PATCH v4 2/4] perf arm-spe: Use SPE data source for neoverse cores
Date: Wed, 6 Apr 2022 21:00:17 +0000	[thread overview]
Message-ID: <20220406210017.11887-1-alisaidi@amazon.com> (raw)
In-Reply-To: <20220404151218.GA898573@leoy-ThinkPad-X240s>

On Mon, 4 Apr 2022 15:12:18  +0000, Leo Yan wrote:
> On Sun, Apr 03, 2022 at 08:33:37PM +0000, Ali Saidi wrote:

[...]

> > The latter logic is why I think it's perfectly acceptable to use HITM to
> > indicate a peer cache-to-cache transfer, however since others don't feel that way
> > let me propose a single additional snooping type PERF_MEM_SNOOP_PEER that
> > indicates some peer of the hierarchy below the originating core sourced the
> > data.  This clears up the definition that line came from from a peer and may or
> > may not have been modified, but it doesn't add a lot of implementation dependant
> > functionality into the SNOOP API. 
> > 
> > We could use the mem-level to indicate the level of the cache hierarchy we had
> > to get to before the snoop traveled upward, which seems like what x86 is doing
> > here.
> 
> It makes sense to me that to use the highest cache level as mem-level.
> Please add comments in the code for this, this would be useful for
> understanding the code.

Ok.

> > PEER_CORE -> MEM_SNOOP_PEER + L2
> > PEER_CLSTR -> MEM_SNOOP_PEER + L3
> > PEER_LCL_CLSTR -> MEM_SNOOP_PEER + L3 (since newer neoverse cores don't support
> > the clusters and the existing commercial implementations don't have them).
> 
> Generally, this idea is fine for me.

Great.  

Now the next tricky thing. Since we're not using HITM for recording the memory
events, the question becomes for the c2c output should we output the SNOOP_PEER
events as if they are HITM events with a clarification in the perf-c2c man page
or effectively duplicate all the lcl_hitm logic, which is a fair amount,  in
perf c2c to add a column and sort option? 

> Following your suggestion, if we connect the concepts PoC and PoU in Arm
> reference manual, we can extend the snooping mode with MEM_SNOOP_POU
> (for PoU) and MEM_SNOOP_POC (for PoC), so:
> 
> PEER_CORE -> MEM_SNOOP_POU + L2
> PEER_LCL_CLSTR -> MEM_SNOOP_POU + L3
> PEER_CLSTR -> MEM_SNOOP_POC + L3
> 
> Seems to me, we could consider for this.  If this is over complexity or
> even I said any wrong concepts for this, please use your method.

I think this adds a lot of complexity and reduces clarity. Some systems
implement coherent icaches and the PoU would be the L1 cache, others don't so
that would be the L2 (or wherever there is a unified cache). Similarly, with the
point of coherency, some systems would consider that dram, but other systems
have transparent LLCs and it would be the LLC. 

Thanks,
Ali


  reply	other threads:[~2022-04-06 21:02 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-24 18:33 [PATCH v4 0/4] perf: arm-spe: Decode SPE source and use for perf c2c Ali Saidi
2022-03-24 18:33 ` Ali Saidi
2022-03-24 18:33 ` [PATCH v4 1/4] tools: arm64: Import cputype.h Ali Saidi
2022-03-24 18:33   ` Ali Saidi
2022-03-25 18:39   ` Arnaldo Carvalho de Melo
2022-03-25 18:39     ` Arnaldo Carvalho de Melo
2022-03-25 18:58     ` Ali Saidi
2022-03-25 18:58       ` Ali Saidi
2022-03-25 19:42     ` Arnaldo Carvalho de Melo
2022-03-25 19:42       ` Arnaldo Carvalho de Melo
2022-03-26  5:49       ` Leo Yan
2022-03-26  5:49         ` Leo Yan
2022-03-26 13:59         ` Arnaldo Carvalho de Melo
2022-03-26 13:59           ` Arnaldo Carvalho de Melo
2022-03-24 18:33 ` [PATCH v4 2/4] perf arm-spe: Use SPE data source for neoverse cores Ali Saidi
2022-03-24 18:33   ` Ali Saidi
2022-03-26 13:47   ` Leo Yan
2022-03-26 13:47     ` Leo Yan
2022-03-26 13:52     ` Arnaldo Carvalho de Melo
2022-03-26 13:52       ` Arnaldo Carvalho de Melo
2022-03-26 13:56       ` Leo Yan
2022-03-26 13:56         ` Leo Yan
2022-03-26 14:04         ` Arnaldo Carvalho de Melo
2022-03-26 14:04           ` Arnaldo Carvalho de Melo
2022-03-26 19:43     ` Ali Saidi
2022-03-26 19:43       ` Ali Saidi
2022-03-27  9:09       ` Leo Yan
2022-03-27  9:09         ` Leo Yan
2022-03-28  3:08       ` Ali Saidi
2022-03-28  3:08         ` Ali Saidi
2022-03-28 13:05         ` Leo Yan
2022-03-28 13:05           ` Leo Yan
2022-03-29 13:34           ` Shuai Xue
2022-03-29 13:34             ` Shuai Xue
2022-03-29 14:32           ` Ali Saidi
2022-03-29 14:32             ` Ali Saidi
2022-03-31 12:19             ` Leo Yan
2022-03-31 12:19               ` Leo Yan
2022-03-31 12:28             ` German Gomez
2022-03-31 12:28               ` German Gomez
2022-03-31 12:44               ` Leo Yan
2022-03-31 12:44                 ` Leo Yan
2022-04-03 20:33                 ` Ali Saidi
2022-04-03 20:33                   ` Ali Saidi
2022-04-04 15:12                   ` Leo Yan
2022-04-04 15:12                     ` Leo Yan
2022-04-06 21:00                     ` Ali Saidi [this message]
2022-04-06 21:00                       ` Ali Saidi
2022-04-08  1:06                       ` Leo Yan
2022-04-08  1:06                         ` Leo Yan
2022-04-07 15:24                 ` German Gomez
2022-04-07 15:24                   ` German Gomez
2022-04-08  1:18                   ` Leo Yan
2022-04-08  1:18                     ` Leo Yan
2022-03-24 18:33 ` [PATCH v4 3/4] perf mem: Support mem_lvl_num in c2c command Ali Saidi
2022-03-24 18:33   ` Ali Saidi
2022-03-26 13:54   ` Arnaldo Carvalho de Melo
2022-03-26 13:54     ` Arnaldo Carvalho de Melo
2022-03-24 18:33 ` [PATCH v4 4/4] perf mem: Support HITM for when mem_lvl_num is any Ali Saidi
2022-03-24 18:33   ` Ali Saidi
2022-03-26  6:23   ` Leo Yan
2022-03-26  6:23     ` Leo Yan
2022-03-26 13:30     ` Arnaldo Carvalho de Melo
2022-03-26 13:30       ` Arnaldo Carvalho de Melo
2022-03-26 19:14     ` Ali Saidi
2022-03-26 19:14       ` Ali Saidi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220406210017.11887-1-alisaidi@amazon.com \
    --to=alisaidi@amazon.com \
    --cc=Nick.Forrington@arm.com \
    --cc=acme@kernel.org \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=andrew.kilroy@arm.com \
    --cc=benh@kernel.crashing.org \
    --cc=german.gomez@arm.com \
    --cc=james.clark@arm.com \
    --cc=john.garry@huawei.com \
    --cc=jolsa@kernel.org \
    --cc=kjain@linux.ibm.com \
    --cc=leo.yan@linaro.org \
    --cc=lihuafei1@huawei.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mathieu.poirier@linaro.org \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.