* [PATCH 0/5] perf: arm64: Support ARMv8.3-SPE extensions
@ 2020-09-22 10:12 Andre Przywara
  2020-09-22 10:12 ` [PATCH 1/5] arm64: spe: Allow new bits in SPE filter register Andre Przywara
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Andre Przywara @ 2020-09-22 10:12 UTC (permalink / raw)
  To: Will Deacon, Catalin Marinas, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo
  Cc: Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Suzuki K Poulose, Leo Yan, Tan Xiaojun, James Clark,
	linux-arm-kernel, linux-kernel

The "ARMv8.3-SPE extensions" add some bits to SPE to cover newer
architecture features, most prominently SVE.

Add the new bits where needed, mostly to perf's SPE packet decoder.

Cheers,
Andre

Andre Przywara (5):
  arm64: spe: Allow new bits in SPE filter register
  perf: arm_spe: Add new event packet bits
  perf: arm_spe: Add nested virt event decoding
  perf: arm_spe: Decode memory tagging properties
  perf: arm_spe: Decode SVE events

 arch/arm64/include/asm/sysreg.h               |  2 +-
 .../arm-spe-decoder/arm-spe-pkt-decoder.c     | 86 +++++++++++++++++--
 2 files changed, 81 insertions(+), 7 deletions(-)

-- 
2.17.1


* [PATCH 1/5] arm64: spe: Allow new bits in SPE filter register
  2020-09-22 10:12 [PATCH 0/5] perf: arm64: Support ARMv8.3-SPE extensions Andre Przywara
@ 2020-09-22 10:12 ` Andre Przywara
  2020-09-27  2:51   ` Leo Yan
  2020-09-22 10:12 ` [PATCH 2/5] perf: arm_spe: Add new event packet bits Andre Przywara
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Andre Przywara @ 2020-09-22 10:12 UTC (permalink / raw)
  To: Will Deacon, Catalin Marinas, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo
  Cc: Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Suzuki K Poulose, Leo Yan, Tan Xiaojun, James Clark,
	linux-arm-kernel, linux-kernel

The ARMv8.3-SPE extension adds some new bits for the event filter.

Remove bits 11, 17 and 18 from the RES0 mask, so they can be used
correctly.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 arch/arm64/include/asm/sysreg.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 554a7e8ecb07..efca4ee28671 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -281,7 +281,7 @@
 #define SYS_PMSFCR_EL1_ST_SHIFT		18
 
 #define SYS_PMSEVFR_EL1			sys_reg(3, 0, 9, 9, 5)
-#define SYS_PMSEVFR_EL1_RES0		0x0000ffff00ff0f55UL
+#define SYS_PMSEVFR_EL1_RES0		0x0000ffff00f90755UL
 
 #define SYS_PMSLATFR_EL1		sys_reg(3, 0, 9, 9, 6)
 #define SYS_PMSLATFR_EL1_MINLAT_SHIFT	0
-- 
2.17.1
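
For readers checking the arithmetic: the new RES0 value in the hunk above is the old one with bits 11, 17 and 18 cleared. A minimal sketch of that derivation follows; the macro and function names are hypothetical and only the two constants come from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT_ULL(n)		(1ULL << (n))

/* Value of SYS_PMSEVFR_EL1_RES0 before the patch. */
#define PMSEVFR_RES0_OLD	0x0000ffff00ff0f55ULL

/* Hypothetical helper: drop the bits that ARMv8.3-SPE makes usable. */
static uint64_t pmsevfr_res0_v8_3(void)
{
	return PMSEVFR_RES0_OLD &
	       ~(BIT_ULL(11) | BIT_ULL(17) | BIT_ULL(18));
}

/* Hypothetical check: a filter value is acceptable if it sets no RES0 bit. */
static bool pmsevfr_filter_ok(uint64_t filter, uint64_t res0)
{
	return (filter & res0) == 0;
}
```

With the old mask, a filter that selects bit 17 would be rejected by such a check; with the new mask it passes.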


* [PATCH 2/5] perf: arm_spe: Add new event packet bits
  2020-09-22 10:12 [PATCH 0/5] perf: arm64: Support ARMv8.3-SPE extensions Andre Przywara
  2020-09-22 10:12 ` [PATCH 1/5] arm64: spe: Allow new bits in SPE filter register Andre Przywara
@ 2020-09-22 10:12 ` Andre Przywara
  2020-09-27  3:03   ` Leo Yan
  2020-09-22 10:12 ` [PATCH 3/5] perf: arm_spe: Add nested virt event decoding Andre Przywara
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Andre Przywara @ 2020-09-22 10:12 UTC (permalink / raw)
  To: Will Deacon, Catalin Marinas, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo
  Cc: Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Suzuki K Poulose, Leo Yan, Tan Xiaojun, James Clark,
	linux-arm-kernel, linux-kernel

The ARMv8.3-SPE extension adds some new bits to the event packet
fields.

Handle bits 11 (alignment), 17 and 18 (SVE predication) when decoding
the SPE buffer content.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 .../util/arm-spe-decoder/arm-spe-pkt-decoder.c  | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
index b94001b756c7..e633bb5b8e65 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
@@ -346,6 +346,23 @@ int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
 				buf += ret;
 				blen -= ret;
 			}
+			if (payload & BIT(11)) {
+				ret = snprintf(buf, buf_len, " ALIGNMENT");
+				buf += ret;
+				blen -= ret;
+			}
+		}
+		if (idx > 2) {
+			if (payload & BIT(17)) {
+				ret = snprintf(buf, buf_len, " SVE-PARTIAL-PRED");
+				buf += ret;
+				blen -= ret;
+			}
+			if (payload & BIT(18)) {
+				ret = snprintf(buf, buf_len, " SVE-EMPTY-PRED");
+				buf += ret;
+				blen -= ret;
+			}
 		}
 		if (ret < 0)
 			return ret;
-- 
2.17.1
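
One observation on the hunk above: every snprintf() call is handed the full buf_len even though buf has already been advanced, while the remaining space is tracked separately in blen. The following is a minimal sketch of the same three event bits decoded with a bounded append. The helper and function names are hypothetical, and the guard that encloses the bit 11 test lies outside the quoted hunk, so it is omitted here.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define BIT(n)		(1ULL << (n))

/* Hypothetical helper: append at *pos without ever writing past len. */
static int append(char *buf, size_t len, size_t *pos, const char *fmt, ...)
{
	va_list ap;
	int ret;

	if (*pos >= len)
		return -1;
	va_start(ap, fmt);
	ret = vsnprintf(buf + *pos, len - *pos, fmt, ap);
	va_end(ap);
	if (ret < 0 || (size_t)ret >= len - *pos)
		return -1;	/* output error or truncation */
	*pos += (size_t)ret;
	return 0;
}

/* Returns the number of characters written, or -1 if buf is too small. */
static int describe_new_event_bits(uint64_t payload, int idx,
				   char *buf, size_t len)
{
	size_t pos = 0;

	if (append(buf, len, &pos, "EV"))
		return -1;
	if ((payload & BIT(11)) && append(buf, len, &pos, " ALIGNMENT"))
		return -1;
	if (idx > 2) {	/* same guard as in the hunk for bits 17 and 18 */
		if ((payload & BIT(17)) &&
		    append(buf, len, &pos, " SVE-PARTIAL-PRED"))
			return -1;
		if ((payload & BIT(18)) &&
		    append(buf, len, &pos, " SVE-EMPTY-PRED"))
			return -1;
	}
	return (int)pos;
}
```

Called with bits 11 and 17 set and idx = 4, this produces "EV ALIGNMENT SVE-PARTIAL-PRED"; with idx = 2 the SVE flags are skipped.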


* [PATCH 3/5] perf: arm_spe: Add nested virt event decoding
  2020-09-22 10:12 [PATCH 0/5] perf: arm64: Support ARMv8.3-SPE extensions Andre Przywara
  2020-09-22 10:12 ` [PATCH 1/5] arm64: spe: Allow new bits in SPE filter register Andre Przywara
  2020-09-22 10:12 ` [PATCH 2/5] perf: arm_spe: Add new event packet bits Andre Przywara
@ 2020-09-22 10:12 ` Andre Przywara
  2020-09-27  3:11   ` Leo Yan
  2020-09-22 10:12 ` [PATCH 4/5] perf: arm_spe: Decode memory tagging properties Andre Przywara
  2020-09-22 10:12 ` [PATCH 5/5] perf: arm_spe: Decode SVE events Andre Przywara
  4 siblings, 1 reply; 22+ messages in thread
From: Andre Przywara @ 2020-09-22 10:12 UTC (permalink / raw)
  To: Will Deacon, Catalin Marinas, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo
  Cc: Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Suzuki K Poulose, Leo Yan, Tan Xiaojun, James Clark,
	linux-arm-kernel, linux-kernel

The ARMv8.4 nested virtualisation extension can redirect system register
accesses to a memory page controlled by the hypervisor. The SPE
profiling feature in newer implementations can tag those memory accesses
accordingly.

Add the bit pattern describing this load/store type, so that the perf
tool can decode it properly.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
index e633bb5b8e65..943e4155b246 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
@@ -398,6 +398,10 @@ int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
 					buf += ret;
 					blen -= ret;
 				}
+			} else if ((payload & 0xfe) == 0x30) {
+				ret = snprintf(buf, buf_len, " NV-SYSREG");
+				buf += ret;
+				blen -= ret;
 			} else if (payload & 0x4) {
 				ret = snprintf(buf, buf_len, " SIMD-FP");
 				buf += ret;
-- 
2.17.1
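
The match added above can be read in isolation. A minimal sketch, with a hypothetical function name: the 0xfe mask ignores bit 0 of the payload, presumably because that bit carries the load/store direction (an assumption, as that part of the decoder is not shown in the hunk).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical predicate mirroring the test in the hunk: bits [7:1] of
 * the operation type payload must equal 0x30 >> 1, bit 0 is ignored.
 */
static bool op_is_nv_sysreg(uint64_t payload)
{
	return (payload & 0xfe) == 0x30;
}
```

So 0x30 and 0x31 both match, while 0x32 or 0x34 do not.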


* [PATCH 4/5] perf: arm_spe: Decode memory tagging properties
  2020-09-22 10:12 [PATCH 0/5] perf: arm64: Support ARMv8.3-SPE extensions Andre Przywara
                   ` (2 preceding siblings ...)
  2020-09-22 10:12 ` [PATCH 3/5] perf: arm_spe: Add nested virt event decoding Andre Przywara
@ 2020-09-22 10:12 ` Andre Przywara
  2020-09-27  3:19   ` Leo Yan
  2020-09-22 10:12 ` [PATCH 5/5] perf: arm_spe: Decode SVE events Andre Przywara
  4 siblings, 1 reply; 22+ messages in thread
From: Andre Przywara @ 2020-09-22 10:12 UTC (permalink / raw)
  To: Will Deacon, Catalin Marinas, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo
  Cc: Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Suzuki K Poulose, Leo Yan, Tan Xiaojun, James Clark,
	linux-arm-kernel, linux-kernel

When SPE records a physical address, it can additionally tag the event
with information from the Memory Tagging architecture extension.

Decode the two additional fields in the SPE event payload.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 .../util/arm-spe-decoder/arm-spe-pkt-decoder.c  | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
index 943e4155b246..a033f34846a6 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
@@ -8,13 +8,14 @@
 #include <string.h>
 #include <endian.h>
 #include <byteswap.h>
+#include <linux/bits.h>
 
 #include "arm-spe-pkt-decoder.h"
 
-#define BIT(n)		(1ULL << (n))
-
 #define NS_FLAG		BIT(63)
 #define EL_FLAG		(BIT(62) | BIT(61))
+#define CH_FLAG		BIT(62)
+#define PAT_FLAG	GENMASK_ULL(59, 56)
 
 #define SPE_HEADER0_PAD			0x0
 #define SPE_HEADER0_END			0x1
@@ -447,10 +448,16 @@ int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
 			return snprintf(buf, buf_len, "%s 0x%llx el%d ns=%d",
 				        (idx == 1) ? "TGT" : "PC", payload, el, ns);
 		case 2:	return snprintf(buf, buf_len, "VA 0x%llx", payload);
-		case 3:	ns = !!(packet->payload & NS_FLAG);
+		case 3:	{
+			int ch = !!(packet->payload & CH_FLAG);
+			int pat = (packet->payload & PAT_FLAG) >> 56;
+
+			ns = !!(packet->payload & NS_FLAG);
 			payload &= ~(0xffULL << 56);
-			return snprintf(buf, buf_len, "PA 0x%llx ns=%d",
-					payload, ns);
+			return snprintf(buf, buf_len,
+					"PA 0x%llx ns=%d ch=%d, pat=%x",
+					payload, ns, ch, pat);
+			}
 		default: return 0;
 		}
 	case ARM_SPE_CONTEXT:
-- 
2.17.1
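
To illustrate the field layout the hunk above relies on, here is a minimal standalone sketch that splits a physical address payload into its parts. The struct and function names are hypothetical; the bit positions (NS in bit 63, CH in bit 62, PAT in bits [59:56]) and the masking of the top byte are taken from the patch. BIT_ULL and GENMASK_ULL are defined locally so the example does not depend on <linux/bits.h>.

```c
#include <stdint.h>

#define BIT_ULL(n)		(1ULL << (n))
#define GENMASK_ULL(h, l)	((~0ULL >> (63 - (h))) & (~0ULL << (l)))

#define NS_FLAG		BIT_ULL(63)
#define CH_FLAG		BIT_ULL(62)
#define PAT_FLAG	GENMASK_ULL(59, 56)

/* Hypothetical container for the decoded fields. */
struct spe_pa_fields {
	uint64_t addr;		/* payload with the top byte cleared */
	int ns;			/* non-secure flag */
	int ch;			/* CH flag, bit 62 */
	unsigned int pat;	/* PAT field, bits [59:56] */
};

static struct spe_pa_fields spe_decode_pa(uint64_t payload)
{
	struct spe_pa_fields f;

	f.ns = !!(payload & NS_FLAG);
	f.ch = !!(payload & CH_FLAG);
	f.pat = (unsigned int)((payload & PAT_FLAG) >> 56);
	f.addr = payload & ~(0xffULL << 56);
	return f;
}
```

For a payload of 0xc500001234567000 this yields ns=1, ch=1, pat=0x5 and an address of 0x1234567000.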


* [PATCH 5/5] perf: arm_spe: Decode SVE events
  2020-09-22 10:12 [PATCH 0/5] perf: arm64: Support ARMv8.3-SPE extensions Andre Przywara
                   ` (3 preceding siblings ...)
  2020-09-22 10:12 ` [PATCH 4/5] perf: arm_spe: Decode memory tagging properties Andre Przywara
@ 2020-09-22 10:12 ` Andre Przywara
  2020-09-27  3:30   ` Leo Yan
  2020-09-28 13:21   ` Dave Martin
  4 siblings, 2 replies; 22+ messages in thread
From: Andre Przywara @ 2020-09-22 10:12 UTC (permalink / raw)
  To: Will Deacon, Catalin Marinas, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo
  Cc: Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Suzuki K Poulose, Leo Yan, Tan Xiaojun, James Clark,
	linux-arm-kernel, linux-kernel

The Scalable Vector Extension (SVE) is an ARMv8 architecture extension
that introduces very long vector operations (up to 2048 bits).
The SPE profiling feature can tag SVE instructions with additional
properties like predication or the effective vector length.

Decode the new operation type bits in the SPE decoder to allow the perf
tool to correctly report about SVE instructions.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 .../arm-spe-decoder/arm-spe-pkt-decoder.c     | 48 ++++++++++++++++++-
 1 file changed, 47 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
index a033f34846a6..f0c369259554 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
@@ -372,8 +372,35 @@ int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
 	}
 	case ARM_SPE_OP_TYPE:
 		switch (idx) {
-		case 0:	return snprintf(buf, buf_len, "%s", payload & 0x1 ?
+		case 0: {
+			size_t blen = buf_len;
+
+			if ((payload & 0x89) == 0x08) {
+				ret = snprintf(buf, buf_len, "SVE");
+				buf += ret;
+				blen -= ret;
+				if (payload & 0x2)
+					ret = snprintf(buf, buf_len, " FP");
+				else
+					ret = snprintf(buf, buf_len, " INT");
+				buf += ret;
+				blen -= ret;
+				if (payload & 0x4) {
+					ret = snprintf(buf, buf_len, " PRED");
+					buf += ret;
+					blen -= ret;
+				}
+				/* Bits [7..4] encode the vector length */
+				ret = snprintf(buf, buf_len, " EVLEN%d",
+					       32 << ((payload >> 4) & 0x7));
+				buf += ret;
+				blen -= ret;
+				return buf_len - blen;
+			}
+
+			return snprintf(buf, buf_len, "%s", payload & 0x1 ?
 					"COND-SELECT" : "INSN-OTHER");
+			}
 		case 1:	{
 			size_t blen = buf_len;
 
@@ -403,6 +430,25 @@ int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
 				ret = snprintf(buf, buf_len, " NV-SYSREG");
 				buf += ret;
 				blen -= ret;
+			} else if ((payload & 0x0a) == 0x08) {
+				ret = snprintf(buf, buf_len, " SVE");
+				buf += ret;
+				blen -= ret;
+				if (payload & 0x4) {
+					ret = snprintf(buf, buf_len, " PRED");
+					buf += ret;
+					blen -= ret;
+				}
+				if (payload & 0x80) {
+					ret = snprintf(buf, buf_len, " SG");
+					buf += ret;
+					blen -= ret;
+				}
+				/* Bits [7..4] encode the vector length */
+				ret = snprintf(buf, buf_len, " EVLEN%d",
+					       32 << ((payload >> 4) & 0x7));
+				buf += ret;
+				blen -= ret;
 			} else if (payload & 0x4) {
 				ret = snprintf(buf, buf_len, " SIMD-FP");
 				buf += ret;
-- 
2.17.1
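
A small sketch of the SVE matching and vector length computation used in the two hunks above; the function names are hypothetical. Note that the code comment says bits [7..4], but the 0x7 mask actually selects bits [6:4]. In the load/store case bit 7 is tested separately as the "SG" flag, and in the "other" case the 0x89 mask requires bit 7 to be clear.

```c
#include <stdbool.h>
#include <stdint.h>

/* Match used for operation class "other" (case 0 in the patch). */
static bool op_other_is_sve(uint64_t payload)
{
	return (payload & 0x89) == 0x08;
}

/* Match used for the load/store class (case 1 in the patch). */
static bool op_ldst_is_sve(uint64_t payload)
{
	return (payload & 0x0a) == 0x08;
}

/* Effective vector length in bits, as computed in the patch. */
static unsigned int sve_evlen_bits(uint64_t payload)
{
	return 32u << ((payload >> 4) & 0x7);
}
```

A field value of 0 gives 32 bits and a value of 3 gives 256 bits; the expression as written can go as high as 32 << 7.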


* Re: [PATCH 1/5] arm64: spe: Allow new bits in SPE filter register
  2020-09-22 10:12 ` [PATCH 1/5] arm64: spe: Allow new bits in SPE filter register Andre Przywara
@ 2020-09-27  2:51   ` Leo Yan
  0 siblings, 0 replies; 22+ messages in thread
From: Leo Yan @ 2020-09-27  2:51 UTC (permalink / raw)
  To: Andre Przywara
  Cc: Will Deacon, Catalin Marinas, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Suzuki K Poulose, Tan Xiaojun,
	James Clark, linux-arm-kernel, linux-kernel

Hi Andre,

On Tue, Sep 22, 2020 at 11:12:21AM +0100, Andre Przywara wrote:
> The ARMv8.3-SPE extension adds some new bits for the event filter.
> 
> Remove bits 11, 17 and 18 from the RES0 mask, so they can be used
> correctly.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>  arch/arm64/include/asm/sysreg.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 554a7e8ecb07..efca4ee28671 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -281,7 +281,7 @@
>  #define SYS_PMSFCR_EL1_ST_SHIFT		18
>  
>  #define SYS_PMSEVFR_EL1			sys_reg(3, 0, 9, 9, 5)
> -#define SYS_PMSEVFR_EL1_RES0		0x0000ffff00ff0f55UL
> +#define SYS_PMSEVFR_EL1_RES0		0x0000ffff00f90755UL

This patch duplicates Wei Li's patch [1].  There has been some
discussion there, and Will gave suggestions [2] for the patch; that
would be a good starting point for continuing this work.

Thanks,
Leo

[1] https://www.spinics.net/lists/arm-kernel/msg825364.html
[2] https://www.spinics.net/lists/arm-kernel/msg835733.html

>  
>  #define SYS_PMSLATFR_EL1		sys_reg(3, 0, 9, 9, 6)
>  #define SYS_PMSLATFR_EL1_MINLAT_SHIFT	0
> -- 
> 2.17.1
> 

* Re: [PATCH 2/5] perf: arm_spe: Add new event packet bits
  2020-09-22 10:12 ` [PATCH 2/5] perf: arm_spe: Add new event packet bits Andre Przywara
@ 2020-09-27  3:03   ` Leo Yan
  0 siblings, 0 replies; 22+ messages in thread
From: Leo Yan @ 2020-09-27  3:03 UTC (permalink / raw)
  To: Andre Przywara
  Cc: Will Deacon, Catalin Marinas, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Suzuki K Poulose, Tan Xiaojun,
	James Clark, linux-arm-kernel, linux-kernel

On Tue, Sep 22, 2020 at 11:12:22AM +0100, Andre Przywara wrote:
> The ARMv8.3-SPE extension adds some new bits to the event packet
> fields.
> 
> Handle bits 11 (alignment), 17 and 18 (SVE predication) when decoding
> the SPE buffer content.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>  .../util/arm-spe-decoder/arm-spe-pkt-decoder.c  | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> index b94001b756c7..e633bb5b8e65 100644
> --- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> @@ -346,6 +346,23 @@ int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
>  				buf += ret;
>  				blen -= ret;
>  			}
> +			if (payload & BIT(11)) {
> +				ret = snprintf(buf, buf_len, " ALIGNMENT");
> +				buf += ret;
> +				blen -= ret;
> +			}
> +		}
> +		if (idx > 2) {
> +			if (payload & BIT(17)) {
> +				ret = snprintf(buf, buf_len, " SVE-PARTIAL-PRED");
> +				buf += ret;
> +				blen -= ret;
> +			}
> +			if (payload & BIT(18)) {
> +				ret = snprintf(buf, buf_len, " SVE-EMPTY-PRED");
> +				buf += ret;
> +				blen -= ret;
> +			}

Some of the changes in patches 02 to 05 have already been included in
the patch set "perf arm-spe: Refactor decoding & dumping flow" [1].
There I refactored the Arm SPE decoder to use macros in place of the
hard-coded numbers for packet formats.  So I'd like you to rebase your
changes on that refactoring patch set, so they can reuse the predefined
macros for decoding.

This patch in particular has been included in patch [2].  Note that
your implementation differs in how it handles "ALIGNMENT"; it misses
the "idx > 2" check.  It would be very helpful if you could review
patch [2].

Thanks,
Leo

[1] https://lore.kernel.org/patchwork/cover/1288406/
[2] https://lore.kernel.org/patchwork/patch/1288413/

>  		}
>  		if (ret < 0)
>  			return ret;
> -- 
> 2.17.1
> 

* Re: [PATCH 3/5] perf: arm_spe: Add nested virt event decoding
  2020-09-22 10:12 ` [PATCH 3/5] perf: arm_spe: Add nested virt event decoding Andre Przywara
@ 2020-09-27  3:11   ` Leo Yan
  0 siblings, 0 replies; 22+ messages in thread
From: Leo Yan @ 2020-09-27  3:11 UTC (permalink / raw)
  To: Andre Przywara
  Cc: Will Deacon, Catalin Marinas, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Suzuki K Poulose, Tan Xiaojun,
	James Clark, linux-arm-kernel, linux-kernel

On Tue, Sep 22, 2020 at 11:12:23AM +0100, Andre Przywara wrote:
> The ARMv8.4 nested virtualisation extension can redirect system register
> accesses to a memory page controlled by the hypervisor. The SPE
> profiling feature in newer implementations can tag those memory accesses
> accordingly.
> 
> Add the bit pattern describing this load/store type, so that the perf
> tool can decode it properly.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>  tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> index e633bb5b8e65..943e4155b246 100644
> --- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> @@ -398,6 +398,10 @@ int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
>  					buf += ret;
>  					blen -= ret;
>  				}
> +			} else if ((payload & 0xfe) == 0x30) {
> +				ret = snprintf(buf, buf_len, " NV-SYSREG");
> +				buf += ret;
> +				blen -= ret;

This change has been included in the patch "perf arm-spe: Add more sub
classes for operation packet" [1].

Thanks,
Leo

[1] https://lore.kernel.org/patchwork/patch/1288412/

>  			} else if (payload & 0x4) {
>  				ret = snprintf(buf, buf_len, " SIMD-FP");
>  				buf += ret;
> -- 
> 2.17.1
> 

* Re: [PATCH 4/5] perf: arm_spe: Decode memory tagging properties
  2020-09-22 10:12 ` [PATCH 4/5] perf: arm_spe: Decode memory tagging properties Andre Przywara
@ 2020-09-27  3:19   ` Leo Yan
       [not found]     ` <20201013145103.GE1063281@kernel.org>
  0 siblings, 1 reply; 22+ messages in thread
From: Leo Yan @ 2020-09-27  3:19 UTC (permalink / raw)
  To: Andre Przywara
  Cc: Will Deacon, Catalin Marinas, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Suzuki K Poulose, Tan Xiaojun,
	James Clark, linux-arm-kernel, linux-kernel

On Tue, Sep 22, 2020 at 11:12:24AM +0100, Andre Przywara wrote:
> When SPE records a physical address, it can additionally tag the event
> with information from the Memory Tagging architecture extension.
> 
> Decode the two additional fields in the SPE event payload.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>  .../util/arm-spe-decoder/arm-spe-pkt-decoder.c  | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> index 943e4155b246..a033f34846a6 100644
> --- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> @@ -8,13 +8,14 @@
>  #include <string.h>
>  #include <endian.h>
>  #include <byteswap.h>
> +#include <linux/bits.h>
>  
>  #include "arm-spe-pkt-decoder.h"
>  
> -#define BIT(n)		(1ULL << (n))
> -
>  #define NS_FLAG		BIT(63)
>  #define EL_FLAG		(BIT(62) | BIT(61))
> +#define CH_FLAG		BIT(62)
> +#define PAT_FLAG	GENMASK_ULL(59, 56)
>  
>  #define SPE_HEADER0_PAD			0x0
>  #define SPE_HEADER0_END			0x1
> @@ -447,10 +448,16 @@ int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
>  			return snprintf(buf, buf_len, "%s 0x%llx el%d ns=%d",
>  				        (idx == 1) ? "TGT" : "PC", payload, el, ns);
>  		case 2:	return snprintf(buf, buf_len, "VA 0x%llx", payload);
> -		case 3:	ns = !!(packet->payload & NS_FLAG);
> +		case 3:	{
> +			int ch = !!(packet->payload & CH_FLAG);
> +			int pat = (packet->payload & PAT_FLAG) >> 56;
> +
> +			ns = !!(packet->payload & NS_FLAG);
>  			payload &= ~(0xffULL << 56);
> -			return snprintf(buf, buf_len, "PA 0x%llx ns=%d",
> -					payload, ns);
> +			return snprintf(buf, buf_len,
> +					"PA 0x%llx ns=%d ch=%d, pat=%x",
> +					payload, ns, ch, pat);
> +			}

Reviewed-by: Leo Yan <leo.yan@linaro.org>

>  		default: return 0;
>  		}
>  	case ARM_SPE_CONTEXT:
> -- 
> 2.17.1
> 

* Re: [PATCH 5/5] perf: arm_spe: Decode SVE events
  2020-09-22 10:12 ` [PATCH 5/5] perf: arm_spe: Decode SVE events Andre Przywara
@ 2020-09-27  3:30   ` Leo Yan
  2020-09-28 10:15     ` André Przywara
  2020-09-28 13:21   ` Dave Martin
  1 sibling, 1 reply; 22+ messages in thread
From: Leo Yan @ 2020-09-27  3:30 UTC (permalink / raw)
  To: Andre Przywara
  Cc: Will Deacon, Catalin Marinas, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Suzuki K Poulose, Tan Xiaojun,
	James Clark, linux-arm-kernel, linux-kernel

Hi Andre,

On Tue, Sep 22, 2020 at 11:12:25AM +0100, Andre Przywara wrote:
> The Scalable Vector Extension (SVE) is an ARMv8 architecture extension
> that introduces very long vector operations (up to 2048 bits).
> The SPE profiling feature can tag SVE instructions with additional
> properties like predication or the effective vector length.
> 
> Decode the new operation type bits in the SPE decoder to allow the perf
> tool to correctly report about SVE instructions.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>  .../arm-spe-decoder/arm-spe-pkt-decoder.c     | 48 ++++++++++++++++++-
>  1 file changed, 47 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> index a033f34846a6..f0c369259554 100644
> --- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> @@ -372,8 +372,35 @@ int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
>  	}
>  	case ARM_SPE_OP_TYPE:
>  		switch (idx) {
> -		case 0:	return snprintf(buf, buf_len, "%s", payload & 0x1 ?
> +		case 0: {
> +			size_t blen = buf_len;
> +
> +			if ((payload & 0x89) == 0x08) {
> +				ret = snprintf(buf, buf_len, "SVE");
> +				buf += ret;
> +				blen -= ret;
> +				if (payload & 0x2)
> +					ret = snprintf(buf, buf_len, " FP");
> +				else
> +					ret = snprintf(buf, buf_len, " INT");
> +				buf += ret;
> +				blen -= ret;
> +				if (payload & 0x4) {
> +					ret = snprintf(buf, buf_len, " PRED");
> +					buf += ret;
> +					blen -= ret;
> +				}
> +				/* Bits [7..4] encode the vector length */
> +				ret = snprintf(buf, buf_len, " EVLEN%d",
> +					       32 << ((payload >> 4) & 0x7));
> +				buf += ret;
> +				blen -= ret;
> +				return buf_len - blen;
> +			}
> +
> +			return snprintf(buf, buf_len, "%s", payload & 0x1 ?
>  					"COND-SELECT" : "INSN-OTHER");
> +			}
>  		case 1:	{
>  			size_t blen = buf_len;
>  
> @@ -403,6 +430,25 @@ int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
>  				ret = snprintf(buf, buf_len, " NV-SYSREG");
>  				buf += ret;
>  				blen -= ret;
> +			} else if ((payload & 0x0a) == 0x08) {
> +				ret = snprintf(buf, buf_len, " SVE");
> +				buf += ret;
> +				blen -= ret;
> +				if (payload & 0x4) {
> +					ret = snprintf(buf, buf_len, " PRED");
> +					buf += ret;
> +					blen -= ret;
> +				}
> +				if (payload & 0x80) {
> +					ret = snprintf(buf, buf_len, " SG");
> +					buf += ret;
> +					blen -= ret;
> +				}
> +				/* Bits [7..4] encode the vector length */
> +				ret = snprintf(buf, buf_len, " EVLEN%d",
> +					       32 << ((payload >> 4) & 0x7));
> +				buf += ret;
> +				blen -= ret;

The changes in this patch have been included in patch [1].

So, to summarize patches 02 ~ 05: except for patch 04, the changes
have already been included in the patch set "perf arm-spe: Refactor
decoding & dumping flow".

I'd like to add your patch 04 to the patch set "perf arm-spe:
Refactor decoding & dumping flow"; I will respin it as v2 on the
latest perf/core branch and send it out for review.

For patch 01, you could continue to try to land it in the kernel.
(Maybe consolidate a bit with Wei?).

Is this okay with you?

Thanks,
Leo

[1] https://lore.kernel.org/patchwork/patch/1288413/

>  			} else if (payload & 0x4) {
>  				ret = snprintf(buf, buf_len, " SIMD-FP");
>  				buf += ret;
> -- 
> 2.17.1
> 

* Re: [PATCH 5/5] perf: arm_spe: Decode SVE events
  2020-09-27  3:30   ` Leo Yan
@ 2020-09-28 10:15     ` André Przywara
  2020-09-28 11:08       ` Leo Yan
  0 siblings, 1 reply; 22+ messages in thread
From: André Przywara @ 2020-09-28 10:15 UTC (permalink / raw)
  To: Leo Yan
  Cc: Will Deacon, Catalin Marinas, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Suzuki K Poulose, Tan Xiaojun,
	James Clark, linux-arm-kernel, linux-kernel

On 27/09/2020 04:30, Leo Yan wrote:

Hi Leo,

> On Tue, Sep 22, 2020 at 11:12:25AM +0100, Andre Przywara wrote:
>> The Scalable Vector Extension (SVE) is an ARMv8 architecture extension
>> that introduces very long vector operations (up to 2048 bits).
>> The SPE profiling feature can tag SVE instructions with additional
>> properties like predication or the effective vector length.
>>
>> Decode the new operation type bits in the SPE decoder to allow the perf
>> tool to correctly report about SVE instructions.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> ---
>>  .../arm-spe-decoder/arm-spe-pkt-decoder.c     | 48 ++++++++++++++++++-
>>  1 file changed, 47 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
>> index a033f34846a6..f0c369259554 100644
>> --- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
>> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
>> @@ -372,8 +372,35 @@ int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
>>  	}
>>  	case ARM_SPE_OP_TYPE:
>>  		switch (idx) {
>> -		case 0:	return snprintf(buf, buf_len, "%s", payload & 0x1 ?
>> +		case 0: {
>> +			size_t blen = buf_len;
>> +
>> +			if ((payload & 0x89) == 0x08) {
>> +				ret = snprintf(buf, buf_len, "SVE");
>> +				buf += ret;
>> +				blen -= ret;
>> +				if (payload & 0x2)
>> +					ret = snprintf(buf, buf_len, " FP");
>> +				else
>> +					ret = snprintf(buf, buf_len, " INT");
>> +				buf += ret;
>> +				blen -= ret;
>> +				if (payload & 0x4) {
>> +					ret = snprintf(buf, buf_len, " PRED");
>> +					buf += ret;
>> +					blen -= ret;
>> +				}
>> +				/* Bits [7..4] encode the vector length */
>> +				ret = snprintf(buf, buf_len, " EVLEN%d",
>> +					       32 << ((payload >> 4) & 0x7));
>> +				buf += ret;
>> +				blen -= ret;
>> +				return buf_len - blen;
>> +			}
>> +
>> +			return snprintf(buf, buf_len, "%s", payload & 0x1 ?
>>  					"COND-SELECT" : "INSN-OTHER");
>> +			}
>>  		case 1:	{
>>  			size_t blen = buf_len;
>>  
>> @@ -403,6 +430,25 @@ int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
>>  				ret = snprintf(buf, buf_len, " NV-SYSREG");
>>  				buf += ret;
>>  				blen -= ret;
>> +			} else if ((payload & 0x0a) == 0x08) {
>> +				ret = snprintf(buf, buf_len, " SVE");
>> +				buf += ret;
>> +				blen -= ret;
>> +				if (payload & 0x4) {
>> +					ret = snprintf(buf, buf_len, " PRED");
>> +					buf += ret;
>> +					blen -= ret;
>> +				}
>> +				if (payload & 0x80) {
>> +					ret = snprintf(buf, buf_len, " SG");
>> +					buf += ret;
>> +					blen -= ret;
>> +				}
>> +				/* Bits [7..4] encode the vector length */
>> +				ret = snprintf(buf, buf_len, " EVLEN%d",
>> +					       32 << ((payload >> 4) & 0x7));
>> +				buf += ret;
>> +				blen -= ret;
> 
> The changes in this patch have been included in patch [1].
> 
> So, to summarize patches 02 ~ 05: except for patch 04, the changes
> have already been included in the patch set "perf arm-spe: Refactor
> decoding & dumping flow".

Ah, my sincere apologies, I totally missed Wei's and your series on this
(although I did some research on "prior art").

> I'd like to add your patch 04 to the patch set "perf arm-spe:
> Refactor decoding & dumping flow"; I will respin it as v2 on the
> latest perf/core branch and send it out for review.
> 
> For patch 01, you could continue to try to land it in the kernel.
> (Maybe consolidate a bit with Wei?).
> 
> Is this okay with you?

Yes, sounds like a plan. So Wei's original series is now fully
integrated into your 13-patch rework, right?

Is "[RESEND,v1,xx/13] ..." the latest revision of your series?
Do you plan on sending a v2 anytime soon? Or shall I do review on the
existing one?

Cheers,
Andre

> 
> [1] https://lore.kernel.org/patchwork/patch/1288413/
> 
>>  			} else if (payload & 0x4) {
>>  				ret = snprintf(buf, buf_len, " SIMD-FP");
>>  				buf += ret;
>> -- 
>> 2.17.1
>>


* Re: [PATCH 5/5] perf: arm_spe: Decode SVE events
  2020-09-28 10:15     ` André Przywara
@ 2020-09-28 11:08       ` Leo Yan
  0 siblings, 0 replies; 22+ messages in thread
From: Leo Yan @ 2020-09-28 11:08 UTC (permalink / raw)
  To: André Przywara
  Cc: Will Deacon, Catalin Marinas, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Suzuki K Poulose, Tan Xiaojun,
	James Clark, linux-arm-kernel, linux-kernel, Wei Li

Hi Andre,

On Mon, Sep 28, 2020 at 11:15:53AM +0100, André Przywara wrote:

[...]

> > So, to summarize patches 02 ~ 05: except for patch 04, the changes
> > have already been included in the patch set "perf arm-spe: Refactor
> > decoding & dumping flow".
> 
> Ah, my sincere apologies, I totally missed Wei's and your series on this
> (although I did some research on "prior art").

No worries!

> > I'd like to add your patch 04 to the patch set "perf arm-spe:
> > Refactor decoding & dumping flow"; I will respin it as v2 on the
> > latest perf/core branch and send it out for review.
> > 
> > For patch 01, you could continue to try to land it in the kernel.
> > (Maybe consolidate a bit with Wei?).
> > 
> > Is this okay with you?
> 
> Yes, sounds like a plan. So Wei's original series is now fully
> integrated into your 13-patch rework, right?

Thanks for the confirmation.

As you can see, Wei's patch set has 4 patches [1].  I only picked
patch 02 [2] from Wei's patch set into my refactoring patch set;
patch 01 enables the driver for SVE events, and patches 03/04
introduce new synthesized events.

Patches 03 / 04 should be considered carefully, and it would be good to
show that these synthesized events are useful for real use cases before
upstreaming them.  The reason is that, AFAIK, a good direction is to
generate SPE trace data for memory events [3]; for SVE, I think we
should first consider whether we can reuse the memory events for
profiling rather than adding new synthesized events.

So I prefer to give priority to patches 01 / 02.

> Is "[RESEND,v1,xx/13] ..." the latest revision of your series?

Yes.

> Do you plan on sending a v2 anytime soon? Or shall I do review on the
> existing one?

To save time, let me respin the patch set as v2 and send it to LKML
(hopefully in the next 1~2 days).  Then you can review v2.

Thanks,
Leo

[1] https://lore.kernel.org/patchwork/cover/1278778/
[2] https://lore.kernel.org/patchwork/patch/1278780/
[3] https://lore.kernel.org/patchwork/cover/1298085/


* Re: [PATCH 5/5] perf: arm_spe: Decode SVE events
  2020-09-22 10:12 ` [PATCH 5/5] perf: arm_spe: Decode SVE events Andre Przywara
  2020-09-27  3:30   ` Leo Yan
@ 2020-09-28 13:21   ` Dave Martin
  2020-09-28 13:59     ` André Przywara
  1 sibling, 1 reply; 22+ messages in thread
From: Dave Martin @ 2020-09-28 13:21 UTC (permalink / raw)
  To: Andre Przywara
  Cc: Will Deacon, Catalin Marinas, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Suzuki K Poulose,
	Alexander Shishkin, linux-kernel, James Clark, Leo Yan,
	Namhyung Kim, Jiri Olsa, Tan Xiaojun, linux-arm-kernel

On Tue, Sep 22, 2020 at 11:12:25AM +0100, Andre Przywara wrote:
> The Scalable Vector Extension (SVE) is an ARMv8 architecture extension
> that introduces very long vector operations (up to 2048 bits).

(8192, in fact, though don't expect to see that on real hardware any
time soon...  qemu and the Arm fast model can do it, though.)

> The SPE profiling feature can tag SVE instructions with additional
> properties like predication or the effective vector length.
> 
> Decode the new operation type bits in the SPE decoder to allow the perf
> tool to correctly report about SVE instructions.


I don't know anything about SPE, so just commenting on a few minor
things that catch my eye here.

> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>  .../arm-spe-decoder/arm-spe-pkt-decoder.c     | 48 ++++++++++++++++++-
>  1 file changed, 47 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> index a033f34846a6..f0c369259554 100644
> --- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> @@ -372,8 +372,35 @@ int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
>  	}
>  	case ARM_SPE_OP_TYPE:
>  		switch (idx) {
> -		case 0:	return snprintf(buf, buf_len, "%s", payload & 0x1 ?
> +		case 0: {
> +			size_t blen = buf_len;
> +
> +			if ((payload & 0x89) == 0x08) {
> +				ret = snprintf(buf, buf_len, "SVE");
> +				buf += ret;
> +				blen -= ret;

(Nit: can ret be < 0 ?  I've never been 100% clear on this myself for
the s*printf() family -- if this assumption is widespread in perf tool
already then I guess just go with the flow.)

I wonder if this snprintf+increment+decrement sequence could be wrapped
up as a helper, rather than having to be repeated all over the place.
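
A minimal sketch of what such a helper could look like, purely as an
illustration: the name buf_appendf() and its cursor/remaining-length
interface are hypothetical and not taken from the perf sources.  It
formats into the remaining space, advances the cursor, and reports an
error or truncation instead of stepping past the end of the buffer.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of perf): append formatted text at *buf,
 * then advance *buf and shrink *blen by the number of bytes written.
 * Returns the number of bytes appended, or -1 on a formatting error or
 * if the output did not fit (the cursor is left unchanged in that case).
 */
static int buf_appendf(char **buf, size_t *blen, const char *fmt, ...)
{
	va_list ap;
	int ret;

	va_start(ap, fmt);
	ret = vsnprintf(*buf, *blen, fmt, ap);
	va_end(ap);

	/* Negative: error.  >= *blen: output was truncated. */
	if (ret < 0 || (size_t)ret >= *blen)
		return -1;

	*buf += ret;
	*blen -= ret;
	return ret;
}
```

With something along these lines, each "ret = snprintf(); buf += ret;
blen -= ret;" triple would collapse into one call such as
buf_appendf(&buf, &blen, " PRED"), and every call would be bounded by
the remaining length rather than the original buf_len.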

> +				if (payload & 0x2)
> +					ret = snprintf(buf, buf_len, " FP");
> +				else
> +					ret = snprintf(buf, buf_len, " INT");
> +				buf += ret;
> +				blen -= ret;
> +				if (payload & 0x4) {
> +					ret = snprintf(buf, buf_len, " PRED");
> +					buf += ret;
> +					blen -= ret;
> +				}
> +				/* Bits [7..4] encode the vector length */
> +				ret = snprintf(buf, buf_len, " EVLEN%d",
> +					       32 << ((payload >> 4) & 0x7));

Isn't this just extracting 3 bits (0x7)?  And what unit are we aiming
for here: is it the number of bytes per vector, or something else?  I'm
confused by the fact that this will go up in steps of 32, which doesn't
seem to match up to the architecture.

I notice that bit 7 has to be zero to get into this if() though.
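
To spell out what the two mask-and-compare tests in the patch accept,
here is a small sketch.  The function names are made up for
illustration; only the mask/value pairs are taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names; only the mask/value pairs come from the patch. */

/* idx 0 test: bit 3 must be set, bits 7 and 0 must be clear. */
static bool op_is_sve_other(uint64_t payload)
{
	return (payload & 0x89) == 0x08;
}

/*
 * idx 1 test: bit 3 must be set and bit 1 clear.  Bit 7 is not part of
 * the mask here, since the patch decodes it separately as the SG flag.
 */
static bool op_is_sve_ldst(uint64_t payload)
{
	return (payload & 0x0a) == 0x08;
}
```

So a payload of 0x88 would be rejected by the first test but accepted
by the second one.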

> +				buf += ret;
> +				blen -= ret;
> +				return buf_len - blen;
> +			}
> +
> +			return snprintf(buf, buf_len, "%s", payload & 0x1 ?
>  					"COND-SELECT" : "INSN-OTHER");
> +			}
>  		case 1:	{
>  			size_t blen = buf_len;
>  
> @@ -403,6 +430,25 @@ int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
>  				ret = snprintf(buf, buf_len, " NV-SYSREG");
>  				buf += ret;
>  				blen -= ret;
> +			} else if ((payload & 0x0a) == 0x08) {
> +				ret = snprintf(buf, buf_len, " SVE");
> +				buf += ret;
> +				blen -= ret;
> +				if (payload & 0x4) {
> +					ret = snprintf(buf, buf_len, " PRED");
> +					buf += ret;
> +					blen -= ret;
> +				}
> +				if (payload & 0x80) {
> +					ret = snprintf(buf, buf_len, " SG");
> +					buf += ret;
> +					blen -= ret;
> +				}
> +				/* Bits [7..4] encode the vector length */
> +				ret = snprintf(buf, buf_len, " EVLEN%d",
> +					       32 << ((payload >> 4) & 0x7));

Same comment as above.  Maybe have a common helper for decoding the
vector length bits so it can be fixed in a single place?

> +				buf += ret;
> +				blen -= ret;
>  			} else if (payload & 0x4) {
>  				ret = snprintf(buf, buf_len, " SIMD-FP");
>  				buf += ret;

Cheers
---Dave


* Re: [PATCH 5/5] perf: arm_spe: Decode SVE events
  2020-09-28 13:21   ` Dave Martin
@ 2020-09-28 13:59     ` André Przywara
  2020-09-28 14:47       ` Dave Martin
  0 siblings, 1 reply; 22+ messages in thread
From: André Przywara @ 2020-09-28 13:59 UTC (permalink / raw)
  To: Dave Martin
  Cc: Will Deacon, Catalin Marinas, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Suzuki K Poulose,
	Alexander Shishkin, linux-kernel, James Clark, Leo Yan,
	Namhyung Kim, Jiri Olsa, Tan Xiaojun, linux-arm-kernel, Wei Li

On 28/09/2020 14:21, Dave Martin wrote:

Hi Dave,

> On Tue, Sep 22, 2020 at 11:12:25AM +0100, Andre Przywara wrote:
>> The Scalable Vector Extension (SVE) is an ARMv8 architecture extension
>> that introduces very long vector operations (up to 2048 bits).
> 
> (8192, in fact, though don't expect to see that on real hardware any
> time soon...  qemu and the Arm fast model can do it, though.)
> 
>> The SPE profiling feature can tag SVE instructions with additional
>> properties like predication or the effective vector length.
>>
>> Decode the new operation type bits in the SPE decoder to allow the perf
>> tool to correctly report about SVE instructions.
> 
> 
> I don't know anything about SPE, so just commenting on a few minor
> things that catch my eye here.

Many thanks for taking a look!
Please note that I actually missed a prior submission by Wei, so the
code changes here will end up in:
https://lore.kernel.org/patchwork/patch/1288413/

But your two points below magically apply to his patch as well, so....

> 
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> ---
>>  .../arm-spe-decoder/arm-spe-pkt-decoder.c     | 48 ++++++++++++++++++-
>>  1 file changed, 47 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
>> index a033f34846a6..f0c369259554 100644
>> --- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
>> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
>> @@ -372,8 +372,35 @@ int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
>>  	}
>>  	case ARM_SPE_OP_TYPE:
>>  		switch (idx) {
>> -		case 0:	return snprintf(buf, buf_len, "%s", payload & 0x1 ?
>> +		case 0: {
>> +			size_t blen = buf_len;
>> +
>> +			if ((payload & 0x89) == 0x08) {
>> +				ret = snprintf(buf, buf_len, "SVE");
>> +				buf += ret;
>> +				blen -= ret;
> 
> (Nit: can ret be < 0 ?  I've never been 100% clear on this myself for
> the s*printf() family -- if this assumption is widespread in perf tool
> already then I guess just go with the flow.)

Yeah, some parts of the code in here check for -1, actually, but doing
this on every call to snprintf would push this current code over the
edge - and I cowardly avoided a refactoring ;-)

Please note that this is perf userland, and also we are printing constant
strings here.
Although admittedly this starts to sound like an excuse now ...

> I wonder if this snprintf+increment+decrement sequence could be wrapped
> up as a helper, rather than having to be repeated all over the place.

Yes, I was hoping nobody would notice ;-)

>> +				if (payload & 0x2)
>> +					ret = snprintf(buf, buf_len, " FP");
>> +				else
>> +					ret = snprintf(buf, buf_len, " INT");
>> +				buf += ret;
>> +				blen -= ret;
>> +				if (payload & 0x4) {
>> +					ret = snprintf(buf, buf_len, " PRED");
>> +					buf += ret;
>> +					blen -= ret;
>> +				}
>> +				/* Bits [7..4] encode the vector length */
>> +				ret = snprintf(buf, buf_len, " EVLEN%d",
>> +					       32 << ((payload >> 4) & 0x7));
> 
> Isn't this just extracting 3 bits (0x7)? 

Ah, right, the comment is wrong. It's actually bits [6:4].

> And what unit are we aiming
> for here: is it the number of bytes per vector, or something else?  I'm
> confused by the fact that this will go up in steps of 32, which doesn't
> seem to match up to the architecture.

So this is how SPE encodes the effective vector length in its payload:
the format is described in section "D10.2.7 Operation Type packet" in a
(recent) ARMv8 ARM. I put the above statement in a C file and ran all
input values through it, it produced the exact *bit* length values as in
the spec.

Is there any particular pattern you are concerned about?
I admit this is somewhat hackish, I can do an extra function to put some
comments in there.

> 
> I notice that bit 7 has to be zero to get into this if() though.
> 
>> +				buf += ret;
>> +				blen -= ret;
>> +				return buf_len - blen;
>> +			}
>> +
>> +			return snprintf(buf, buf_len, "%s", payload & 0x1 ?
>>  					"COND-SELECT" : "INSN-OTHER");
>> +			}
>>  		case 1:	{
>>  			size_t blen = buf_len;
>>  
>> @@ -403,6 +430,25 @@ int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
>>  				ret = snprintf(buf, buf_len, " NV-SYSREG");
>>  				buf += ret;
>>  				blen -= ret;
>> +			} else if ((payload & 0x0a) == 0x08) {
>> +				ret = snprintf(buf, buf_len, " SVE");
>> +				buf += ret;
>> +				blen -= ret;
>> +				if (payload & 0x4) {
>> +					ret = snprintf(buf, buf_len, " PRED");
>> +					buf += ret;
>> +					blen -= ret;
>> +				}
>> +				if (payload & 0x80) {
>> +					ret = snprintf(buf, buf_len, " SG");
>> +					buf += ret;
>> +					blen -= ret;
>> +				}
>> +				/* Bits [7..4] encode the vector length */
>> +				ret = snprintf(buf, buf_len, " EVLEN%d",
>> +					       32 << ((payload >> 4) & 0x7));
> 
> Same comment as above.  Maybe have a common helper for decoding the
> vector length bits so it can be fixed in a single place?

Yup. Although I wonder if this is the smallest of the problems with this
function going forward.

Cheers,
Andre

> 
>> +				buf += ret;
>> +				blen -= ret;
>>  			} else if (payload & 0x4) {
>>  				ret = snprintf(buf, buf_len, " SIMD-FP");
>>  				buf += ret;
> 
> Cheers
> ---Dave
> 



* Re: [PATCH 5/5] perf: arm_spe: Decode SVE events
  2020-09-28 13:59     ` André Przywara
@ 2020-09-28 14:47       ` Dave Martin
  2020-09-29  2:19         ` Leo Yan
  0 siblings, 1 reply; 22+ messages in thread
From: Dave Martin @ 2020-09-28 14:47 UTC (permalink / raw)
  To: André Przywara
  Cc: Mark Rutland, Wei Li, Suzuki K Poulose, Peter Zijlstra,
	Catalin Marinas, Jiri Olsa, linux-kernel,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Ingo Molnar,
	James Clark, Leo Yan, Namhyung Kim, Will Deacon, Tan Xiaojun,
	linux-arm-kernel

On Mon, Sep 28, 2020 at 02:59:34PM +0100, André Przywara wrote:
> On 28/09/2020 14:21, Dave Martin wrote:
> 
> Hi Dave,
> 
> > On Tue, Sep 22, 2020 at 11:12:25AM +0100, Andre Przywara wrote:
> >> The Scalable Vector Extension (SVE) is an ARMv8 architecture extension
> >> that introduces very long vector operations (up to 2048 bits).
> > 
> > (8192, in fact, though don't expect to see that on real hardware any
> > time soon...  qemu and the Arm fast model can do it, though.)
> > 
> >> The SPE profiling feature can tag SVE instructions with additional
> >> properties like predication or the effective vector length.
> >>
> >> Decode the new operation type bits in the SPE decoder to allow the perf
> >> tool to correctly report about SVE instructions.
> > 
> > 
> > I don't know anything about SPE, so just commenting on a few minor
> > things that catch my eye here.
> 
> Many thanks for taking a look!
> Please note that I actually missed a prior submission by Wei, so the
> code changes here will end up in:
> https://lore.kernel.org/patchwork/patch/1288413/
> 
> But your two points below magically apply to his patch as well, so....
> 
> > 
> >> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> >> ---
> >>  .../arm-spe-decoder/arm-spe-pkt-decoder.c     | 48 ++++++++++++++++++-
> >>  1 file changed, 47 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> >> index a033f34846a6..f0c369259554 100644
> >> --- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> >> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> >> @@ -372,8 +372,35 @@ int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
> >>  	}
> >>  	case ARM_SPE_OP_TYPE:
> >>  		switch (idx) {
> >> -		case 0:	return snprintf(buf, buf_len, "%s", payload & 0x1 ?
> >> +		case 0: {
> >> +			size_t blen = buf_len;
> >> +
> >> +			if ((payload & 0x89) == 0x08) {
> >> +				ret = snprintf(buf, buf_len, "SVE");
> >> +				buf += ret;
> >> +				blen -= ret;
> > 
> > (Nit: can ret be < 0 ?  I've never been 100% clear on this myself for
> > the s*printf() family -- if this assumption is widespread in perf tool
> > already then I guess just go with the flow.)
> 
> Yeah, some parts of the code in here check for -1, actually, but doing
> this on every call to snprintf would push this current code over the
> edge - and I cowardly avoided a refactoring ;-)
> 
> Please note that this is perf userland, and also we are printing constant
> strings here.
> Although admittedly this starts to sound like an excuse now ...
> 
> > I wonder if this snprintf+increment+decrement sequence could be wrapped
> > up as a helper, rather than having to be repeated all over the place.
> 
> Yes, I was hoping nobody would notice ;-)

It's probably not worth losing sleep over.

snprintf(3) says, under NOTES:

	Until glibc 2.0.6, they would return -1 when the output was
	truncated.

which is probably ancient enough history that we don't care.  C11 does
say that a negative return value can happen "if an encoding error
occurred".  _Probably_ not a problem if perf tool never calls
setlocale(), but ...
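
For reference, a small sketch of the two cases a caller would have to
tell apart with a C99-conforming snprintf(): a negative return (error)
and a return greater than or equal to the buffer size (truncation, where
the value is the length the full output would have had).  The enum and
function names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

enum fmt_status { FMT_OK, FMT_TRUNCATED, FMT_ERROR };

/* Hypothetical wrapper: classify the result of a single snprintf() call. */
static enum fmt_status format_evlen(char *buf, size_t size, int bits)
{
	int ret = snprintf(buf, size, " EVLEN%d", bits);

	if (ret < 0)
		return FMT_ERROR;	/* e.g. an encoding error */
	if ((size_t)ret >= size)
		return FMT_TRUNCATED;	/* ret is the untruncated length */
	return FMT_OK;
}
```

Blindly doing "buf += ret; blen -= ret;" is only safe in the FMT_OK
case.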


> >> +				if (payload & 0x2)
> >> +					ret = snprintf(buf, buf_len, " FP");
> >> +				else
> >> +					ret = snprintf(buf, buf_len, " INT");
> >> +				buf += ret;
> >> +				blen -= ret;
> >> +				if (payload & 0x4) {
> >> +					ret = snprintf(buf, buf_len, " PRED");
> >> +					buf += ret;
> >> +					blen -= ret;
> >> +				}
> >> +				/* Bits [7..4] encode the vector length */
> >> +				ret = snprintf(buf, buf_len, " EVLEN%d",
> >> +					       32 << ((payload >> 4) & 0x7));
> > 
> > Isn't this just extracting 3 bits (0x7)? 
> 
> Ah, right, the comment is wrong. It's actually bits [6:4].
> 
> > And what unit are we aiming
> > for here: is it the number of bytes per vector, or something else?  I'm
> > confused by the fact that this will go up in steps of 32, which doesn't
> > seem to match up to the architecture.
> 
> So this is how SPE encodes the effective vector length in its payload:
> the format is described in section "D10.2.7 Operation Type packet" in a
> (recent) ARMv8 ARM. I put the above statement in a C file and ran all
> input values through it, it produced the exact *bit* length values as in
> the spec.
> 
> Is there any particular pattern you are concerned about?
> I admit this is somewhat hackish, I can do an extra function to put some
> comments in there.

Mostly I'm curious because the encoding doesn't match the SVE
architecture: SVE requires 4 bits to specify the vector length, not 3.
This might have been a deliberate limitation in the SPE spec., but it
raises questions about what should happen when 3 bits is not enough.

For SVE, valid vector lengths are 16 bytes * n
(or equivalently 128 bits * n), where 1 <= n <= 16.

The code here though cannot print EVLEN16 or EVLEN48 etc.  This might
not be a bug, but I'd like to understand where it comes from...

> 
> > 
> > I notice that bit 7 has to be zero to get into this if() though.
> > 
> >> +				buf += ret;
> >> +				blen -= ret;
> >> +				return buf_len - blen;
> >> +			}
> >> +
> >> +			return snprintf(buf, buf_len, "%s", payload & 0x1 ?
> >>  					"COND-SELECT" : "INSN-OTHER");
> >> +			}
> >>  		case 1:	{
> >>  			size_t blen = buf_len;
> >>  
> >> @@ -403,6 +430,25 @@ int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
> >>  				ret = snprintf(buf, buf_len, " NV-SYSREG");
> >>  				buf += ret;
> >>  				blen -= ret;
> >> +			} else if ((payload & 0x0a) == 0x08) {
> >> +				ret = snprintf(buf, buf_len, " SVE");
> >> +				buf += ret;
> >> +				blen -= ret;
> >> +				if (payload & 0x4) {
> >> +					ret = snprintf(buf, buf_len, " PRED");
> >> +					buf += ret;
> >> +					blen -= ret;
> >> +				}
> >> +				if (payload & 0x80) {
> >> +					ret = snprintf(buf, buf_len, " SG");
> >> +					buf += ret;
> >> +					blen -= ret;
> >> +				}
> >> +				/* Bits [7..4] encode the vector length */
> >> +				ret = snprintf(buf, buf_len, " EVLEN%d",
> >> +					       32 << ((payload >> 4) & 0x7));
> > 
> > Same comment as above.  Maybe have a common helper for decoding the
> > vector length bits so it can be fixed in a single place?
> 
> Yup. Although I wonder if this is the smallest of the problems with this
> function going forward.
> 
> Cheers,
> Andre

Fair enough.

Cheers
---Dave


* Re: [PATCH 5/5] perf: arm_spe: Decode SVE events
  2020-09-28 14:47       ` Dave Martin
@ 2020-09-29  2:19         ` Leo Yan
  2020-09-29 14:03           ` Dave Martin
  2020-09-30 10:34           ` Dave Martin
  0 siblings, 2 replies; 22+ messages in thread
From: Leo Yan @ 2020-09-29  2:19 UTC (permalink / raw)
  To: Dave Martin
  Cc: André Przywara, Mark Rutland, Wei Li, Suzuki K Poulose,
	Peter Zijlstra, Catalin Marinas, Jiri Olsa, linux-kernel,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Ingo Molnar,
	James Clark, Namhyung Kim, Will Deacon, Tan Xiaojun,
	linux-arm-kernel, Al Grant

On Mon, Sep 28, 2020 at 03:47:56PM +0100, Dave Martin wrote:
> On Mon, Sep 28, 2020 at 02:59:34PM +0100, André Przywara wrote:
> > On 28/09/2020 14:21, Dave Martin wrote:
> > 
> > Hi Dave,
> > 
> > > On Tue, Sep 22, 2020 at 11:12:25AM +0100, Andre Przywara wrote:
> > >> The Scalable Vector Extension (SVE) is an ARMv8 architecture extension
> > >> that introduces very long vector operations (up to 2048 bits).
> > > 
> > > (8192, in fact, though don't expect to see that on real hardware any
> > > time soon...  qemu and the Arm fast model can do it, though.)
> > > 
> > >> The SPE profiling feature can tag SVE instructions with additional
> > >> properties like predication or the effective vector length.
> > >>
> > >> Decode the new operation type bits in the SPE decoder to allow the perf
> > >> tool to correctly report about SVE instructions.
> > > 
> > > 
> > > I don't know anything about SPE, so just commenting on a few minor
> > > things that catch my eye here.
> > 
> > Many thanks for taking a look!
> > Please note that I actually missed a prior submission by Wei, so the
> > code changes here will end up in:
> > https://lore.kernel.org/patchwork/patch/1288413/
> > 
> > But your two points below magically apply to his patch as well, so....
> > 
> > > 
> > >> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> > >> ---
> > >>  .../arm-spe-decoder/arm-spe-pkt-decoder.c     | 48 ++++++++++++++++++-
> > >>  1 file changed, 47 insertions(+), 1 deletion(-)
> > >>
> > >> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> > >> index a033f34846a6..f0c369259554 100644
> > >> --- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> > >> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> > >> @@ -372,8 +372,35 @@ int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
> > >>  	}
> > >>  	case ARM_SPE_OP_TYPE:
> > >>  		switch (idx) {
> > >> -		case 0:	return snprintf(buf, buf_len, "%s", payload & 0x1 ?
> > >> +		case 0: {
> > >> +			size_t blen = buf_len;
> > >> +
> > >> +			if ((payload & 0x89) == 0x08) {
> > >> +				ret = snprintf(buf, buf_len, "SVE");
> > >> +				buf += ret;
> > >> +				blen -= ret;
> > > 
> > > (Nit: can ret be < 0 ?  I've never been 100% clear on this myself for
> > > the s*printf() family -- if this assumption is widespread in perf tool
> > > already then I guess just go with the flow.)
> > 
> > Yeah, some parts of the code in here check for -1, actually, but doing
> > this on every call to snprintf would push this current code over the
> > edge - and I cowardly avoided a refactoring ;-)
> > 
> > Please note that this is perf userland, and also we are printing constant
> > strings here.
> > Although admittedly this starts to sound like an excuse now ...
> > 
> > > I wonder if this snprintf+increment+decrement sequence could be wrapped
> > > up as a helper, rather than having to be repeated all over the place.
> > 
> > Yes, I was hoping nobody would notice ;-)
> 
> It's probably not worth losing sleep over.
> 
> snprintf(3) says, under NOTES:
> 
> 	Until glibc 2.0.6, they would return -1 when the output was
> 	truncated.
> 
> which is probably ancient enough history that we don't care.  C11 does
> say that a negative return value can happen "if an encoding error
> occurred".  _Probably_ not a problem if perf tool never calls
> setlocale(), but ...

I have one patch which tries to fix the snprintf+increment sequence
[1]; to be honest, the change seems ugly to me.  I agree it's better
to wrap this up in a helper.

[1] https://lore.kernel.org/patchwork/patch/1288410/

> > >> +				if (payload & 0x2)
> > >> +					ret = snprintf(buf, buf_len, " FP");
> > >> +				else
> > >> +					ret = snprintf(buf, buf_len, " INT");
> > >> +				buf += ret;
> > >> +				blen -= ret;
> > >> +				if (payload & 0x4) {
> > >> +					ret = snprintf(buf, buf_len, " PRED");
> > >> +					buf += ret;
> > >> +					blen -= ret;
> > >> +				}
> > >> +				/* Bits [7..4] encode the vector length */
> > >> +				ret = snprintf(buf, buf_len, " EVLEN%d",
> > >> +					       32 << ((payload >> 4) & 0x7));
> > > 
> > > Isn't this just extracting 3 bits (0x7)? 
> > 
> > Ah, right, the comment is wrong. It's actually bits [6:4].
> > 
> > > And what unit are we aiming
> > > for here: is it the number of bytes per vector, or something else?  I'm
> > > confused by the fact that this will go up in steps of 32, which doesn't
> > > seem to match up to the architecture.
> > 
> > So this is how SPE encodes the effective vector length in its payload:
> > the format is described in section "D10.2.7 Operation Type packet" in a
> > (recent) ARMv8 ARM. I put the above statement in a C file and ran all
> > input values through it, it produced the exact *bit* length values as in
> > the spec.
> > 
> > Is there any particular pattern you are concerned about?
> > I admit this is somewhat hackish, I can do an extra function to put some
> > comments in there.
> 
> Mostly I'm curious because the encoding doesn't match the SVE
> architecture: SVE requires 4 bits to specify the vector length, not 3.
> This might have been a deliberate limitation in the SPE spec., but it
> raises questions about what should happen when 3 bits is not enough.
> 
> For SVE, valid vector lengths are 16 bytes * n
> (or equivalently 128 bits * n), where 1 <= n <= 16.
> 
> The code here though cannot print EVLEN16 or EVLEN48 etc.  This might
> not be a bug, but I'd like to understand where it comes from...

In the SPE spec, the defined values for EVL are:

  0b'000 -> EVLEN: 32 bits.
  0b'001 -> EVLEN: 64 bits.
  0b'010 -> EVLEN: 128 bits.
  0b'011 -> EVLEN: 256 bits.
  0b'100 -> EVLEN: 512 bits.
  0b'101 -> EVLEN: 1024 bits.
  0b'110 -> EVLEN: 2048 bits.

Note that 0b'111 is reserved.  In theory, I think the SPE Operation packet
can support up to 4096 bits (32 << 7) when the EVL field is 0b'111; but
it's impossible to express a vector length of 8192 bits, as you mentioned.

Thanks,
Leo

> > > I notice that bit 7 has to be zero to get into this if() though.
> > > 
> > >> +				buf += ret;
> > >> +				blen -= ret;
> > >> +				return buf_len - blen;
> > >> +			}
> > >> +
> > >> +			return snprintf(buf, buf_len, "%s", payload & 0x1 ?
> > >>  					"COND-SELECT" : "INSN-OTHER");
> > >> +			}
> > >>  		case 1:	{
> > >>  			size_t blen = buf_len;
> > >>  
> > >> @@ -403,6 +430,25 @@ int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
> > >>  				ret = snprintf(buf, buf_len, " NV-SYSREG");
> > >>  				buf += ret;
> > >>  				blen -= ret;
> > >> +			} else if ((payload & 0x0a) == 0x08) {
> > >> +				ret = snprintf(buf, buf_len, " SVE");
> > >> +				buf += ret;
> > >> +				blen -= ret;
> > >> +				if (payload & 0x4) {
> > >> +					ret = snprintf(buf, buf_len, " PRED");
> > >> +					buf += ret;
> > >> +					blen -= ret;
> > >> +				}
> > >> +				if (payload & 0x80) {
> > >> +					ret = snprintf(buf, buf_len, " SG");
> > >> +					buf += ret;
> > >> +					blen -= ret;
> > >> +				}
> > >> +				/* Bits [7..4] encode the vector length */
> > >> +				ret = snprintf(buf, buf_len, " EVLEN%d",
> > >> +					       32 << ((payload >> 4) & 0x7));
> > > 
> > > Same comment as above.  Maybe have a common helper for decoding the
> > > vector length bits so it can be fixed in a single place?
> > 
> > Yup. Although I wonder if this is the smallest of the problems with this
> > function going forward.
> > 
> > Cheers,
> > Andre
> 
> Fair enough.
> 
> Cheers
> ---Dave


* Re: [PATCH 5/5] perf: arm_spe: Decode SVE events
  2020-09-29  2:19         ` Leo Yan
@ 2020-09-29 14:03           ` Dave Martin
  2020-09-30 10:34           ` Dave Martin
  1 sibling, 0 replies; 22+ messages in thread
From: Dave Martin @ 2020-09-29 14:03 UTC (permalink / raw)
  To: Leo Yan
  Cc: Mark Rutland, Al Grant, Will Deacon, Suzuki K Poulose,
	Peter Zijlstra, André Przywara, Jiri Olsa, linux-kernel,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Ingo Molnar,
	James Clark, Catalin Marinas, Namhyung Kim, Wei Li, Tan Xiaojun,
	linux-arm-kernel

On Tue, Sep 29, 2020 at 10:19:02AM +0800, Leo Yan wrote:
> On Mon, Sep 28, 2020 at 03:47:56PM +0100, Dave Martin wrote:
> > On Mon, Sep 28, 2020 at 02:59:34PM +0100, André Przywara wrote:
> > > On 28/09/2020 14:21, Dave Martin wrote:
> > > 
> > > Hi Dave,
> > > 
> > > > On Tue, Sep 22, 2020 at 11:12:25AM +0100, Andre Przywara wrote:
> > > >> The Scalable Vector Extension (SVE) is an ARMv8 architecture extension
> > > >> that introduces very long vector operations (up to 2048 bits).
> > > > 
> > > > (8192, in fact, though don't expect to see that on real hardware any
> > > > time soon...  qemu and the Arm fast model can do it, though.)

[...]

> > Mostly I'm curious because the encoding doesn't match the SVE
> > architecture: SVE requires 4 bits to specify the vector length, not 3.
> > This might have been a deliberate limitation in the SPE spec., but it
> > raises questions about what should happen when 3 bits is not enough.
> > 
> > For SVE, valid vector lengths are 16 bytes * n
> > (or equivalently 128 bits * n), where 1 <= n <= 16.
> > 
> > The code here though cannot print EVLEN16 or EVLEN48 etc.  This might
> > not be a bug, but I'd like to understand where it comes from...
> 
> In the SPE spec, the defined values for EVL are:
> 
>   0b'000 -> EVLEN: 32 bits.
>   0b'001 -> EVLEN: 64 bits.
>   0b'010 -> EVLEN: 128 bits.
>   0b'011 -> EVLEN: 256 bits.
>   0b'100 -> EVLEN: 512 bits.
>   0b'101 -> EVLEN: 1024 bits.
>   0b'110 -> EVLEN: 2048 bits.
> 
> Note that 0b'111 is reserved.  In theory, I think SPE Operation packet
> can support up to 4096 bits (32 << 7) when the EVL field is 0b'111; but

OK, having looked at the spec I can now confirm that this looks correct.
I was expecting a more direct correspondence between the SVE ISA and
these events, but it looks like SPE may report on a finer granularity
than whole instructions, hence showing effective vector lengths smaller
than 32; also SPE rounds the reported effective vector length up to a
power of two, which allows the full range of lengths to be reported via
the 3-bit EVL field.

> it's impossible to express vector length for 8192 bits as you mentioned.

Yes, ignore my comment about 8192-bit vectors: I was confusing myself
(the Linux API extensions support up to 8192 _bytes_ per vector in order
to have some expansion room just in case; however the SVE architecture
limits vectors to at most 2048 bits).

So I don't see any obvious issues.

It might be a good idea to explicitly reject the encoding 0b111, since
we can't be certain what it is going to mean -- however, I don't have a
strong opinion on this.
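
As a rough sketch of what rejecting it could look like (the helper name
and the -1 error convention are assumptions for illustration, not
existing perf code), the EVL decoding could live in one small function
that refuses the reserved value:

```c
#include <stdint.h>

/*
 * Hypothetical helper: decode the 3-bit EVL field, payload bits [6:4],
 * into an effective vector length in bits.  Returns -1 for the reserved
 * encoding 0b111 instead of guessing a meaning for it.
 */
static int spe_evl_bits(uint64_t payload)
{
	unsigned int evl = (payload >> 4) & 0x7;

	if (evl == 0x7)
		return -1;

	return 32 << evl;
}
```

The caller could then print something like " EVLEN-RSVD" (again, just a
made-up string) when the helper returns a negative value, and both
decode sites would share the same logic.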

Cheers
---Dave


* Re: [PATCH 5/5] perf: arm_spe: Decode SVE events
  2020-09-29  2:19         ` Leo Yan
  2020-09-29 14:03           ` Dave Martin
@ 2020-09-30 10:34           ` Dave Martin
  2020-09-30 11:04             ` Leo Yan
  1 sibling, 1 reply; 22+ messages in thread
From: Dave Martin @ 2020-09-30 10:34 UTC (permalink / raw)
  To: Leo Yan
  Cc: Mark Rutland, Al Grant, Will Deacon, Suzuki K Poulose,
	Peter Zijlstra, André Przywara, Jiri Olsa, linux-kernel,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Ingo Molnar,
	James Clark, Catalin Marinas, Namhyung Kim, Wei Li, Tan Xiaojun,
	linux-arm-kernel

On Tue, Sep 29, 2020 at 10:19:02AM +0800, Leo Yan wrote:
> On Mon, Sep 28, 2020 at 03:47:56PM +0100, Dave Martin wrote:
> > On Mon, Sep 28, 2020 at 02:59:34PM +0100, André Przywara wrote:
> > > On 28/09/2020 14:21, Dave Martin wrote:
> > > 
> > > Hi Dave,
> > > 
> > > > On Tue, Sep 22, 2020 at 11:12:25AM +0100, Andre Przywara wrote:
> > > >> The Scalable Vector Extension (SVE) is an ARMv8 architecture extension
> > > >> that introduces very long vector operations (up to 2048 bits).
> > > > 
> > > > (8192, in fact, though don't expect to see that on real hardware any
> > > > time soon...  qemu and the Arm fast model can do it, though.)
> > > > 
> > > >> The SPE profiling feature can tag SVE instructions with additional
> > > >> properties like predication or the effective vector length.
> > > >>
> > > >> Decode the new operation type bits in the SPE decoder to allow the perf
> > > >> tool to correctly report about SVE instructions.
> > > > 
> > > > 
> > > > I don't know anything about SPE, so just commenting on a few minor
> > > > things that catch my eye here.
> > > 
> > > Many thanks for taking a look!
> > > Please note that I actually missed a prior submission by Wei, so the
> > > code changes here will end up in:
> > > https://lore.kernel.org/patchwork/patch/1288413/
> > > 
> > > But your two points below magically apply to his patch as well, so....
> > > 
> > > > 
> > > >> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> > > >> ---
> > > >>  .../arm-spe-decoder/arm-spe-pkt-decoder.c     | 48 ++++++++++++++++++-
> > > >>  1 file changed, 47 insertions(+), 1 deletion(-)
> > > >>
> > > >> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> > > >> index a033f34846a6..f0c369259554 100644
> > > >> --- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> > > >> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> > > >> @@ -372,8 +372,35 @@ int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
> > > >>  	}
> > > >>  	case ARM_SPE_OP_TYPE:
> > > >>  		switch (idx) {
> > > >> -		case 0:	return snprintf(buf, buf_len, "%s", payload & 0x1 ?
> > > >> +		case 0: {
> > > >> +			size_t blen = buf_len;
> > > >> +
> > > >> +			if ((payload & 0x89) == 0x08) {
> > > >> +				ret = snprintf(buf, buf_len, "SVE");
> > > >> +				buf += ret;
> > > >> +				blen -= ret;
> > > > 
> > > > (Nit: can ret be < 0 ?  I've never been 100% clear on this myself for
> > > > the s*printf() family -- if this assumption is widespread in perf tool
> > > > already then I guess just go with the flow.)
> > > 
> > > Yeah, some parts of the code in here check for -1, actually, but doing
> > > this on every call to snprintf would push this current code over the
> > > edge - and I cowardly avoided a refactoring ;-)
> > > 
> > > Please note that this is perf userland, and also we are printing constant
> > > strings here.
> > > Although admittedly this starts to sound like an excuse now ...
> > > 
> > > > I wonder if this snprintf+increment+decrement sequence could be wrapped
> > > > up as a helper, rather than having to be repeated all over the place.
> > > 
> > > Yes, I was hoping nobody would notice ;-)
> > 
> > It's probably not worth losing sleep over.
> > 
> > snprintf(3) says, under NOTES:
> > 
> > 	Until glibc 2.0.6, they would return -1 when the output was
> > 	truncated.
> > 
> > which is probably ancient enough history that we don't care.  C11 does
> > say that a negative return value can happen "if an encoding error
> > occurred".  _Probably_ not a problem if perf tool never calls
> > setlocale(), but ...
> 
> I have one patch which tried to fix the snprintf+increment sequence
> [1]; to be honest, the change seems ugly to me.  I agree it's better
> to wrap this up in a helper.
> 
> [1] https://lore.kernel.org/patchwork/patch/1288410/

Sure, putting explicit checks all over the place makes a lot of noise in
the code.

I was wondering whether something along the following lines would work:

	/* ... */

	if (payload & SVE_EVT_PKT_GEN_EXCEPTION)
		buf_appendf_err(&buf, &buf_len, &ret, " EXCEPTION-GEN");
	if (payload & SVE_EVT_PKT_ARCH_RETIRED)
		buf_appendf_err(&buf, &buf_len, &ret, " RETIRED");
	if (payload & SVE_EVT_PKT_L1D_ACCESS)
		buf_appendf_err(&buf, &buf_len, &ret, " L1D-ACCESS");

	/* ... */

	if (ret)
		return ret;

[...]
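
A minimal sketch of what such a helper could look like. buf_appendf_err() is
only named in the fragment above, so the body here is an assumption: it
latches the first error in *err and turns later calls into no-ops, which is
what lets the caller check ret once at the end. Reporting truncation as
-ENOSPC is likewise a choice made for this sketch.

```c
#include <errno.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: append formatted text at *buf, then advance *buf
 * and shrink *buf_len by the number of characters written.  The first
 * failure is stored in *err; once *err is non-zero, calls do nothing.
 */
static void buf_appendf_err(char **buf, size_t *buf_len, int *err,
			    const char *fmt, ...)
{
	va_list ap;
	int n;

	if (*err)
		return;

	va_start(ap, fmt);
	n = vsnprintf(*buf, *buf_len, fmt, ap);
	va_end(ap);

	if (n < 0) {
		/* Output or encoding error reported by vsnprintf(). */
		*err = n;
		return;
	}

	if ((size_t)n >= *buf_len) {
		/* Output was truncated: not enough room left in the buffer. */
		*err = -ENOSPC;
		return;
	}

	*buf += n;
	*buf_len -= n;
}
```

Because the error is sticky, a long run of conditional appends needs no
per-call check, and the unsigned length can never wrap below zero.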

Best to keep such refactoring independent of this series though.

Cheers
---Dave


* Re: [PATCH 5/5] perf: arm_spe: Decode SVE events
  2020-09-30 10:34           ` Dave Martin
@ 2020-09-30 11:04             ` Leo Yan
  2020-10-05 10:15               ` Dave Martin
  0 siblings, 1 reply; 22+ messages in thread
From: Leo Yan @ 2020-09-30 11:04 UTC (permalink / raw)
  To: Dave Martin
  Cc: Mark Rutland, Al Grant, Will Deacon, Suzuki K Poulose,
	Peter Zijlstra, André Przywara, Jiri Olsa, linux-kernel,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Ingo Molnar,
	James Clark, Catalin Marinas, Namhyung Kim, Wei Li, Tan Xiaojun,
	linux-arm-kernel

Hi Dave,

On Wed, Sep 30, 2020 at 11:34:11AM +0100, Dave Martin wrote:

[...]

> > > > >> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> > > > >> index a033f34846a6..f0c369259554 100644
> > > > >> --- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> > > > >> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> > > > >> @@ -372,8 +372,35 @@ int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
> > > > >>  	}
> > > > >>  	case ARM_SPE_OP_TYPE:
> > > > >>  		switch (idx) {
> > > > >> -		case 0:	return snprintf(buf, buf_len, "%s", payload & 0x1 ?
> > > > >> +		case 0: {
> > > > >> +			size_t blen = buf_len;
> > > > >> +
> > > > >> +			if ((payload & 0x89) == 0x08) {
> > > > >> +				ret = snprintf(buf, buf_len, "SVE");
> > > > >> +				buf += ret;
> > > > >> +				blen -= ret;
> > > > > 
> > > > > (Nit: can ret be < 0 ?  I've never been 100% clear on this myself for
> > > > > the s*printf() family -- if this assumption is widespread in perf tool
> > > > > already then I guess just go with the flow.)
> > > > 
> > > > Yeah, some parts of the code in here check for -1, actually, but doing
> > > > this on every call to snprintf would push this current code over the
> > > > edge - and I cowardly avoided a refactoring ;-)
> > > > 
> > > > Please note that this is perf userland, and also we are printing constant
> > > > strings here.
> > > > Although admittedly this starts to sound like an excuse now ...
> > > > 
> > > > > I wonder if this snprintf+increment+decrement sequence could be wrapped
> > > > > up as a helper, rather than having to be repeated all over the place.
> > > > 
> > > > Yes, I was hoping nobody would notice ;-)
> > > 
> > > It's probably not worth losing sleep over.
> > > 
> > > snprintf(3) says, under NOTES:
> > > 
> > > 	Until glibc 2.0.6, they would return -1 when the output was
> > > 	truncated.
> > > 
> > > which is probably ancient enough history that we don't care.  C11 does
> > > say that a negative return value can happen "if an encoding error
> > > occurred".  _Probably_ not a problem if perf tool never calls
> > > setlocale(), but ...
> > 
> > I have one patch which tried to fix the snprintf+increment sequence
> > [1]; to be honest, the change seems ugly to me.  I agree it's better
> > to wrap this up in a helper.
> > 
> > [1] https://lore.kernel.org/patchwork/patch/1288410/
> 
> Sure, putting explicit checks all over the place makes a lot of noise in
> the code.
> 
> I was wondering whether something along the following lines would work:
> 
> 	/* ... */
> 
> 	if (payload & SVE_EVT_PKT_GEN_EXCEPTION)
> 		buf_appendf_err(&buf, &buf_len, &ret, " EXCEPTION-GEN");
> 	if (payload & SVE_EVT_PKT_ARCH_RETIRED)
> 		buf_appendf_err(&buf, &buf_len, &ret, " RETIRED");
> 	if (payload & SVE_EVT_PKT_L1D_ACCESS)
> 		buf_appendf_err(&buf, &buf_len, &ret, " L1D-ACCESS");
> 
> 	/* ... */
> 
> 	if (ret)
> 		return ret;
> 
> [...]

I have sent out patch v2 [1] and Cc'ed you; I used an API definition
similar to your suggestion:

  static int arm_spe_pkt_snprintf(char **buf_p, size_t *blen,
 				  const char *fmt, ...)

The only difference is that the caller checks the return value of
arm_spe_pkt_snprintf() and bails out directly when it detects a failure.
Your input will be considered for the next spin.
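
One way the helper and its caller could fit together is sketched below. Only
the arm_spe_pkt_snprintf() prototype comes from this mail; the function body,
the EV_* bit names and the desc_events() caller are hypothetical and exist
for illustration only.

```c
#include <errno.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Prototype as quoted above; the body is an assumption. */
static int arm_spe_pkt_snprintf(char **buf_p, size_t *blen,
				const char *fmt, ...)
{
	va_list ap;
	int ret;

	va_start(ap, fmt);
	ret = vsnprintf(*buf_p, *blen, fmt, ap);
	va_end(ap);

	if (ret < 0)
		return ret;
	/* Treat truncation as a failure so *blen cannot underflow. */
	if ((size_t)ret >= *blen)
		return -ENOSPC;

	*buf_p += ret;
	*blen -= ret;
	return ret;
}

/* Hypothetical event bits, named for this example only. */
#define EV_EXCEPTION_GEN	(1ULL << 0)
#define EV_RETIRED		(1ULL << 1)
#define EV_L1D_ACCESS		(1ULL << 2)

/* Hypothetical caller: check every return value and bail out at once. */
static int desc_events(unsigned long long payload, char *buf, size_t buf_len)
{
	size_t blen = buf_len;
	int ret;

	ret = arm_spe_pkt_snprintf(&buf, &blen, "EV");
	if (ret < 0)
		return ret;

	if (payload & EV_EXCEPTION_GEN) {
		ret = arm_spe_pkt_snprintf(&buf, &blen, " EXCEPTION-GEN");
		if (ret < 0)
			return ret;
	}
	if (payload & EV_RETIRED) {
		ret = arm_spe_pkt_snprintf(&buf, &blen, " RETIRED");
		if (ret < 0)
			return ret;
	}
	if (payload & EV_L1D_ACCESS) {
		ret = arm_spe_pkt_snprintf(&buf, &blen, " L1D-ACCESS");
		if (ret < 0)
			return ret;
	}

	/* Total number of characters written, excluding the final NUL. */
	return (int)(buf_len - blen);
}
```

Compared with a sticky-error helper, this costs one check per call but stops
formatting as soon as something goes wrong.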

> Best to keep such refactoring independent of this series though.

Yeah, the patch set [2] is quite heavy; after getting some review, I may
need to consider splitting it into 2 or even 3 smaller patch sets.

Thanks a lot for your suggestions!

Leo

[1] https://lore.kernel.org/patchwork/patch/1314603/
[2] https://lore.kernel.org/patchwork/cover/1314599/


* Re: [PATCH 5/5] perf: arm_spe: Decode SVE events
  2020-09-30 11:04             ` Leo Yan
@ 2020-10-05 10:15               ` Dave Martin
  0 siblings, 0 replies; 22+ messages in thread
From: Dave Martin @ 2020-10-05 10:15 UTC (permalink / raw)
  To: Leo Yan
  Cc: Mark Rutland, Al Grant, linux-arm-kernel, Suzuki K Poulose,
	Peter Zijlstra, André Przywara, Jiri Olsa, linux-kernel,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Ingo Molnar,
	James Clark, Catalin Marinas, Namhyung Kim, Will Deacon,
	Tan Xiaojun, Wei Li

On Wed, Sep 30, 2020 at 07:04:53PM +0800, Leo Yan wrote:

[...]

> > > > > >> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> > > > > >> index a033f34846a6..f0c369259554 100644
> > > > > >> --- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> > > > > >> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> > > > > >> @@ -372,8 +372,35 @@ int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
> > > > > >>  	}
> > > > > >>  	case ARM_SPE_OP_TYPE:
> > > > > >>  		switch (idx) {
> > > > > >> -		case 0:	return snprintf(buf, buf_len, "%s", payload & 0x1 ?
> > > > > >> +		case 0: {
> > > > > >> +			size_t blen = buf_len;
> > > > > >> +
> > > > > >> +			if ((payload & 0x89) == 0x08) {
> > > > > >> +				ret = snprintf(buf, buf_len, "SVE");
> > > > > >> +				buf += ret;
> > > > > >> +				blen -= ret;
> > > > > > 
> > > > > > (Nit: can ret be < 0 ?  I've never been 100% clear on this myself for
> > > > > > the s*printf() family -- if this assumption is widespread in perf tool
> > > > > > already then I guess just go with the flow.)
> > > > > 
> > > > > Yeah, some parts of the code in here check for -1, actually, but doing
> > > > > this on every call to snprintf would push this current code over the
> > > > > edge - and I cowardly avoided a refactoring ;-)
> > > > > 
> > > > > Please note that this is perf userland, and also we are printing constant
> > > > > strings here.
> > > > > Although admittedly this starts to sound like an excuse now ...
> > > > > 
> > > > > > I wonder if this snprintf+increment+decrement sequence could be wrapped
> > > > > > up as a helper, rather than having to be repeated all over the place.
> > > > > 
> > > > > Yes, I was hoping nobody would notice ;-)
> > > > 
> > > > It's probably not worth losing sleep over.
> > > > 
> > > > snprintf(3) says, under NOTES:
> > > > 
> > > > 	Until glibc 2.0.6, they would return -1 when the output was
> > > > 	truncated.
> > > > 
> > > > which is probably ancient enough history that we don't care.  C11 does
> > > > say that a negative return value can happen "if an encoding error
> > > > occurred".  _Probably_ not a problem if perf tool never calls
> > > > setlocale(), but ...
> > > 
> > > I have one patch which tried to fix the snprintf+increment sequence
> > > [1]; to be honest, the change seems ugly to me.  I agree it's better
> > > to wrap this up in a helper.
> > > 
> > > [1] https://lore.kernel.org/patchwork/patch/1288410/
> > 
> > Sure, putting explicit checks all over the place makes a lot of noise in
> > the code.
> > 
> > I was wondering whether something along the following lines would work:
> > 
> > 	/* ... */
> > 
> > 	if (payload & SVE_EVT_PKT_GEN_EXCEPTION)
> > 		buf_appendf_err(&buf, &buf_len, &ret, " EXCEPTION-GEN");
> > 	if (payload & SVE_EVT_PKT_ARCH_RETIRED)
> > 		buf_appendf_err(&buf, &buf_len, &ret, " RETIRED");
> > 	if (payload & SVE_EVT_PKT_L1D_ACCESS)
> > 		buf_appendf_err(&buf, &buf_len, &ret, " L1D-ACCESS");
> > 
> > 	/* ... */
> > 
> > 	if (ret)
> > 		return ret;
> > 
> > [...]
> 
> I have sent out patch v2 [1] and Cc'ed you; I used an API definition
> similar to your suggestion:
> 
>   static int arm_spe_pkt_snprintf(char **buf_p, size_t *blen,
>  				  const char *fmt, ...)
> 
> The only difference is that the caller checks the return value of
> arm_spe_pkt_snprintf() and bails out directly when it detects a failure.
> Your input will be considered for the next spin.
> 
> > Best to keep such refactoring independent of this series though.
> 
> Yeah, the patch set [2] is quite heavy; after getting some review, I may
> need to consider splitting it into 2 or even 3 smaller patch sets.
> 
> Thanks a lot for your suggestions!
>
> Leo

No problem, your approach seems reasonable to me.

Cheers
---Dave

> [1] https://lore.kernel.org/patchwork/patch/1314603/
> [2] https://lore.kernel.org/patchwork/cover/1314599/


* Re: [PATCH 4/5] perf: arm_spe: Decode memory tagging properties
       [not found]     ` <20201013145103.GE1063281@kernel.org>
@ 2020-10-13 14:52       ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 22+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-10-13 14:52 UTC (permalink / raw)
  To: Leo Yan
  Cc: Andre Przywara, Will Deacon, Catalin Marinas, Peter Zijlstra,
	Ingo Molnar, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Suzuki K Poulose, Tan Xiaojun, James Clark,
	linux-arm-kernel, linux-kernel

On Tue, Oct 13, 2020 at 11:51:03AM -0300, Arnaldo Carvalho de Melo wrote:
> On Sun, Sep 27, 2020 at 11:19:18AM +0800, Leo Yan wrote:
> > On Tue, Sep 22, 2020 at 11:12:24AM +0100, Andre Przywara wrote:
> > > When SPE records a physical address, it can additionally tag the event
> > > with information from the Memory Tagging architecture extension.
> > > 
> > > Decode the two additional fields in the SPE event payload.
> > > 
> > > Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> > > ---
> > >  .../util/arm-spe-decoder/arm-spe-pkt-decoder.c  | 17 ++++++++++++-----
> > >  1 file changed, 12 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> > > index 943e4155b246..a033f34846a6 100644
> > > --- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> > > +++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> > > @@ -8,13 +8,14 @@
> > >  #include <string.h>
> > >  #include <endian.h>
> > >  #include <byteswap.h>
> > > +#include <linux/bits.h>
> > >  
> > >  #include "arm-spe-pkt-decoder.h"
> > >  
> > > -#define BIT(n)		(1ULL << (n))
> > > -
> > >  #define NS_FLAG		BIT(63)
> > >  #define EL_FLAG		(BIT(62) | BIT(61))
> > > +#define CH_FLAG		BIT(62)
> > > +#define PAT_FLAG	GENMASK_ULL(59, 56)
> > >  
> > >  #define SPE_HEADER0_PAD			0x0
> > >  #define SPE_HEADER0_END			0x1
> > > @@ -447,10 +448,16 @@ int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
> > >  			return snprintf(buf, buf_len, "%s 0x%llx el%d ns=%d",
> > >  				        (idx == 1) ? "TGT" : "PC", payload, el, ns);
> > >  		case 2:	return snprintf(buf, buf_len, "VA 0x%llx", payload);
> > > -		case 3:	ns = !!(packet->payload & NS_FLAG);
> > > +		case 3:	{
> > > +			int ch = !!(packet->payload & CH_FLAG);
> > > +			int pat = (packet->payload & PAT_FLAG) >> 56;
> > > +
> > > +			ns = !!(packet->payload & NS_FLAG);
> > >  			payload &= ~(0xffULL << 56);
> > > -			return snprintf(buf, buf_len, "PA 0x%llx ns=%d",
> > > -					payload, ns);
> > > +			return snprintf(buf, buf_len,
> > > +					"PA 0x%llx ns=%d ch=%d, pat=%x",
> > > +					payload, ns, ch, pat);
> > > +			}
> > 
> > Reviewed-by: Leo Yan <leo.yan@linaro.org>
> 
> Thanks, applied.

I take that back, I'm applying Leo's series that Andre reviewed instead.

- Arnaldo


end of thread, other threads:[~2020-10-13 14:52 UTC | newest]

Thread overview: 22+ messages
2020-09-22 10:12 [PATCH 0/5] perf: arm64: Support ARMv8.3-SPE extensions Andre Przywara
2020-09-22 10:12 ` [PATCH 1/5] arm64: spe: Allow new bits in SPE filter register Andre Przywara
2020-09-27  2:51   ` Leo Yan
2020-09-22 10:12 ` [PATCH 2/5] perf: arm_spe: Add new event packet bits Andre Przywara
2020-09-27  3:03   ` Leo Yan
2020-09-22 10:12 ` [PATCH 3/5] perf: arm_spe: Add nested virt event decoding Andre Przywara
2020-09-27  3:11   ` Leo Yan
2020-09-22 10:12 ` [PATCH 4/5] perf: arm_spe: Decode memory tagging properties Andre Przywara
2020-09-27  3:19   ` Leo Yan
     [not found]     ` <20201013145103.GE1063281@kernel.org>
2020-10-13 14:52       ` Arnaldo Carvalho de Melo
2020-09-22 10:12 ` [PATCH 5/5] perf: arm_spe: Decode SVE events Andre Przywara
2020-09-27  3:30   ` Leo Yan
2020-09-28 10:15     ` André Przywara
2020-09-28 11:08       ` Leo Yan
2020-09-28 13:21   ` Dave Martin
2020-09-28 13:59     ` André Przywara
2020-09-28 14:47       ` Dave Martin
2020-09-29  2:19         ` Leo Yan
2020-09-29 14:03           ` Dave Martin
2020-09-30 10:34           ` Dave Martin
2020-09-30 11:04             ` Leo Yan
2020-10-05 10:15               ` Dave Martin
