All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] perf: arm64: Kernel support for Dwarf unwinding through SVE functions
@ 2022-05-17 10:07 ` James Clark
  0 siblings, 0 replies; 16+ messages in thread
From: James Clark @ 2022-05-17 10:07 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel, linux-perf-users, broonie, acme
  Cc: german.gomez, leo.yan, mathieu.poirier, john.garry, James Clark,
	Catalin Marinas, Will Deacon, Jonathan Corbet, Mark Rutland,
	linux-doc

Changes since v1:

  * Add Mark's review tag
  * Clarify in docs that it's the SVE register length
  * Split patchset into kernel side and Perf tool changes

When SVE registers are pushed onto the stack the VG register is required to
unwind because the stack offsets would vary by the SVE register width at the
time when the sample was taken.

These first two patches add support for sampling the VG register to the kernel
and the docs. There is another patchset to add support to userspace perf.

A small change is also required to libunwind or libdw depending on which
unwinder is used, and these will be published later. Without these changes Perf
continues to work with both libraries, although the VG register is still not
used for unwinding. 

Thanks
James

James Clark (2):
  perf: arm64: Add SVE vector granule register to user regs
  arm64/sve: Add Perf extensions documentation

 Documentation/arm64/sve.rst             | 20 +++++++++++++++++
 arch/arm64/include/uapi/asm/perf_regs.h |  7 +++++-
 arch/arm64/kernel/perf_regs.c           | 30 +++++++++++++++++++++++--
 drivers/perf/arm_pmu.c                  |  2 +-
 4 files changed, 55 insertions(+), 4 deletions(-)

-- 
2.28.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 0/2] perf: arm64: Kernel support for Dwarf unwinding through SVE functions
@ 2022-05-17 10:07 ` James Clark
  0 siblings, 0 replies; 16+ messages in thread
From: James Clark @ 2022-05-17 10:07 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel, linux-perf-users, broonie, acme
  Cc: german.gomez, leo.yan, mathieu.poirier, john.garry, James Clark,
	Catalin Marinas, Will Deacon, Jonathan Corbet, Mark Rutland,
	linux-doc

Changes since v1:

  * Add Mark's review tag
  * Clarify in docs that it's the SVE register length
  * Split patchset into kernel side and Perf tool changes

When SVE registers are pushed onto the stack the VG register is required to
unwind because the stack offsets would vary by the SVE register width at the
time when the sample was taken.

These first two patches add support for sampling the VG register to the kernel
and the docs. There is another patchset to add support to userspace perf.

A small change is also required to libunwind or libdw depending on which
unwinder is used, and these will be published later. Without these changes Perf
continues to work with both libraries, although the VG register is still not
used for unwinding. 

Thanks
James

James Clark (2):
  perf: arm64: Add SVE vector granule register to user regs
  arm64/sve: Add Perf extensions documentation

 Documentation/arm64/sve.rst             | 20 +++++++++++++++++
 arch/arm64/include/uapi/asm/perf_regs.h |  7 +++++-
 arch/arm64/kernel/perf_regs.c           | 30 +++++++++++++++++++++++--
 drivers/perf/arm_pmu.c                  |  2 +-
 4 files changed, 55 insertions(+), 4 deletions(-)

-- 
2.28.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 1/2] perf: arm64: Add SVE vector granule register to user regs
  2022-05-17 10:07 ` James Clark
@ 2022-05-17 10:07   ` James Clark
  -1 siblings, 0 replies; 16+ messages in thread
From: James Clark @ 2022-05-17 10:07 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel, linux-perf-users, broonie, acme
  Cc: german.gomez, leo.yan, mathieu.poirier, john.garry, James Clark,
	Catalin Marinas, Will Deacon, Jonathan Corbet, Mark Rutland,
	linux-doc

Dwarf based unwinding in a function that pushes SVE registers onto
the stack requires the unwinder to know the length of the SVE register
to calculate the stack offsets correctly. This was added to the Arm
specific Dwarf spec as the VG pseudo register[1].

Add the vector length at position 46 if it's requested by userspace and
SVE is supported. If it's not supported then fail to open the event.

The vector length must be on each sample because it can be changed
at runtime via a prctl or ptrace call. Also by adding it as a register
rather than a separate attribute, minimal changes will be required in an
unwinder that already indexes into the register list.

[1]: https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst

Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: James Clark <james.clark@arm.com>
---
 arch/arm64/include/uapi/asm/perf_regs.h |  7 +++++-
 arch/arm64/kernel/perf_regs.c           | 30 +++++++++++++++++++++++--
 drivers/perf/arm_pmu.c                  |  2 +-
 3 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/uapi/asm/perf_regs.h b/arch/arm64/include/uapi/asm/perf_regs.h
index d54daafa89e3..fd157f46727e 100644
--- a/arch/arm64/include/uapi/asm/perf_regs.h
+++ b/arch/arm64/include/uapi/asm/perf_regs.h
@@ -36,6 +36,11 @@ enum perf_event_arm_regs {
 	PERF_REG_ARM64_LR,
 	PERF_REG_ARM64_SP,
 	PERF_REG_ARM64_PC,
-	PERF_REG_ARM64_MAX,
+
+	/* Extended/pseudo registers */
+	PERF_REG_ARM64_VG = 46, // SVE Vector Granule
+
+	PERF_REG_ARM64_MAX = PERF_REG_ARM64_PC + 1,
+	PERF_REG_ARM64_EXTENDED_MAX = PERF_REG_ARM64_VG + 1
 };
 #endif /* _ASM_ARM64_PERF_REGS_H */
diff --git a/arch/arm64/kernel/perf_regs.c b/arch/arm64/kernel/perf_regs.c
index f6f58e6265df..b4eece3eb17d 100644
--- a/arch/arm64/kernel/perf_regs.c
+++ b/arch/arm64/kernel/perf_regs.c
@@ -9,9 +9,27 @@
 #include <asm/perf_regs.h>
 #include <asm/ptrace.h>
 
+static u64 perf_ext_regs_value(int idx)
+{
+	switch (idx) {
+	case PERF_REG_ARM64_VG:
+		if (WARN_ON_ONCE(!system_supports_sve()))
+			return 0;
+
+		/*
+		 * Vector granule is current length in bits of SVE registers
+		 * divided by 64.
+		 */
+		return (task_get_sve_vl(current) * 8) / 64;
+	default:
+		WARN_ON_ONCE(true);
+		return 0;
+	}
+}
+
 u64 perf_reg_value(struct pt_regs *regs, int idx)
 {
-	if (WARN_ON_ONCE((u32)idx >= PERF_REG_ARM64_MAX))
+	if (WARN_ON_ONCE((u32)idx >= PERF_REG_ARM64_EXTENDED_MAX))
 		return 0;
 
 	/*
@@ -51,6 +69,9 @@ u64 perf_reg_value(struct pt_regs *regs, int idx)
 	if ((u32)idx == PERF_REG_ARM64_PC)
 		return regs->pc;
 
+	if ((u32)idx >= PERF_REG_ARM64_MAX)
+		return perf_ext_regs_value(idx);
+
 	return regs->regs[idx];
 }
 
@@ -58,7 +79,12 @@ u64 perf_reg_value(struct pt_regs *regs, int idx)
 
 int perf_reg_validate(u64 mask)
 {
-	if (!mask || mask & REG_RESERVED)
+	u64 reserved_mask = REG_RESERVED;
+
+	if (system_supports_sve())
+		reserved_mask &= ~(1ULL << PERF_REG_ARM64_VG);
+
+	if (!mask || mask & reserved_mask)
 		return -EINVAL;
 
 	return 0;
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 59d3980b8ca2..3f07df5a7e95 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -894,7 +894,7 @@ static struct arm_pmu *__armpmu_alloc(gfp_t flags)
 		 * pmu::filter_match callback and pmu::event_init group
 		 * validation).
 		 */
-		.capabilities	= PERF_PMU_CAP_HETEROGENEOUS_CPUS,
+		.capabilities	= PERF_PMU_CAP_HETEROGENEOUS_CPUS | PERF_PMU_CAP_EXTENDED_REGS,
 	};
 
 	pmu->attr_groups[ARMPMU_ATTR_GROUP_COMMON] =
-- 
2.28.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 1/2] perf: arm64: Add SVE vector granule register to user regs
@ 2022-05-17 10:07   ` James Clark
  0 siblings, 0 replies; 16+ messages in thread
From: James Clark @ 2022-05-17 10:07 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel, linux-perf-users, broonie, acme
  Cc: german.gomez, leo.yan, mathieu.poirier, john.garry, James Clark,
	Catalin Marinas, Will Deacon, Jonathan Corbet, Mark Rutland,
	linux-doc

Dwarf based unwinding in a function that pushes SVE registers onto
the stack requires the unwinder to know the length of the SVE register
to calculate the stack offsets correctly. This was added to the Arm
specific Dwarf spec as the VG pseudo register[1].

Add the vector length at position 46 if it's requested by userspace and
SVE is supported. If it's not supported then fail to open the event.

The vector length must be on each sample because it can be changed
at runtime via a prctl or ptrace call. Also by adding it as a register
rather than a separate attribute, minimal changes will be required in an
unwinder that already indexes into the register list.

[1]: https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst

Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: James Clark <james.clark@arm.com>
---
 arch/arm64/include/uapi/asm/perf_regs.h |  7 +++++-
 arch/arm64/kernel/perf_regs.c           | 30 +++++++++++++++++++++++--
 drivers/perf/arm_pmu.c                  |  2 +-
 3 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/uapi/asm/perf_regs.h b/arch/arm64/include/uapi/asm/perf_regs.h
index d54daafa89e3..fd157f46727e 100644
--- a/arch/arm64/include/uapi/asm/perf_regs.h
+++ b/arch/arm64/include/uapi/asm/perf_regs.h
@@ -36,6 +36,11 @@ enum perf_event_arm_regs {
 	PERF_REG_ARM64_LR,
 	PERF_REG_ARM64_SP,
 	PERF_REG_ARM64_PC,
-	PERF_REG_ARM64_MAX,
+
+	/* Extended/pseudo registers */
+	PERF_REG_ARM64_VG = 46, // SVE Vector Granule
+
+	PERF_REG_ARM64_MAX = PERF_REG_ARM64_PC + 1,
+	PERF_REG_ARM64_EXTENDED_MAX = PERF_REG_ARM64_VG + 1
 };
 #endif /* _ASM_ARM64_PERF_REGS_H */
diff --git a/arch/arm64/kernel/perf_regs.c b/arch/arm64/kernel/perf_regs.c
index f6f58e6265df..b4eece3eb17d 100644
--- a/arch/arm64/kernel/perf_regs.c
+++ b/arch/arm64/kernel/perf_regs.c
@@ -9,9 +9,27 @@
 #include <asm/perf_regs.h>
 #include <asm/ptrace.h>
 
+static u64 perf_ext_regs_value(int idx)
+{
+	switch (idx) {
+	case PERF_REG_ARM64_VG:
+		if (WARN_ON_ONCE(!system_supports_sve()))
+			return 0;
+
+		/*
+		 * Vector granule is current length in bits of SVE registers
+		 * divided by 64.
+		 */
+		return (task_get_sve_vl(current) * 8) / 64;
+	default:
+		WARN_ON_ONCE(true);
+		return 0;
+	}
+}
+
 u64 perf_reg_value(struct pt_regs *regs, int idx)
 {
-	if (WARN_ON_ONCE((u32)idx >= PERF_REG_ARM64_MAX))
+	if (WARN_ON_ONCE((u32)idx >= PERF_REG_ARM64_EXTENDED_MAX))
 		return 0;
 
 	/*
@@ -51,6 +69,9 @@ u64 perf_reg_value(struct pt_regs *regs, int idx)
 	if ((u32)idx == PERF_REG_ARM64_PC)
 		return regs->pc;
 
+	if ((u32)idx >= PERF_REG_ARM64_MAX)
+		return perf_ext_regs_value(idx);
+
 	return regs->regs[idx];
 }
 
@@ -58,7 +79,12 @@ u64 perf_reg_value(struct pt_regs *regs, int idx)
 
 int perf_reg_validate(u64 mask)
 {
-	if (!mask || mask & REG_RESERVED)
+	u64 reserved_mask = REG_RESERVED;
+
+	if (system_supports_sve())
+		reserved_mask &= ~(1ULL << PERF_REG_ARM64_VG);
+
+	if (!mask || mask & reserved_mask)
 		return -EINVAL;
 
 	return 0;
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 59d3980b8ca2..3f07df5a7e95 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -894,7 +894,7 @@ static struct arm_pmu *__armpmu_alloc(gfp_t flags)
 		 * pmu::filter_match callback and pmu::event_init group
 		 * validation).
 		 */
-		.capabilities	= PERF_PMU_CAP_HETEROGENEOUS_CPUS,
+		.capabilities	= PERF_PMU_CAP_HETEROGENEOUS_CPUS | PERF_PMU_CAP_EXTENDED_REGS,
 	};
 
 	pmu->attr_groups[ARMPMU_ATTR_GROUP_COMMON] =
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 2/2] arm64/sve: Add Perf extensions documentation
  2022-05-17 10:07 ` James Clark
@ 2022-05-17 10:07   ` James Clark
  -1 siblings, 0 replies; 16+ messages in thread
From: James Clark @ 2022-05-17 10:07 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel, linux-perf-users, broonie, acme
  Cc: german.gomez, leo.yan, mathieu.poirier, john.garry, James Clark,
	Catalin Marinas, Will Deacon, Jonathan Corbet, Mark Rutland,
	linux-doc

Document that the VG register is available in Perf samples

Signed-off-by: James Clark <james.clark@arm.com>
---
 Documentation/arm64/sve.rst | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/Documentation/arm64/sve.rst b/Documentation/arm64/sve.rst
index 9d9a4de5bc34..b34a1467057f 100644
--- a/Documentation/arm64/sve.rst
+++ b/Documentation/arm64/sve.rst
@@ -402,6 +402,24 @@ The regset data starts with struct user_sve_header, containing:
 * Modifying the system default vector length does not affect the vector length
   of any existing process or thread that does not make an execve() call.
 
+10.  Perf extensions
+--------------------------------
+
+* The arm64 specific DWARF standard [5] added the VG (Vector Granule) register
+  at index 46. This register is used for DWARF unwinding when variable length
+  SVE registers are pushed onto the stack.
+
+* Its value is equivalent to the current SVE vector length (VL) in bits divided
+  by 64.
+
+* The value is included in Perf samples in the regs[46] field if
+  PERF_SAMPLE_REGS_USER is set and the sample_regs_user mask has bit 46 set.
+
+* The value is the current value at the time the sample was taken, and it can
+  change over time.
+
+* If the system doesn't support SVE when perf_event_open is called with these
+  settings, the event will fail to open.
 
 Appendix A.  SVE programmer's model (informative)
 =================================================
@@ -543,3 +561,5 @@ References
     http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf
     http://infocenter.arm.com/help/topic/com.arm.doc.subset.swdev.abi/index.html
     Procedure Call Standard for the ARM 64-bit Architecture (AArch64)
+
+[5] https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst
-- 
2.28.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 2/2] arm64/sve: Add Perf extensions documentation
@ 2022-05-17 10:07   ` James Clark
  0 siblings, 0 replies; 16+ messages in thread
From: James Clark @ 2022-05-17 10:07 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel, linux-perf-users, broonie, acme
  Cc: german.gomez, leo.yan, mathieu.poirier, john.garry, James Clark,
	Catalin Marinas, Will Deacon, Jonathan Corbet, Mark Rutland,
	linux-doc

Document that the VG register is available in Perf samples

Signed-off-by: James Clark <james.clark@arm.com>
---
 Documentation/arm64/sve.rst | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/Documentation/arm64/sve.rst b/Documentation/arm64/sve.rst
index 9d9a4de5bc34..b34a1467057f 100644
--- a/Documentation/arm64/sve.rst
+++ b/Documentation/arm64/sve.rst
@@ -402,6 +402,24 @@ The regset data starts with struct user_sve_header, containing:
 * Modifying the system default vector length does not affect the vector length
   of any existing process or thread that does not make an execve() call.
 
+10.  Perf extensions
+--------------------------------
+
+* The arm64 specific DWARF standard [5] added the VG (Vector Granule) register
+  at index 46. This register is used for DWARF unwinding when variable length
+  SVE registers are pushed onto the stack.
+
+* Its value is equivalent to the current SVE vector length (VL) in bits divided
+  by 64.
+
+* The value is included in Perf samples in the regs[46] field if
+  PERF_SAMPLE_REGS_USER is set and the sample_regs_user mask has bit 46 set.
+
+* The value is the current value at the time the sample was taken, and it can
+  change over time.
+
+* If the system doesn't support SVE when perf_event_open is called with these
+  settings, the event will fail to open.
 
 Appendix A.  SVE programmer's model (informative)
 =================================================
@@ -543,3 +561,5 @@ References
     http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf
     http://infocenter.arm.com/help/topic/com.arm.doc.subset.swdev.abi/index.html
     Procedure Call Standard for the ARM 64-bit Architecture (AArch64)
+
+[5] https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/2] arm64/sve: Add Perf extensions documentation
  2022-05-17 10:07   ` James Clark
@ 2022-05-17 11:07     ` Mark Brown
  -1 siblings, 0 replies; 16+ messages in thread
From: Mark Brown @ 2022-05-17 11:07 UTC (permalink / raw)
  To: James Clark
  Cc: linux-arm-kernel, linux-kernel, linux-perf-users, acme,
	german.gomez, leo.yan, mathieu.poirier, john.garry,
	Catalin Marinas, Will Deacon, Jonathan Corbet, Mark Rutland,
	linux-doc

[-- Attachment #1: Type: text/plain, Size: 228 bytes --]

On Tue, May 17, 2022 at 11:07:43AM +0100, James Clark wrote:
> Document that the VG register is available in Perf samples
> 
> Signed-off-by: James Clark <james.clark@arm.com>

Reviewed-by: Mark Brown <broonie@kernel.org>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/2] arm64/sve: Add Perf extensions documentation
@ 2022-05-17 11:07     ` Mark Brown
  0 siblings, 0 replies; 16+ messages in thread
From: Mark Brown @ 2022-05-17 11:07 UTC (permalink / raw)
  To: James Clark
  Cc: linux-arm-kernel, linux-kernel, linux-perf-users, acme,
	german.gomez, leo.yan, mathieu.poirier, john.garry,
	Catalin Marinas, Will Deacon, Jonathan Corbet, Mark Rutland,
	linux-doc


[-- Attachment #1.1: Type: text/plain, Size: 228 bytes --]

On Tue, May 17, 2022 at 11:07:43AM +0100, James Clark wrote:
> Document that the VG register is available in Perf samples
> 
> Signed-off-by: James Clark <james.clark@arm.com>

Reviewed-by: Mark Brown <broonie@kernel.org>

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 0/2] perf: arm64: Kernel support for Dwarf unwinding through SVE functions
  2022-05-17 10:07 ` James Clark
@ 2022-06-04  2:07   ` Leo Yan
  -1 siblings, 0 replies; 16+ messages in thread
From: Leo Yan @ 2022-06-04  2:07 UTC (permalink / raw)
  To: James Clark
  Cc: linux-arm-kernel, linux-kernel, linux-perf-users, broonie, acme,
	german.gomez, mathieu.poirier, john.garry, Catalin Marinas,
	Will Deacon, Jonathan Corbet, Mark Rutland, linux-doc

On Tue, May 17, 2022 at 11:07:41AM +0100, James Clark wrote:
> Changes since v1:
> 
>   * Add Mark's review tag
>   * Clarify in docs that it's the SVE register length
>   * Split patchset into kernel side and Perf tool changes
> 
> When SVE registers are pushed onto the stack the VG register is required to
> unwind because the stack offsets would vary by the SVE register width at the
> time when the sample was taken.
> 
> These first two patches add support for sampling the VG register to the kernel
> and the docs. There is another patchset to add support to userspace perf.

Hi Catalin, Will,

Since James is on vacation, just want to ping if you could pick up
this two patches?  Mark.B has given review tags for this patch set.

I did this is because there has another patch set in perf tool to
enable SVE registsers [1], which is dependent on this patch set's
merging.

Thanks,
Leo

[1] https://lore.kernel.org/lkml/20220525154114.718321-1-james.clark@arm.com/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 0/2] perf: arm64: Kernel support for Dwarf unwinding through SVE functions
@ 2022-06-04  2:07   ` Leo Yan
  0 siblings, 0 replies; 16+ messages in thread
From: Leo Yan @ 2022-06-04  2:07 UTC (permalink / raw)
  To: James Clark
  Cc: linux-arm-kernel, linux-kernel, linux-perf-users, broonie, acme,
	german.gomez, mathieu.poirier, john.garry, Catalin Marinas,
	Will Deacon, Jonathan Corbet, Mark Rutland, linux-doc

On Tue, May 17, 2022 at 11:07:41AM +0100, James Clark wrote:
> Changes since v1:
> 
>   * Add Mark's review tag
>   * Clarify in docs that it's the SVE register length
>   * Split patchset into kernel side and Perf tool changes
> 
> When SVE registers are pushed onto the stack the VG register is required to
> unwind because the stack offsets would vary by the SVE register width at the
> time when the sample was taken.
> 
> These first two patches add support for sampling the VG register to the kernel
> and the docs. There is another patchset to add support to userspace perf.

Hi Catalin, Will,

Since James is on vacation, just want to ping if you could pick up
this two patches?  Mark.B has given review tags for this patch set.

I did this is because there has another patch set in perf tool to
enable SVE registsers [1], which is dependent on this patch set's
merging.

Thanks,
Leo

[1] https://lore.kernel.org/lkml/20220525154114.718321-1-james.clark@arm.com/

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 0/2] perf: arm64: Kernel support for Dwarf unwinding through SVE functions
  2022-06-04  2:07   ` Leo Yan
@ 2022-06-04  2:14     ` Leo Yan
  -1 siblings, 0 replies; 16+ messages in thread
From: Leo Yan @ 2022-06-04  2:14 UTC (permalink / raw)
  To: James Clark
  Cc: linux-arm-kernel, linux-kernel, linux-perf-users, broonie, acme,
	german.gomez, mathieu.poirier, john.garry, Catalin Marinas,
	Will Deacon, Jonathan Corbet, Mark Rutland, linux-doc

On Sat, Jun 04, 2022 at 10:07:21AM +0800, Leo Yan wrote:
> On Tue, May 17, 2022 at 11:07:41AM +0100, James Clark wrote:
> > Changes since v1:
> > 
> >   * Add Mark's review tag
> >   * Clarify in docs that it's the SVE register length
> >   * Split patchset into kernel side and Perf tool changes
> > 
> > When SVE registers are pushed onto the stack the VG register is required to
> > unwind because the stack offsets would vary by the SVE register width at the
> > time when the sample was taken.
> > 
> > These first two patches add support for sampling the VG register to the kernel
> > and the docs. There is another patchset to add support to userspace perf.
> 
> Hi Catalin, Will,
> 
> Since James is on vacation, just want to ping if you could pick up
> this two patches?  Mark.B has given review tags for this patch set.

Sorry, or the kernel patch should be via arm tree and the document patch
via Jonathan's tree?

> I did this is because there has another patch set in perf tool to
> enable SVE registsers [1], which is dependent on this patch set's
> merging.
> 
> Thanks,
> Leo
> 
> [1] https://lore.kernel.org/lkml/20220525154114.718321-1-james.clark@arm.com/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 0/2] perf: arm64: Kernel support for Dwarf unwinding through SVE functions
@ 2022-06-04  2:14     ` Leo Yan
  0 siblings, 0 replies; 16+ messages in thread
From: Leo Yan @ 2022-06-04  2:14 UTC (permalink / raw)
  To: James Clark
  Cc: linux-arm-kernel, linux-kernel, linux-perf-users, broonie, acme,
	german.gomez, mathieu.poirier, john.garry, Catalin Marinas,
	Will Deacon, Jonathan Corbet, Mark Rutland, linux-doc

On Sat, Jun 04, 2022 at 10:07:21AM +0800, Leo Yan wrote:
> On Tue, May 17, 2022 at 11:07:41AM +0100, James Clark wrote:
> > Changes since v1:
> > 
> >   * Add Mark's review tag
> >   * Clarify in docs that it's the SVE register length
> >   * Split patchset into kernel side and Perf tool changes
> > 
> > When SVE registers are pushed onto the stack the VG register is required to
> > unwind because the stack offsets would vary by the SVE register width at the
> > time when the sample was taken.
> > 
> > These first two patches add support for sampling the VG register to the kernel
> > and the docs. There is another patchset to add support to userspace perf.
> 
> Hi Catalin, Will,
> 
> Since James is on vacation, just want to ping if you could pick up
> this two patches?  Mark.B has given review tags for this patch set.

Sorry, or the kernel patch should be via arm tree and the document patch
via Jonathan's tree?

> I did this is because there has another patch set in perf tool to
> enable SVE registsers [1], which is dependent on this patch set's
> merging.
> 
> Thanks,
> Leo
> 
> [1] https://lore.kernel.org/lkml/20220525154114.718321-1-james.clark@arm.com/

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/2] perf: arm64: Add SVE vector granule register to user regs
  2022-05-17 10:07   ` James Clark
@ 2022-06-27 11:13     ` Will Deacon
  -1 siblings, 0 replies; 16+ messages in thread
From: Will Deacon @ 2022-06-27 11:13 UTC (permalink / raw)
  To: James Clark
  Cc: linux-arm-kernel, linux-kernel, linux-perf-users, broonie, acme,
	german.gomez, leo.yan, mathieu.poirier, john.garry,
	Catalin Marinas, Jonathan Corbet, Mark Rutland, linux-doc

On Tue, May 17, 2022 at 11:07:42AM +0100, James Clark wrote:
> Dwarf based unwinding in a function that pushes SVE registers onto
> the stack requires the unwinder to know the length of the SVE register
> to calculate the stack offsets correctly. This was added to the Arm
> specific Dwarf spec as the VG pseudo register[1].
> 
> Add the vector length at position 46 if it's requested by userspace and
> SVE is supported. If it's not supported then fail to open the event.
> 
> The vector length must be on each sample because it can be changed
> at runtime via a prctl or ptrace call. Also by adding it as a register
> rather than a separate attribute, minimal changes will be required in an
> unwinder that already indexes into the register list.
> 
> [1]: https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst
> 
> Reviewed-by: Mark Brown <broonie@kernel.org>
> Signed-off-by: James Clark <james.clark@arm.com>
> ---
>  arch/arm64/include/uapi/asm/perf_regs.h |  7 +++++-
>  arch/arm64/kernel/perf_regs.c           | 30 +++++++++++++++++++++++--
>  drivers/perf/arm_pmu.c                  |  2 +-
>  3 files changed, 35 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/include/uapi/asm/perf_regs.h b/arch/arm64/include/uapi/asm/perf_regs.h
> index d54daafa89e3..fd157f46727e 100644
> --- a/arch/arm64/include/uapi/asm/perf_regs.h
> +++ b/arch/arm64/include/uapi/asm/perf_regs.h
> @@ -36,6 +36,11 @@ enum perf_event_arm_regs {
>  	PERF_REG_ARM64_LR,
>  	PERF_REG_ARM64_SP,
>  	PERF_REG_ARM64_PC,
> -	PERF_REG_ARM64_MAX,
> +
> +	/* Extended/pseudo registers */
> +	PERF_REG_ARM64_VG = 46, // SVE Vector Granule
> +
> +	PERF_REG_ARM64_MAX = PERF_REG_ARM64_PC + 1,
> +	PERF_REG_ARM64_EXTENDED_MAX = PERF_REG_ARM64_VG + 1

I think you can leave PERF_REG_ARM64_MAX alone and just add:

	PERF_REG_ARM64_VG = 46,
	PERF_REG_ARM64_EXTENDED_MAX,

no?

>  };
>  #endif /* _ASM_ARM64_PERF_REGS_H */
> diff --git a/arch/arm64/kernel/perf_regs.c b/arch/arm64/kernel/perf_regs.c
> index f6f58e6265df..b4eece3eb17d 100644
> --- a/arch/arm64/kernel/perf_regs.c
> +++ b/arch/arm64/kernel/perf_regs.c
> @@ -9,9 +9,27 @@
>  #include <asm/perf_regs.h>
>  #include <asm/ptrace.h>
>  
> +static u64 perf_ext_regs_value(int idx)
> +{
> +	switch (idx) {
> +	case PERF_REG_ARM64_VG:
> +		if (WARN_ON_ONCE(!system_supports_sve()))
> +			return 0;
> +
> +		/*
> +		 * Vector granule is current length in bits of SVE registers
> +		 * divided by 64.
> +		 */
> +		return (task_get_sve_vl(current) * 8) / 64;

Is 'current' the right thing to use here? We pass the regs everywhere else,
so I'd prefer to stick with that if possible.

> +	default:
> +		WARN_ON_ONCE(true);
> +		return 0;
> +	}
> +}
> +
>  u64 perf_reg_value(struct pt_regs *regs, int idx)
>  {
> -	if (WARN_ON_ONCE((u32)idx >= PERF_REG_ARM64_MAX))
> +	if (WARN_ON_ONCE((u32)idx >= PERF_REG_ARM64_EXTENDED_MAX))
>  		return 0;
>  
>  	/*
> @@ -51,6 +69,9 @@ u64 perf_reg_value(struct pt_regs *regs, int idx)
>  	if ((u32)idx == PERF_REG_ARM64_PC)
>  		return regs->pc;
>  
> +	if ((u32)idx >= PERF_REG_ARM64_MAX)
> +		return perf_ext_regs_value(idx);
> +
>  	return regs->regs[idx];
>  }
>  
> @@ -58,7 +79,12 @@ u64 perf_reg_value(struct pt_regs *regs, int idx)
>  
>  int perf_reg_validate(u64 mask)
>  {
> -	if (!mask || mask & REG_RESERVED)
> +	u64 reserved_mask = REG_RESERVED;
> +
> +	if (system_supports_sve())
> +		reserved_mask &= ~(1ULL << PERF_REG_ARM64_VG);
> +
> +	if (!mask || mask & reserved_mask)
>  		return -EINVAL;
>  
>  	return 0;
> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> index 59d3980b8ca2..3f07df5a7e95 100644
> --- a/drivers/perf/arm_pmu.c
> +++ b/drivers/perf/arm_pmu.c
> @@ -894,7 +894,7 @@ static struct arm_pmu *__armpmu_alloc(gfp_t flags)
>  		 * pmu::filter_match callback and pmu::event_init group
>  		 * validation).
>  		 */
> -		.capabilities	= PERF_PMU_CAP_HETEROGENEOUS_CPUS,
> +		.capabilities	= PERF_PMU_CAP_HETEROGENEOUS_CPUS | PERF_PMU_CAP_EXTENDED_REGS,

How does userspace know this capability is available? Should we advertise
the set of extended registers that we support, rather than make this a
one-trick pony for the vector length?

Also, you don't appear to #define PERF_REG_EXTENDED_MASK so I don't
understand how userspace is supposed to interact with this. Won't
has_extended_regs() always return false?

Will

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/2] perf: arm64: Add SVE vector granule register to user regs
@ 2022-06-27 11:13     ` Will Deacon
  0 siblings, 0 replies; 16+ messages in thread
From: Will Deacon @ 2022-06-27 11:13 UTC (permalink / raw)
  To: James Clark
  Cc: linux-arm-kernel, linux-kernel, linux-perf-users, broonie, acme,
	german.gomez, leo.yan, mathieu.poirier, john.garry,
	Catalin Marinas, Jonathan Corbet, Mark Rutland, linux-doc

On Tue, May 17, 2022 at 11:07:42AM +0100, James Clark wrote:
> Dwarf based unwinding in a function that pushes SVE registers onto
> the stack requires the unwinder to know the length of the SVE register
> to calculate the stack offsets correctly. This was added to the Arm
> specific Dwarf spec as the VG pseudo register[1].
> 
> Add the vector length at position 46 if it's requested by userspace and
> SVE is supported. If it's not supported then fail to open the event.
> 
> The vector length must be on each sample because it can be changed
> at runtime via a prctl or ptrace call. Also by adding it as a register
> rather than a separate attribute, minimal changes will be required in an
> unwinder that already indexes into the register list.
> 
> [1]: https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst
> 
> Reviewed-by: Mark Brown <broonie@kernel.org>
> Signed-off-by: James Clark <james.clark@arm.com>
> ---
>  arch/arm64/include/uapi/asm/perf_regs.h |  7 +++++-
>  arch/arm64/kernel/perf_regs.c           | 30 +++++++++++++++++++++++--
>  drivers/perf/arm_pmu.c                  |  2 +-
>  3 files changed, 35 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/include/uapi/asm/perf_regs.h b/arch/arm64/include/uapi/asm/perf_regs.h
> index d54daafa89e3..fd157f46727e 100644
> --- a/arch/arm64/include/uapi/asm/perf_regs.h
> +++ b/arch/arm64/include/uapi/asm/perf_regs.h
> @@ -36,6 +36,11 @@ enum perf_event_arm_regs {
>  	PERF_REG_ARM64_LR,
>  	PERF_REG_ARM64_SP,
>  	PERF_REG_ARM64_PC,
> -	PERF_REG_ARM64_MAX,
> +
> +	/* Extended/pseudo registers */
> +	PERF_REG_ARM64_VG = 46, // SVE Vector Granule
> +
> +	PERF_REG_ARM64_MAX = PERF_REG_ARM64_PC + 1,
> +	PERF_REG_ARM64_EXTENDED_MAX = PERF_REG_ARM64_VG + 1

I think you can leave PERF_REG_ARM64_MAX alone and just add:

	PERF_REG_ARM64_VG = 46,
	PERF_REG_ARM64_EXTENDED_MAX,

no?

>  };
>  #endif /* _ASM_ARM64_PERF_REGS_H */
> diff --git a/arch/arm64/kernel/perf_regs.c b/arch/arm64/kernel/perf_regs.c
> index f6f58e6265df..b4eece3eb17d 100644
> --- a/arch/arm64/kernel/perf_regs.c
> +++ b/arch/arm64/kernel/perf_regs.c
> @@ -9,9 +9,27 @@
>  #include <asm/perf_regs.h>
>  #include <asm/ptrace.h>
>  
> +static u64 perf_ext_regs_value(int idx)
> +{
> +	switch (idx) {
> +	case PERF_REG_ARM64_VG:
> +		if (WARN_ON_ONCE(!system_supports_sve()))
> +			return 0;
> +
> +		/*
> +		 * Vector granule is current length in bits of SVE registers
> +		 * divided by 64.
> +		 */
> +		return (task_get_sve_vl(current) * 8) / 64;

Is 'current' the right thing to use here? We pass the regs everywhere else,
so I'd prefer to stick with that if possible.

> +	default:
> +		WARN_ON_ONCE(true);
> +		return 0;
> +	}
> +}
> +
>  u64 perf_reg_value(struct pt_regs *regs, int idx)
>  {
> -	if (WARN_ON_ONCE((u32)idx >= PERF_REG_ARM64_MAX))
> +	if (WARN_ON_ONCE((u32)idx >= PERF_REG_ARM64_EXTENDED_MAX))
>  		return 0;
>  
>  	/*
> @@ -51,6 +69,9 @@ u64 perf_reg_value(struct pt_regs *regs, int idx)
>  	if ((u32)idx == PERF_REG_ARM64_PC)
>  		return regs->pc;
>  
> +	if ((u32)idx >= PERF_REG_ARM64_MAX)
> +		return perf_ext_regs_value(idx);
> +
>  	return regs->regs[idx];
>  }
>  
> @@ -58,7 +79,12 @@ u64 perf_reg_value(struct pt_regs *regs, int idx)
>  
>  int perf_reg_validate(u64 mask)
>  {
> -	if (!mask || mask & REG_RESERVED)
> +	u64 reserved_mask = REG_RESERVED;
> +
> +	if (system_supports_sve())
> +		reserved_mask &= ~(1ULL << PERF_REG_ARM64_VG);
> +
> +	if (!mask || mask & reserved_mask)
>  		return -EINVAL;
>  
>  	return 0;
> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> index 59d3980b8ca2..3f07df5a7e95 100644
> --- a/drivers/perf/arm_pmu.c
> +++ b/drivers/perf/arm_pmu.c
> @@ -894,7 +894,7 @@ static struct arm_pmu *__armpmu_alloc(gfp_t flags)
>  		 * pmu::filter_match callback and pmu::event_init group
>  		 * validation).
>  		 */
> -		.capabilities	= PERF_PMU_CAP_HETEROGENEOUS_CPUS,
> +		.capabilities	= PERF_PMU_CAP_HETEROGENEOUS_CPUS | PERF_PMU_CAP_EXTENDED_REGS,

How does userspace know this capability is available? Should we advertise
the set of extended registers that we support, rather than make this a
one-trick pony for the vector length?

Also, you don't appear to #define PERF_REG_EXTENDED_MASK so I don't
understand how userspace is supposed to interact with this. Won't
has_extended_regs() always return false?

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/2] perf: arm64: Add SVE vector granule register to user regs
  2022-06-27 11:13     ` Will Deacon
@ 2022-07-19  8:51       ` James Clark
  -1 siblings, 0 replies; 16+ messages in thread
From: James Clark @ 2022-07-19  8:51 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-kernel, linux-perf-users, broonie, acme,
	german.gomez, leo.yan, mathieu.poirier, john.garry,
	Catalin Marinas, Jonathan Corbet, Mark Rutland, linux-doc,
	Suzuki K Poulose, Anshuman Khandual



On 27/06/2022 12:13, Will Deacon wrote:
> On Tue, May 17, 2022 at 11:07:42AM +0100, James Clark wrote:
>> Dwarf based unwinding in a function that pushes SVE registers onto
>> the stack requires the unwinder to know the length of the SVE register
>> to calculate the stack offsets correctly. This was added to the Arm
>> specific Dwarf spec as the VG pseudo register[1].
>>
>> Add the vector length at position 46 if it's requested by userspace and
>> SVE is supported. If it's not supported then fail to open the event.
>>
>> The vector length must be on each sample because it can be changed
>> at runtime via a prctl or ptrace call. Also by adding it as a register
>> rather than a separate attribute, minimal changes will be required in an
>> unwinder that already indexes into the register list.
>>
>> [1]: https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst
>>
>> Reviewed-by: Mark Brown <broonie@kernel.org>
>> Signed-off-by: James Clark <james.clark@arm.com>
>> ---
>>  arch/arm64/include/uapi/asm/perf_regs.h |  7 +++++-
>>  arch/arm64/kernel/perf_regs.c           | 30 +++++++++++++++++++++++--
>>  drivers/perf/arm_pmu.c                  |  2 +-
>>  3 files changed, 35 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm64/include/uapi/asm/perf_regs.h b/arch/arm64/include/uapi/asm/perf_regs.h
>> index d54daafa89e3..fd157f46727e 100644
>> --- a/arch/arm64/include/uapi/asm/perf_regs.h
>> +++ b/arch/arm64/include/uapi/asm/perf_regs.h
>> @@ -36,6 +36,11 @@ enum perf_event_arm_regs {
>>  	PERF_REG_ARM64_LR,
>>  	PERF_REG_ARM64_SP,
>>  	PERF_REG_ARM64_PC,
>> -	PERF_REG_ARM64_MAX,
>> +
>> +	/* Extended/pseudo registers */
>> +	PERF_REG_ARM64_VG = 46, // SVE Vector Granule
>> +
>> +	PERF_REG_ARM64_MAX = PERF_REG_ARM64_PC + 1,
>> +	PERF_REG_ARM64_EXTENDED_MAX = PERF_REG_ARM64_VG + 1

Hi Will,

Apologies for the late reply, I was on a long break.

> 
> I think you can leave PERF_REG_ARM64_MAX alone and just add:
> 
> 	PERF_REG_ARM64_VG = 46,
> 	PERF_REG_ARM64_EXTENDED_MAX,
> 
> no?

Yes that would be the same. I was going for consistency with 
arch/powerpc/include/uapi/asm/perf_regs.h which did it this way,
possibly so that ..._MAX could always be found at the end.

But I'm happy to simplify it. Although this change has already
made it onto Perf's copy of this header for v5.19-rc5. It can
still be changed but I'm not sure if that fact changes your
preference for this?

> 
>>  };
>>  #endif /* _ASM_ARM64_PERF_REGS_H */
>> diff --git a/arch/arm64/kernel/perf_regs.c b/arch/arm64/kernel/perf_regs.c
>> index f6f58e6265df..b4eece3eb17d 100644
>> --- a/arch/arm64/kernel/perf_regs.c
>> +++ b/arch/arm64/kernel/perf_regs.c
>> @@ -9,9 +9,27 @@
>>  #include <asm/perf_regs.h>
>>  #include <asm/ptrace.h>
>>  
>> +static u64 perf_ext_regs_value(int idx)
>> +{
>> +	switch (idx) {
>> +	case PERF_REG_ARM64_VG:
>> +		if (WARN_ON_ONCE(!system_supports_sve()))
>> +			return 0;
>> +
>> +		/*
>> +		 * Vector granule is current length in bits of SVE registers
>> +		 * divided by 64.
>> +		 */
>> +		return (task_get_sve_vl(current) * 8) / 64;
> 
> Is 'current' the right thing to use here? We pass the regs everywhere else,
> so I'd prefer to stick with that if possible.

I did try that to start with but it required either a change to 'struct pt_regs'
which didn't feel right, or adding an additional struct to be passed around in
the core perf code.

Ultimately for userspace samples, 'current' is used anyway:

  	regs_user->regs = task_pt_regs(current);
	regs_user->abi = perf_reg_abi(current);

The reason prepare_sample() and then output_sample() are called separately is
to get the total size of the sample ahead of outputting it, but
this is already taken into account with registers without storing them because
the number of set bits are counted, not the number of saved registers.

I also referenced get_ext_regs_value() in arch/powerpc/perf/perf_regs.c which
doesn't change any function signatures or structs and just grabs whichever
current extra register was needed.

So TLDR: yes it would be possible to save the register somewhere and
pass it around, but it would be the saved version of 'current' anyway,
and it would require much bigger changes.

I can make the change if you like if it's easier to judge by seeing what it
would look like?

> 
>> +	default:
>> +		WARN_ON_ONCE(true);
>> +		return 0;
>> +	}
>> +}
>> +
>>  u64 perf_reg_value(struct pt_regs *regs, int idx)
>>  {
>> -	if (WARN_ON_ONCE((u32)idx >= PERF_REG_ARM64_MAX))
>> +	if (WARN_ON_ONCE((u32)idx >= PERF_REG_ARM64_EXTENDED_MAX))
>>  		return 0;
>>  
>>  	/*
>> @@ -51,6 +69,9 @@ u64 perf_reg_value(struct pt_regs *regs, int idx)
>>  	if ((u32)idx == PERF_REG_ARM64_PC)
>>  		return regs->pc;
>>  
>> +	if ((u32)idx >= PERF_REG_ARM64_MAX)
>> +		return perf_ext_regs_value(idx);
>> +
>>  	return regs->regs[idx];
>>  }
>>  
>> @@ -58,7 +79,12 @@ u64 perf_reg_value(struct pt_regs *regs, int idx)
>>  
>>  int perf_reg_validate(u64 mask)
>>  {
>> -	if (!mask || mask & REG_RESERVED)
>> +	u64 reserved_mask = REG_RESERVED;
>> +
>> +	if (system_supports_sve())
>> +		reserved_mask &= ~(1ULL << PERF_REG_ARM64_VG);
>> +
>> +	if (!mask || mask & reserved_mask)
>>  		return -EINVAL;
>>  
>>  	return 0;
>> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
>> index 59d3980b8ca2..3f07df5a7e95 100644
>> --- a/drivers/perf/arm_pmu.c
>> +++ b/drivers/perf/arm_pmu.c
>> @@ -894,7 +894,7 @@ static struct arm_pmu *__armpmu_alloc(gfp_t flags)
>>  		 * pmu::filter_match callback and pmu::event_init group
>>  		 * validation).
>>  		 */
>> -		.capabilities	= PERF_PMU_CAP_HETEROGENEOUS_CPUS,
>> +		.capabilities	= PERF_PMU_CAP_HETEROGENEOUS_CPUS | PERF_PMU_CAP_EXTENDED_REGS,
> 
> How does userspace know this capability is available? 

It doesn't, this change is only needed kernel side because if a register
matching PERF_REG_EXTENDED_MASK is requested, the PMU has to have this
capability otherwise the perf_event_open is rejected.

> Should we advertise
> the set of extended registers that we support, rather than make this a
> one-trick pony for the vector length?

Maybe, this is another copy of what was done for extended registers on
Power and x86 where the event is attempted to be opened and if it works it's
assumed that it's supported:

	/*
	 * check if the pmu supports perf extended regs, before
	 * returning the register mask to sample.
	 */
	fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
	if (fd != -1) {
		close(fd);
		mask |= extended_mask;
	}

I did something similar in user space for this change:

  https://www.spinics.net/lists/linux-perf-users/msg19569.html

Actually I also check HWCAP_SVE, which is not strictly necessary. I suppose
we could report the bitmask of which extended registers are supported. But
for me just trying to open the event works just as well.

> 
> Also, you don't appear to #define PERF_REG_EXTENDED_MASK so I don't
> understand how userspace is supposed to interact with this. Won't
> has_extended_regs() always return false?

Oops, nice catch. Yes I think I accidentally dropped this part but didn't
notice because it still works without it. If I add this, then the event doesn't
open unless PERF_PMU_CAP_EXTENDED_REGS is added, which is what I wanted:

  #define PERF_REG_EXTENDED_MASK (~((1ULL << PERF_REG_ARM64_VG) - 1))

So the only reason to use PERF_PMU_CAP_EXTENDED_REGS is for consistency and to
avoid changing PERF_REG_ARM64_MAX which could potentially break userspace
programs depending on it.

James

> 
> Will

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/2] perf: arm64: Add SVE vector granule register to user regs
@ 2022-07-19  8:51       ` James Clark
  0 siblings, 0 replies; 16+ messages in thread
From: James Clark @ 2022-07-19  8:51 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-kernel, linux-perf-users, broonie, acme,
	german.gomez, leo.yan, mathieu.poirier, john.garry,
	Catalin Marinas, Jonathan Corbet, Mark Rutland, linux-doc,
	Suzuki K Poulose, Anshuman Khandual



On 27/06/2022 12:13, Will Deacon wrote:
> On Tue, May 17, 2022 at 11:07:42AM +0100, James Clark wrote:
>> Dwarf based unwinding in a function that pushes SVE registers onto
>> the stack requires the unwinder to know the length of the SVE register
>> to calculate the stack offsets correctly. This was added to the Arm
>> specific Dwarf spec as the VG pseudo register[1].
>>
>> Add the vector length at position 46 if it's requested by userspace and
>> SVE is supported. If it's not supported then fail to open the event.
>>
>> The vector length must be on each sample because it can be changed
>> at runtime via a prctl or ptrace call. Also by adding it as a register
>> rather than a separate attribute, minimal changes will be required in an
>> unwinder that already indexes into the register list.
>>
>> [1]: https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst
>>
>> Reviewed-by: Mark Brown <broonie@kernel.org>
>> Signed-off-by: James Clark <james.clark@arm.com>
>> ---
>>  arch/arm64/include/uapi/asm/perf_regs.h |  7 +++++-
>>  arch/arm64/kernel/perf_regs.c           | 30 +++++++++++++++++++++++--
>>  drivers/perf/arm_pmu.c                  |  2 +-
>>  3 files changed, 35 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm64/include/uapi/asm/perf_regs.h b/arch/arm64/include/uapi/asm/perf_regs.h
>> index d54daafa89e3..fd157f46727e 100644
>> --- a/arch/arm64/include/uapi/asm/perf_regs.h
>> +++ b/arch/arm64/include/uapi/asm/perf_regs.h
>> @@ -36,6 +36,11 @@ enum perf_event_arm_regs {
>>  	PERF_REG_ARM64_LR,
>>  	PERF_REG_ARM64_SP,
>>  	PERF_REG_ARM64_PC,
>> -	PERF_REG_ARM64_MAX,
>> +
>> +	/* Extended/pseudo registers */
>> +	PERF_REG_ARM64_VG = 46, // SVE Vector Granule
>> +
>> +	PERF_REG_ARM64_MAX = PERF_REG_ARM64_PC + 1,
>> +	PERF_REG_ARM64_EXTENDED_MAX = PERF_REG_ARM64_VG + 1

Hi Will,

Apologies for the late reply, I was on a long break.

> 
> I think you can leave PERF_REG_ARM64_MAX alone and just add:
> 
> 	PERF_REG_ARM64_VG = 46,
> 	PERF_REG_ARM64_EXTENDED_MAX,
> 
> no?

Yes that would be the same. I was going for consistency with 
arch/powerpc/include/uapi/asm/perf_regs.h which did it this way,
possibly so that ..._MAX could always be found at the end.

But I'm happy to simplify it. Although this change has already
made it onto Perf's copy of this header for v5.19-rc5. It can
still be changed but I'm not sure if that fact changes your
preference for this?

> 
>>  };
>>  #endif /* _ASM_ARM64_PERF_REGS_H */
>> diff --git a/arch/arm64/kernel/perf_regs.c b/arch/arm64/kernel/perf_regs.c
>> index f6f58e6265df..b4eece3eb17d 100644
>> --- a/arch/arm64/kernel/perf_regs.c
>> +++ b/arch/arm64/kernel/perf_regs.c
>> @@ -9,9 +9,27 @@
>>  #include <asm/perf_regs.h>
>>  #include <asm/ptrace.h>
>>  
>> +static u64 perf_ext_regs_value(int idx)
>> +{
>> +	switch (idx) {
>> +	case PERF_REG_ARM64_VG:
>> +		if (WARN_ON_ONCE(!system_supports_sve()))
>> +			return 0;
>> +
>> +		/*
>> +		 * Vector granule is current length in bits of SVE registers
>> +		 * divided by 64.
>> +		 */
>> +		return (task_get_sve_vl(current) * 8) / 64;
> 
> Is 'current' the right thing to use here? We pass the regs everywhere else,
> so I'd prefer to stick with that if possible.

I did try that to start with but it required either a change to 'struct pt_regs'
which didn't feel right, or adding an additional struct to be passed around in
the core perf code.

Ultimately for userspace samples, 'current' is used anyway:

  	regs_user->regs = task_pt_regs(current);
	regs_user->abi = perf_reg_abi(current);

The reason prepare_sample() and then output_sample() are called separately is
to get the total size of the sample ahead of outputting it, but
this is already taken into account with registers without storing them because
the number of set bits are counted, not the number of saved registers.

I also referenced get_ext_regs_value() in arch/powerpc/perf/perf_regs.c which
doesn't change any function signatures or structs and just grabs whichever
current extra register was needed.

So TLDR: yes it would be possible to save the register somewhere and
pass it around, but it would be the saved version of 'current' anyway,
and it would require much bigger changes.

I can make the change if you like if it's easier to judge by seeing what it
would look like?

> 
>> +	default:
>> +		WARN_ON_ONCE(true);
>> +		return 0;
>> +	}
>> +}
>> +
>>  u64 perf_reg_value(struct pt_regs *regs, int idx)
>>  {
>> -	if (WARN_ON_ONCE((u32)idx >= PERF_REG_ARM64_MAX))
>> +	if (WARN_ON_ONCE((u32)idx >= PERF_REG_ARM64_EXTENDED_MAX))
>>  		return 0;
>>  
>>  	/*
>> @@ -51,6 +69,9 @@ u64 perf_reg_value(struct pt_regs *regs, int idx)
>>  	if ((u32)idx == PERF_REG_ARM64_PC)
>>  		return regs->pc;
>>  
>> +	if ((u32)idx >= PERF_REG_ARM64_MAX)
>> +		return perf_ext_regs_value(idx);
>> +
>>  	return regs->regs[idx];
>>  }
>>  
>> @@ -58,7 +79,12 @@ u64 perf_reg_value(struct pt_regs *regs, int idx)
>>  
>>  int perf_reg_validate(u64 mask)
>>  {
>> -	if (!mask || mask & REG_RESERVED)
>> +	u64 reserved_mask = REG_RESERVED;
>> +
>> +	if (system_supports_sve())
>> +		reserved_mask &= ~(1ULL << PERF_REG_ARM64_VG);
>> +
>> +	if (!mask || mask & reserved_mask)
>>  		return -EINVAL;
>>  
>>  	return 0;
>> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
>> index 59d3980b8ca2..3f07df5a7e95 100644
>> --- a/drivers/perf/arm_pmu.c
>> +++ b/drivers/perf/arm_pmu.c
>> @@ -894,7 +894,7 @@ static struct arm_pmu *__armpmu_alloc(gfp_t flags)
>>  		 * pmu::filter_match callback and pmu::event_init group
>>  		 * validation).
>>  		 */
>> -		.capabilities	= PERF_PMU_CAP_HETEROGENEOUS_CPUS,
>> +		.capabilities	= PERF_PMU_CAP_HETEROGENEOUS_CPUS | PERF_PMU_CAP_EXTENDED_REGS,
> 
> How does userspace know this capability is available? 

It doesn't, this change is only needed kernel side because if a register
matching PERF_REG_EXTENDED_MASK is requested, the PMU has to have this
capability otherwise the perf_event_open is rejected.

> Should we advertise
> the set of extended registers that we support, rather than make this a
> one-trick pony for the vector length?

Maybe, this is another copy of what was done for extended registers on
Power and x86 where the event is attempted to be opened and if it works it's
assumed that it's supported:

	/*
	 * check if the pmu supports perf extended regs, before
	 * returning the register mask to sample.
	 */
	fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
	if (fd != -1) {
		close(fd);
		mask |= extended_mask;
	}

I did something similar in user space for this change:

  https://www.spinics.net/lists/linux-perf-users/msg19569.html

Actually I also check HWCAP_SVE, which is not strictly necessary. I suppose
we could report the bitmask of which extended registers are supported. But
for me just trying to open the event works just as well.

> 
> Also, you don't appear to #define PERF_REG_EXTENDED_MASK so I don't
> understand how userspace is supposed to interact with this. Won't
> has_extended_regs() always return false?

Oops, nice catch. Yes I think I accidentally dropped this part but didn't
notice because it still works without it. If I add this, then the event doesn't
open unless PERF_PMU_CAP_EXTENDED_REGS is added, which is what I wanted:

  #define PERF_REG_EXTENDED_MASK (~((1ULL << PERF_REG_ARM64_VG) - 1))

So the only reason to use PERF_PMU_CAP_EXTENDED_REGS is for consistency and to
avoid changing PERF_REG_ARM64_MAX which could potentially break userspace
programs depending on it.

James

> 
> Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2022-07-19  8:52 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-17 10:07 [PATCH v2 0/2] perf: arm64: Kernel support for Dwarf unwinding through SVE functions James Clark
2022-05-17 10:07 ` James Clark
2022-05-17 10:07 ` [PATCH v2 1/2] perf: arm64: Add SVE vector granule register to user regs James Clark
2022-05-17 10:07   ` James Clark
2022-06-27 11:13   ` Will Deacon
2022-06-27 11:13     ` Will Deacon
2022-07-19  8:51     ` James Clark
2022-07-19  8:51       ` James Clark
2022-05-17 10:07 ` [PATCH v2 2/2] arm64/sve: Add Perf extensions documentation James Clark
2022-05-17 10:07   ` James Clark
2022-05-17 11:07   ` Mark Brown
2022-05-17 11:07     ` Mark Brown
2022-06-04  2:07 ` [PATCH v2 0/2] perf: arm64: Kernel support for Dwarf unwinding through SVE functions Leo Yan
2022-06-04  2:07   ` Leo Yan
2022-06-04  2:14   ` Leo Yan
2022-06-04  2:14     ` Leo Yan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.