linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH V3 0/7] arm64/perf: Enable branch stack sampling
@ 2022-09-29  7:58 Anshuman Khandual
  2022-09-29  7:58 ` [PATCH V3 1/7] arm64/perf: Add BRBE registers and fields Anshuman Khandual
                   ` (6 more replies)
  0 siblings, 7 replies; 18+ messages in thread
From: Anshuman Khandual @ 2022-09-29  7:58 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-arm-kernel, peterz, acme,
	mark.rutland, will, catalin.marinas
  Cc: Anshuman Khandual, James Clark, Rob Herring, Marc Zyngier,
	Suzuki Poulose, Ingo Molnar

This series enables perf branch stack sampling support on the arm64 platform
via a new arch feature called Branch Record Buffer Extension (BRBE). All
relevant register definitions can be found here.

https://developer.arm.com/documentation/ddi0601/2021-12/AArch64-Registers

This series applies on v6.0-rc5 after the BRBE related perf ABI changes series
(V7) that was posted earlier, and a branch sample filter helper patch.

https://lore.kernel.org/all/20220824044822.70230-1-anshuman.khandual@arm.com/

https://lore.kernel.org/all/20220906084414.396220-1-anshuman.khandual@arm.com/

Changes in V3:

- Moved brbe_stack off the stack; it is now dynamically allocated
- Return PERF_BR_PRIV_UNKNOWN instead of -1 in brbe_fetch_perf_priv()
- Moved BRBIDR0, BRBCR, BRBFCR registers and fields into tools/sysreg
- Created dummy BRBINF_EL1 field definitions in tools/sysreg
- Dropped ARMPMU_EVT_PRIV framework which cached perfmon_capable()
- Both exception and exception return branch records are now captured
  only if the event has PERF_SAMPLE_BRANCH_KERNEL, which would already
  have been checked in generic perf via perf_allow_kernel()

Changes in V2:

https://lore.kernel.org/all/20220908051046.465307-1-anshuman.khandual@arm.com/

- Dropped branch sample filter helpers consolidation patch from this series 
- Added new hw_perf_event.flags element ARMPMU_EVT_PRIV to cache perfmon_capable()
- Use cached perfmon_capable() while configuring BRBE branch record filters

Changes in V1:

https://lore.kernel.org/linux-arm-kernel/20220613100119.684673-1-anshuman.khandual@arm.com/

- Added CONFIG_PERF_EVENTS wrapper for all branch sample filter helpers
- Process new perf branch types via PERF_BR_EXTEND_ABI

Changes in RFC V2:

https://lore.kernel.org/linux-arm-kernel/20220412115455.293119-1-anshuman.khandual@arm.com/

- Added branch_sample_priv() while consolidating other branch sample filter helpers
- Changed all SYS_BRBXXXN_EL1 register definition encodings per Marc
- Changed the BRBE driver as per proposed BRBE related perf ABI changes (V5)
- Added documentation for struct arm_pmu changes, updated commit message
- Updated commit message for BRBE detection infrastructure patch
- PERF_SAMPLE_BRANCH_KERNEL gets checked during arm event init (outside the driver)
- Branch privilege state capture mechanism has now moved inside the driver

Changes in RFC V1:

https://lore.kernel.org/all/1642998653-21377-1-git-send-email-anshuman.khandual@arm.com/

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-perf-users@vger.kernel.org
Cc: linux-kernel@vger.kernel.org


Anshuman Khandual (7):
  arm64/perf: Add BRBE registers and fields
  arm64/perf: Update struct arm_pmu for BRBE
  arm64/perf: Update struct pmu_hw_events for BRBE
  driver/perf/arm_pmu_platform: Add support for BRBE attributes detection
  arm64/perf: Drive BRBE from perf event states
  arm64/perf: Add BRBE driver
  arm64/perf: Enable branch stack sampling

 arch/arm64/include/asm/sysreg.h | 107 ++++++++
 arch/arm64/kernel/perf_event.c  |  48 ++++
 arch/arm64/tools/sysreg         | 159 ++++++++++++
 drivers/perf/Kconfig            |  11 +
 drivers/perf/Makefile           |   1 +
 drivers/perf/arm_pmu.c          |  73 +++++-
 drivers/perf/arm_pmu_brbe.c     | 447 ++++++++++++++++++++++++++++++++
 drivers/perf/arm_pmu_brbe.h     | 259 ++++++++++++++++++
 drivers/perf/arm_pmu_platform.c |  34 +++
 include/linux/perf/arm_pmu.h    |  67 +++++
 10 files changed, 1203 insertions(+), 3 deletions(-)
 create mode 100644 drivers/perf/arm_pmu_brbe.c
 create mode 100644 drivers/perf/arm_pmu_brbe.h

-- 
2.25.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* [PATCH V3 1/7] arm64/perf: Add BRBE registers and fields
  2022-09-29  7:58 [PATCH V3 0/7] arm64/perf: Enable branch stack sampling Anshuman Khandual
@ 2022-09-29  7:58 ` Anshuman Khandual
  2022-09-29 11:29   ` Mark Brown
  2022-09-29  7:58 ` [PATCH V3 2/7] arm64/perf: Update struct arm_pmu for BRBE Anshuman Khandual
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Anshuman Khandual @ 2022-09-29  7:58 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-arm-kernel, peterz, acme,
	mark.rutland, will, catalin.marinas
  Cc: Anshuman Khandual, Marc Zyngier

This adds BRBE related register definitions and the associated field macros.
These will be used subsequently in a BRBE driver which is being added later
on.
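
As an illustration of how a per-record register encoding can be derived from
the record index, here is a small stand-alone sketch in user-space C. It is not
part of this patch. It assumes that CRm is meant to carry the low four bits of
the record index, and that op2 is meant to carry bit 4 of the index moved into
bit 2 plus a register selector (0 for BRBINF, 1 for BRBSRC, 2 for BRBTGT). The
names brbe_crm(), brbe_op2() and enum brbe_reg are hypothetical.

```c
#include <assert.h>

/* Register selector within one branch record (hypothetical names). */
enum brbe_reg { BRBE_INF = 0, BRBE_SRC = 1, BRBE_TGT = 2 };

/* CRm holds the low four bits of the record index (0..31). */
static unsigned int brbe_crm(unsigned int n)
{
	assert(n < 32);
	return n & 0xf;
}

/*
 * op2 holds bit 4 of the index in bit 2, plus the register selector.
 * The shift is parenthesised explicitly because '+' binds tighter
 * than '>>' in C.
 */
static unsigned int brbe_op2(unsigned int n, enum brbe_reg reg)
{
	assert(n < 32);
	return ((n & 0x10) >> 2) + (unsigned int)reg;
}
```

Under these assumptions, record 17 maps to CRm 1, with op2 values 4, 5 and 6
for its BRBINF, BRBSRC and BRBTGT registers respectively.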

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 arch/arm64/include/asm/sysreg.h | 107 +++++++++++++++++++++
 arch/arm64/tools/sysreg         | 159 ++++++++++++++++++++++++++++++++
 2 files changed, 266 insertions(+)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 818df938a7ad..1cf9345730f0 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -161,6 +161,109 @@
 #define SYS_DBGDTRTX_EL0		sys_reg(2, 3, 0, 5, 0)
 #define SYS_DBGVCR32_EL2		sys_reg(2, 4, 0, 7, 0)
 
+#define __SYS_BRBINFO(n)		sys_reg(2, 1, 8, ((n) & 0xf), ((((n) & 0x10) >> 2) + 0))
+#define __SYS_BRBSRC(n)			sys_reg(2, 1, 8, ((n) & 0xf), ((((n) & 0x10) >> 2) + 1))
+#define __SYS_BRBTGT(n)			sys_reg(2, 1, 8, ((n) & 0xf), ((((n) & 0x10) >> 2) + 2))
+
+#define SYS_BRBINF0_EL1			__SYS_BRBINFO(0)
+#define SYS_BRBINF1_EL1			__SYS_BRBINFO(1)
+#define SYS_BRBINF2_EL1			__SYS_BRBINFO(2)
+#define SYS_BRBINF3_EL1			__SYS_BRBINFO(3)
+#define SYS_BRBINF4_EL1			__SYS_BRBINFO(4)
+#define SYS_BRBINF5_EL1			__SYS_BRBINFO(5)
+#define SYS_BRBINF6_EL1			__SYS_BRBINFO(6)
+#define SYS_BRBINF7_EL1			__SYS_BRBINFO(7)
+#define SYS_BRBINF8_EL1			__SYS_BRBINFO(8)
+#define SYS_BRBINF9_EL1			__SYS_BRBINFO(9)
+#define SYS_BRBINF10_EL1		__SYS_BRBINFO(10)
+#define SYS_BRBINF11_EL1		__SYS_BRBINFO(11)
+#define SYS_BRBINF12_EL1		__SYS_BRBINFO(12)
+#define SYS_BRBINF13_EL1		__SYS_BRBINFO(13)
+#define SYS_BRBINF14_EL1		__SYS_BRBINFO(14)
+#define SYS_BRBINF15_EL1		__SYS_BRBINFO(15)
+#define SYS_BRBINF16_EL1		__SYS_BRBINFO(16)
+#define SYS_BRBINF17_EL1		__SYS_BRBINFO(17)
+#define SYS_BRBINF18_EL1		__SYS_BRBINFO(18)
+#define SYS_BRBINF19_EL1		__SYS_BRBINFO(19)
+#define SYS_BRBINF20_EL1		__SYS_BRBINFO(20)
+#define SYS_BRBINF21_EL1		__SYS_BRBINFO(21)
+#define SYS_BRBINF22_EL1		__SYS_BRBINFO(22)
+#define SYS_BRBINF23_EL1		__SYS_BRBINFO(23)
+#define SYS_BRBINF24_EL1		__SYS_BRBINFO(24)
+#define SYS_BRBINF25_EL1		__SYS_BRBINFO(25)
+#define SYS_BRBINF26_EL1		__SYS_BRBINFO(26)
+#define SYS_BRBINF27_EL1		__SYS_BRBINFO(27)
+#define SYS_BRBINF28_EL1		__SYS_BRBINFO(28)
+#define SYS_BRBINF29_EL1		__SYS_BRBINFO(29)
+#define SYS_BRBINF30_EL1		__SYS_BRBINFO(30)
+#define SYS_BRBINF31_EL1		__SYS_BRBINFO(31)
+
+#define SYS_BRBSRC0_EL1			__SYS_BRBSRC(0)
+#define SYS_BRBSRC1_EL1			__SYS_BRBSRC(1)
+#define SYS_BRBSRC2_EL1			__SYS_BRBSRC(2)
+#define SYS_BRBSRC3_EL1			__SYS_BRBSRC(3)
+#define SYS_BRBSRC4_EL1			__SYS_BRBSRC(4)
+#define SYS_BRBSRC5_EL1			__SYS_BRBSRC(5)
+#define SYS_BRBSRC6_EL1			__SYS_BRBSRC(6)
+#define SYS_BRBSRC7_EL1			__SYS_BRBSRC(7)
+#define SYS_BRBSRC8_EL1			__SYS_BRBSRC(8)
+#define SYS_BRBSRC9_EL1			__SYS_BRBSRC(9)
+#define SYS_BRBSRC10_EL1		__SYS_BRBSRC(10)
+#define SYS_BRBSRC11_EL1		__SYS_BRBSRC(11)
+#define SYS_BRBSRC12_EL1		__SYS_BRBSRC(12)
+#define SYS_BRBSRC13_EL1		__SYS_BRBSRC(13)
+#define SYS_BRBSRC14_EL1		__SYS_BRBSRC(14)
+#define SYS_BRBSRC15_EL1		__SYS_BRBSRC(15)
+#define SYS_BRBSRC16_EL1		__SYS_BRBSRC(16)
+#define SYS_BRBSRC17_EL1		__SYS_BRBSRC(17)
+#define SYS_BRBSRC18_EL1		__SYS_BRBSRC(18)
+#define SYS_BRBSRC19_EL1		__SYS_BRBSRC(19)
+#define SYS_BRBSRC20_EL1		__SYS_BRBSRC(20)
+#define SYS_BRBSRC21_EL1		__SYS_BRBSRC(21)
+#define SYS_BRBSRC22_EL1		__SYS_BRBSRC(22)
+#define SYS_BRBSRC23_EL1		__SYS_BRBSRC(23)
+#define SYS_BRBSRC24_EL1		__SYS_BRBSRC(24)
+#define SYS_BRBSRC25_EL1		__SYS_BRBSRC(25)
+#define SYS_BRBSRC26_EL1		__SYS_BRBSRC(26)
+#define SYS_BRBSRC27_EL1		__SYS_BRBSRC(27)
+#define SYS_BRBSRC28_EL1		__SYS_BRBSRC(28)
+#define SYS_BRBSRC29_EL1		__SYS_BRBSRC(29)
+#define SYS_BRBSRC30_EL1		__SYS_BRBSRC(30)
+#define SYS_BRBSRC31_EL1		__SYS_BRBSRC(31)
+
+#define SYS_BRBTGT0_EL1			__SYS_BRBTGT(0)
+#define SYS_BRBTGT1_EL1			__SYS_BRBTGT(1)
+#define SYS_BRBTGT2_EL1			__SYS_BRBTGT(2)
+#define SYS_BRBTGT3_EL1			__SYS_BRBTGT(3)
+#define SYS_BRBTGT4_EL1			__SYS_BRBTGT(4)
+#define SYS_BRBTGT5_EL1			__SYS_BRBTGT(5)
+#define SYS_BRBTGT6_EL1			__SYS_BRBTGT(6)
+#define SYS_BRBTGT7_EL1			__SYS_BRBTGT(7)
+#define SYS_BRBTGT8_EL1			__SYS_BRBTGT(8)
+#define SYS_BRBTGT9_EL1			__SYS_BRBTGT(9)
+#define SYS_BRBTGT10_EL1		__SYS_BRBTGT(10)
+#define SYS_BRBTGT11_EL1		__SYS_BRBTGT(11)
+#define SYS_BRBTGT12_EL1		__SYS_BRBTGT(12)
+#define SYS_BRBTGT13_EL1		__SYS_BRBTGT(13)
+#define SYS_BRBTGT14_EL1		__SYS_BRBTGT(14)
+#define SYS_BRBTGT15_EL1		__SYS_BRBTGT(15)
+#define SYS_BRBTGT16_EL1		__SYS_BRBTGT(16)
+#define SYS_BRBTGT17_EL1		__SYS_BRBTGT(17)
+#define SYS_BRBTGT18_EL1		__SYS_BRBTGT(18)
+#define SYS_BRBTGT19_EL1		__SYS_BRBTGT(19)
+#define SYS_BRBTGT20_EL1		__SYS_BRBTGT(20)
+#define SYS_BRBTGT21_EL1		__SYS_BRBTGT(21)
+#define SYS_BRBTGT22_EL1		__SYS_BRBTGT(22)
+#define SYS_BRBTGT23_EL1		__SYS_BRBTGT(23)
+#define SYS_BRBTGT24_EL1		__SYS_BRBTGT(24)
+#define SYS_BRBTGT25_EL1		__SYS_BRBTGT(25)
+#define SYS_BRBTGT26_EL1		__SYS_BRBTGT(26)
+#define SYS_BRBTGT27_EL1		__SYS_BRBTGT(27)
+#define SYS_BRBTGT28_EL1		__SYS_BRBTGT(28)
+#define SYS_BRBTGT29_EL1		__SYS_BRBTGT(29)
+#define SYS_BRBTGT30_EL1		__SYS_BRBTGT(30)
+#define SYS_BRBTGT31_EL1		__SYS_BRBTGT(31)
+
 #define SYS_MIDR_EL1			sys_reg(3, 0, 0, 0, 0)
 #define SYS_MPIDR_EL1			sys_reg(3, 0, 0, 0, 5)
 #define SYS_REVIDR_EL1			sys_reg(3, 0, 0, 0, 6)
@@ -826,6 +929,7 @@
 #define ID_AA64MMFR2_CNP_SHIFT		0
 
 /* id_aa64dfr0 */
+#define ID_AA64DFR0_BRBE_SHIFT		52
 #define ID_AA64DFR0_MTPMU_SHIFT		48
 #define ID_AA64DFR0_TRBE_SHIFT		44
 #define ID_AA64DFR0_TRACE_FILT_SHIFT	40
@@ -848,6 +952,9 @@
 #define ID_AA64DFR0_PMSVER_8_2		0x1
 #define ID_AA64DFR0_PMSVER_8_3		0x2
 
+#define ID_AA64DFR0_BRBE		0x1
+#define ID_AA64DFR0_BRBE_V1P1		0x2
+
 #define ID_DFR0_PERFMON_SHIFT		24
 
 #define ID_DFR0_PERFMON_8_0		0x3
diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
index 9ae483ec1e56..b7e945e95f05 100644
--- a/arch/arm64/tools/sysreg
+++ b/arch/arm64/tools/sysreg
@@ -46,6 +46,165 @@
 # feature that introduces them (eg, FEAT_LS64_ACCDATA introduces enumeration
 # item ACCDATA) though it may be more taseful to do something else.
 
+
+# This is just a dummy register declaration to get all common field masks and
+# shifts for accessing given BRBINF contents.
+Sysreg	BRBINF_EL1	2	1	8	0	0
+Res0	63:47
+Field	46	CCU
+Field	45:32	CC
+Res0	31:18
+Field	17	LASTFAILED
+Field	16	TX
+Res0	15:14
+Enum	13:8		TYPE
+	0b000000	UNCOND_DIR
+	0b000001	INDIR
+	0b000010	DIR_LINK
+	0b000011	INDIR_LINK
+	0b000101	RET_SUB
+	0b000111	RET_EXCPT
+	0b001000	COND_DIR
+	0b100001	DEBUG_HALT
+	0b100010	CALL
+	0b100011	TRAP
+	0b100100	SERROR
+	0b100110	INST_DEBUG
+	0b100111	DATA_DEBUG
+	0b101010	ALGN_FAULT
+	0b101011	INST_FAULT
+	0b101100	DATA_FAULT
+	0b101110	IRQ
+	0b101111	FIQ
+	0b111001	DEBUG_EXIT
+EndEnum
+Enum	7:6	EL
+	0b00	EL0
+	0b01	EL1
+	0b10	EL2
+EndEnum
+Field	5	MPRED
+Res0	4:2
+Enum	1:0	VALID
+	0b00	NONE
+	0b01	TARGET
+	0b10	SOURCE
+	0b11	FULL
+EndEnum
+EndSysreg
+
+Sysreg	BRBCR_EL1	2	1	9	0	0
+Res0	63:24
+Field	23 	EXCEPTION
+Field	22 	ERTN
+Res0	21:9
+Field	8 	FZP
+Field	7	ZERO
+Enum	6:5	TS
+	0b1	VIRTUAL
+	0b10	GST_PHYSICAL
+	0b11	PHYSICAL
+EndEnum
+Field	4	MPRED
+Field	3	CC
+Field	2	ZERO1
+Field	1	E1BRE
+Field	0	E0BRE
+EndSysreg
+
+Sysreg	BRBFCR_EL1	2	1	9	0	1
+Res0	63:30
+Enum	29:28	BANK
+	0b0	FIRST
+	0b1	SECOND
+EndEnum
+Res0	27:23
+Field	22	CONDDIR
+Field	21	DIRCALL
+Field	20	INDCALL
+Field	19	RTN
+Field	18	INDIRECT
+Field	17	DIRECT
+Field	16	ENL
+Res0	15:8
+Field	7	PAUSED
+Field	6	LASTFAILED
+Res0	5:0
+EndSysreg
+
+Sysreg	BRBTS_EL1	2	1	9	0	2
+Field	63:0	TS
+EndSysreg
+
+Sysreg	BRBINFINJ_EL1	2	1	9	1	0
+Res0	63:47
+Field	46	CCU
+Field	45:32	CC
+Res0	31:18
+Field	17	LASTFAILED
+Field	16	TX
+Res0	15:14
+Enum	13:8		TYPE
+	0b000000	UNCOND_DIR
+	0b000001	INDIR
+	0b000010	DIR_LINK
+	0b000011	INDIR_LINK
+	0b000100	RET_SUB
+	0b000100	RET_SUB
+	0b000111	RET_EXCPT
+	0b001000	COND_DIR
+	0b100001	DEBUG_HALT
+	0b100010	CALL
+	0b100011	TRAP
+	0b100100	SERROR
+	0b100110	INST_DEBUG
+	0b100111	DATA_DEBUG
+	0b101010	ALGN_FAULT
+	0b101011	INST_FAULT
+	0b101100	DATA_FAULT
+	0b101110	IRQ
+	0b101111	FIQ
+	0b111001	DEBUG_EXIT
+EndEnum
+Enum	7:6	EL
+	0b00	EL0
+	0b01	EL1
+	0b10	EL2
+EndEnum
+Field	5	MPRED
+Res0	4:2
+Enum	1:0	VALID
+	0b00	NONE
+	0b01	TARGET
+	0b10	SOURCE
+	0b11	FULL
+EndEnum
+EndSysreg
+
+Sysreg	BRBSRCINJ_EL1	2	1	9	1	1
+Field	63:0 ADDRESS
+EndSysreg
+
+Sysreg	BRBTGTINJ_EL1	2	1	9	1	2
+Field	63:0 ADDRESS
+EndSysreg
+
+Sysreg	BRBIDR0_EL1	2	1	9	2	0
+Res0	63:16
+Enum	15:12	CC
+	0b101	20_BIT
+EndEnum
+Enum	11:8	FORMAT
+	0b0	0
+EndEnum
+Enum	7:0		NUMREC
+	0b1000		8
+	0b10000		16
+	0b100000	32
+	0b1000000	64
+EndEnum
+EndSysreg
+
 Sysreg	ID_AA64ZFR0_EL1	3	0	0	4	4
 Res0	63:60
 Enum	59:56	F64MM
-- 
2.25.1



* [PATCH V3 2/7] arm64/perf: Update struct arm_pmu for BRBE
  2022-09-29  7:58 [PATCH V3 0/7] arm64/perf: Enable branch stack sampling Anshuman Khandual
  2022-09-29  7:58 ` [PATCH V3 1/7] arm64/perf: Add BRBE registers and fields Anshuman Khandual
@ 2022-09-29  7:58 ` Anshuman Khandual
  2022-09-29  7:58 ` [PATCH V3 3/7] arm64/perf: Update struct pmu_hw_events " Anshuman Khandual
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Anshuman Khandual @ 2022-09-29  7:58 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-arm-kernel, peterz, acme,
	mark.rutland, will, catalin.marinas
  Cc: Anshuman Khandual, Ingo Molnar

Although BRBE is an armv8 specific HW feature, abstracting out its various
function callbacks at the struct arm_pmu level is preferred, as it is
cleaner, easier to follow and maintain.

Besides, some helpers i.e. brbe_supported(), brbe_probe() and brbe_reset()
might not fit seamlessly if embedded via the existing arm_pmu helpers in
the armv8 implementation.

Update struct arm_pmu to include all required helpers that will drive
BRBE functionality for a given PMU implementation. These are the following.

- brbe_filter	: Convert perf event filters into BRBE HW filters
- brbe_probe	: Probe BRBE HW and capture its attributes
- brbe_enable	: Enable BRBE HW with a given config
- brbe_disable	: Disable BRBE HW
- brbe_read	: Read BRBE buffer for captured branch records
- brbe_reset	: Reset BRBE buffer
- brbe_supported: Whether BRBE is supported or not

A BRBE driver implementation needs to provide these functionalities.
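
The callback arrangement above can be pictured with a cut-down, user-space
sketch. It is only an illustration: struct brbe_ops, struct hw_events and the
stub_*/demo_* functions are hypothetical stand-ins, not the kernel's types. It
shows empty stubs being installed first, so that callers can invoke the
callbacks unconditionally, and a driver overriding them later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct pmu_hw_events; only what this sketch needs. */
struct hw_events {
	int nr_records;
};

/* Hypothetical, cut-down callback table modelled on the list above. */
struct brbe_ops {
	void (*probe)(struct hw_events *hw);
	void (*reset)(struct hw_events *hw);
	bool (*supported)(void);
};

/* Stubs: safe to call, report "not supported", touch nothing. */
static void stub_probe(struct hw_events *hw) { (void)hw; }
static void stub_reset(struct hw_events *hw) { (void)hw; }
static bool stub_supported(void) { return false; }

static void brbe_ops_init_stubs(struct brbe_ops *ops)
{
	assert(ops != NULL);
	ops->probe = stub_probe;
	ops->reset = stub_reset;
	ops->supported = stub_supported;
}

/* A pretend driver that replaces the stubs with real behaviour. */
static void demo_probe(struct hw_events *hw) { hw->nr_records = 32; }
static void demo_reset(struct hw_events *hw) { (void)hw; }
static bool demo_supported(void) { return true; }

static void brbe_ops_install_demo(struct brbe_ops *ops)
{
	assert(ops != NULL);
	ops->probe = demo_probe;
	ops->reset = demo_reset;
	ops->supported = demo_supported;
}
```

With only the stubs installed, supported() returns false and probe() leaves the
per-CPU state untouched, which mirrors the empty armv8pmu_brbe_*() helpers in
this patch.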

Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-perf-users@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 arch/arm64/kernel/perf_event.c | 36 ++++++++++++++++++++++++++++++++++
 include/linux/perf/arm_pmu.h   | 21 ++++++++++++++++++++
 2 files changed, 57 insertions(+)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index cb69ff1e6138..e7013699171f 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -1025,6 +1025,35 @@ static int armv8pmu_filter_match(struct perf_event *event)
 	return evtype != ARMV8_PMUV3_PERFCTR_CHAIN;
 }
 
+static void armv8pmu_brbe_filter(struct pmu_hw_events *hw_event, struct perf_event *event)
+{
+}
+
+static void armv8pmu_brbe_enable(struct pmu_hw_events *hw_event)
+{
+}
+
+static void armv8pmu_brbe_disable(struct pmu_hw_events *hw_event)
+{
+}
+
+static void armv8pmu_brbe_read(struct pmu_hw_events *hw_event, struct perf_event *event)
+{
+}
+
+static void armv8pmu_brbe_probe(struct pmu_hw_events *hw_event)
+{
+}
+
+static void armv8pmu_brbe_reset(struct pmu_hw_events *hw_event)
+{
+}
+
+static bool armv8pmu_brbe_supported(struct perf_event *event)
+{
+	return false;
+}
+
 static void armv8pmu_reset(void *info)
 {
 	struct arm_pmu *cpu_pmu = (struct arm_pmu *)info;
@@ -1257,6 +1286,13 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu, char *name,
 
 	cpu_pmu->pmu.event_idx		= armv8pmu_user_event_idx;
 
+	cpu_pmu->brbe_filter		= armv8pmu_brbe_filter;
+	cpu_pmu->brbe_enable		= armv8pmu_brbe_enable;
+	cpu_pmu->brbe_disable		= armv8pmu_brbe_disable;
+	cpu_pmu->brbe_read		= armv8pmu_brbe_read;
+	cpu_pmu->brbe_probe		= armv8pmu_brbe_probe;
+	cpu_pmu->brbe_reset		= armv8pmu_brbe_reset;
+	cpu_pmu->brbe_supported		= armv8pmu_brbe_supported;
 	cpu_pmu->name			= name;
 	cpu_pmu->map_event		= map_event;
 	cpu_pmu->attr_groups[ARMPMU_ATTR_GROUP_EVENTS] = events ?
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index 0407a38b470a..3d427ac0ca45 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -100,6 +100,27 @@ struct arm_pmu {
 	void		(*reset)(void *);
 	int		(*map_event)(struct perf_event *event);
 	int		(*filter_match)(struct perf_event *event);
+
+	/* Convert perf event filters into BRBE HW filters */
+	void		(*brbe_filter)(struct pmu_hw_events *hw_events, struct perf_event *event);
+
+	/* Probe BRBE HW and capture its attributes */
+	void		(*brbe_probe)(struct pmu_hw_events *hw_events);
+
+	/* Enable BRBE HW with a given config */
+	void		(*brbe_enable)(struct pmu_hw_events *hw_events);
+
+	/* Disable BRBE HW */
+	void		(*brbe_disable)(struct pmu_hw_events *hw_events);
+
+	/* Process BRBE buffer for captured branch records */
+	void		(*brbe_read)(struct pmu_hw_events *hw_events, struct perf_event *event);
+
+	/* Reset BRBE buffer */
+	void		(*brbe_reset)(struct pmu_hw_events *hw_events);
+
+	/* Check whether BRBE is supported */
+	bool		(*brbe_supported)(struct perf_event *event);
 	int		num_events;
 	bool		secure_access; /* 32-bit ARM only */
 #define ARMV8_PMUV3_MAX_COMMON_EVENTS		0x40
-- 
2.25.1



* [PATCH V3 3/7] arm64/perf: Update struct pmu_hw_events for BRBE
  2022-09-29  7:58 [PATCH V3 0/7] arm64/perf: Enable branch stack sampling Anshuman Khandual
  2022-09-29  7:58 ` [PATCH V3 1/7] arm64/perf: Add BRBE registers and fields Anshuman Khandual
  2022-09-29  7:58 ` [PATCH V3 2/7] arm64/perf: Update struct arm_pmu for BRBE Anshuman Khandual
@ 2022-09-29  7:58 ` Anshuman Khandual
  2022-09-29  7:58 ` [PATCH V3 4/7] driver/perf/arm_pmu_platform: Add support for BRBE attributes detection Anshuman Khandual
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Anshuman Khandual @ 2022-09-29  7:58 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-arm-kernel, peterz, acme,
	mark.rutland, will, catalin.marinas
  Cc: Anshuman Khandual

BRBE related contexts and data for a single perf event instance will be
tracked in struct pmu_hw_events. Hence update the structure to accommodate
the required details related to BRBE.
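
A rough user-space sketch of the storage idea follows: a record buffer sized
for the largest supported BRBE (64 entries), allocated once from the heap
rather than placed on the stack, with a count of valid entries in front. The
types and helpers here (struct branch_records, records_alloc(),
records_push()) are hypothetical simplifications; the real perf structures
carry more fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ENTRIES 64	/* largest record count assumed in this sketch */

/* Simplified stand-in for a perf branch entry. */
struct branch_entry {
	uint64_t from;
	uint64_t to;
};

/* Count of valid entries followed by fixed-size storage. */
struct branch_records {
	uint64_t nr;
	struct branch_entry entries[MAX_ENTRIES];
};

/* Allocate zeroed storage; the caller must check for NULL and free() it. */
static struct branch_records *records_alloc(void)
{
	return calloc(1, sizeof(struct branch_records));
}

/* Append one record; returns -1 once the buffer is full. */
static int records_push(struct branch_records *r, uint64_t from, uint64_t to)
{
	assert(r != NULL);
	if (r->nr >= MAX_ENTRIES)
		return -1;
	r->entries[r->nr].from = from;
	r->entries[r->nr].to = to;
	r->nr++;
	return 0;
}
```

Checking the allocation result before use matters in such a scheme, since a
failed allocation would otherwise only surface when the first sample is taken.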

Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 drivers/perf/arm_pmu.c       |  1 +
 include/linux/perf/arm_pmu.h | 26 ++++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 59d3980b8ca2..16fda9a1229f 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -905,6 +905,7 @@ static struct arm_pmu *__armpmu_alloc(gfp_t flags)
 
 		events = per_cpu_ptr(pmu->hw_events, cpu);
 		raw_spin_lock_init(&events->pmu_lock);
+		events->branches = kmalloc(sizeof(struct brbe_records), flags);
 		events->percpu_pmu = pmu;
 	}
 
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index 3d427ac0ca45..ffce43ceb670 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -43,6 +43,16 @@
 	},								\
 }
 
+/*
+ * Maximum branch records in BRBE
+ */
+#define BRBE_MAX_ENTRIES 64
+
+struct brbe_records {
+	struct perf_branch_stack	brbe_stack;
+	struct perf_branch_entry	brbe_entries[BRBE_MAX_ENTRIES];
+};
+
 /* The events for a given PMU register set. */
 struct pmu_hw_events {
 	/*
@@ -69,6 +79,22 @@ struct pmu_hw_events {
 	struct arm_pmu		*percpu_pmu;
 
 	int irq;
+
+	/* Detected BRBE attributes */
+	bool				v1p1;
+	int				brbe_cc;
+	int				brbe_nr;
+
+	/* Evaluated BRBE configuration */
+	u64				brbfcr;
+	u64				brbcr;
+
+	/* Tracked BRBE context */
+	unsigned int			brbe_users;
+	void				*brbe_context;
+
+	/* Captured BRBE buffer - copied as is into perf_sample_data */
+	struct brbe_records		*branches;
 };
 
 enum armpmu_attr_groups {
-- 
2.25.1



* [PATCH V3 4/7] driver/perf/arm_pmu_platform: Add support for BRBE attributes detection
  2022-09-29  7:58 [PATCH V3 0/7] arm64/perf: Enable branch stack sampling Anshuman Khandual
                   ` (2 preceding siblings ...)
  2022-09-29  7:58 ` [PATCH V3 3/7] arm64/perf: Update struct pmu_hw_events " Anshuman Khandual
@ 2022-09-29  7:58 ` Anshuman Khandual
  2022-10-06 13:37   ` James Clark
  2022-09-29  7:58 ` [PATCH V3 5/7] arm64/perf: Drive BRBE from perf event states Anshuman Khandual
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Anshuman Khandual @ 2022-09-29  7:58 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-arm-kernel, peterz, acme,
	mark.rutland, will, catalin.marinas
  Cc: Anshuman Khandual

This adds arm pmu infrastructure to probe a BRBE implementation's attributes
via driver exported callbacks later. The actual BRBE feature detection will
be added by the driver itself.

CPU specific BRBE entries, cycle count and format support get detected during
PMU init. This information gets saved in per-cpu struct pmu_hw_events, which
later helps in operating BRBE during a perf event context.
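
To make the detected attributes concrete, here is a stand-alone sketch that
decodes a raw BRBIDR0_EL1 value using the field positions given in the sysreg
definitions of patch 1/7 (NUMREC in bits 7:0, FORMAT in bits 11:8, CC in bits
15:12). It is an illustration only, not the driver's probe code; struct
brbe_attrs, field(), brbe_decode_idr0() and brbe_numrec_valid() are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded attributes (hypothetical container for this sketch). */
struct brbe_attrs {
	unsigned int numrec;	/* number of branch records */
	unsigned int format;	/* record format */
	unsigned int cc;	/* cycle counter width encoding */
};

/* Extract bits hi:lo of a 64-bit register value. */
static unsigned int field(uint64_t reg, unsigned int hi, unsigned int lo)
{
	assert(hi >= lo && hi < 64);
	unsigned int width = hi - lo + 1;
	uint64_t mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);
	return (unsigned int)((reg >> lo) & mask);
}

static struct brbe_attrs brbe_decode_idr0(uint64_t brbidr0)
{
	struct brbe_attrs a;

	a.numrec = field(brbidr0, 7, 0);
	a.format = field(brbidr0, 11, 8);
	a.cc = field(brbidr0, 15, 12);
	return a;
}

/* Only the record counts enumerated in the sysreg definition are accepted. */
static bool brbe_numrec_valid(unsigned int numrec)
{
	return numrec == 8 || numrec == 16 || numrec == 32 || numrec == 64;
}
```

For example, a register value of 0x5020 decodes to 32 records, format 0 and
the 20-bit cycle counter encoding (0b101).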

Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 drivers/perf/arm_pmu_platform.c | 34 +++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/drivers/perf/arm_pmu_platform.c b/drivers/perf/arm_pmu_platform.c
index 933b96e243b8..acdc445081aa 100644
--- a/drivers/perf/arm_pmu_platform.c
+++ b/drivers/perf/arm_pmu_platform.c
@@ -172,6 +172,36 @@ static int armpmu_request_irqs(struct arm_pmu *armpmu)
 	return err;
 }
 
+static void arm_brbe_probe_cpu(void *info)
+{
+	struct pmu_hw_events *hw_events;
+	struct arm_pmu *armpmu = info;
+
+	/*
+	 * Return from here, if BRBE driver has not been
+	 * implemented for this PMU. This helps prevent
+	 * kernel crash later when brbe_probe() will be
+	 * called on the PMU.
+	 */
+	if (!armpmu->brbe_probe)
+		return;
+
+	hw_events = per_cpu_ptr(armpmu->hw_events, smp_processor_id());
+	armpmu->brbe_probe(hw_events);
+}
+
+static int armpmu_request_brbe(struct arm_pmu *armpmu)
+{
+	int cpu, err = 0;
+
+	for_each_cpu(cpu, &armpmu->supported_cpus) {
+		err = smp_call_function_single(cpu, arm_brbe_probe_cpu, armpmu, 1);
+		if (err)
+			return err;
+	}
+	return err;
+}
+
 static void armpmu_free_irqs(struct arm_pmu *armpmu)
 {
 	int cpu;
@@ -229,6 +259,10 @@ int arm_pmu_device_probe(struct platform_device *pdev,
 	if (ret)
 		goto out_free_irqs;
 
+	ret = armpmu_request_brbe(pmu);
+	if (ret)
+		goto out_free_irqs;
+
 	ret = armpmu_register(pmu);
 	if (ret) {
 		dev_err(dev, "failed to register PMU devices!\n");
-- 
2.25.1



* [PATCH V3 5/7] arm64/perf: Drive BRBE from perf event states
  2022-09-29  7:58 [PATCH V3 0/7] arm64/perf: Enable branch stack sampling Anshuman Khandual
                   ` (3 preceding siblings ...)
  2022-09-29  7:58 ` [PATCH V3 4/7] driver/perf/arm_pmu_platform: Add support for BRBE attributes detection Anshuman Khandual
@ 2022-09-29  7:58 ` Anshuman Khandual
  2022-09-29  7:58 ` [PATCH V3 6/7] arm64/perf: Add BRBE driver Anshuman Khandual
  2022-09-29  7:58 ` [PATCH V3 7/7] arm64/perf: Enable branch stack sampling Anshuman Khandual
  6 siblings, 0 replies; 18+ messages in thread
From: Anshuman Khandual @ 2022-09-29  7:58 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-arm-kernel, peterz, acme,
	mark.rutland, will, catalin.marinas
  Cc: Anshuman Khandual, Ingo Molnar

Branch stack sampling rides along with a normal perf event, and all the
branch records get captured during the PMU interrupt. This just changes perf
event handling on the arm64 platform to accommodate the BRBE operations that
are required to enable branch stack sampling support.
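
The start/stop bookkeeping in this patch can be modelled in a few lines of
user-space C. The sketch below is a simplified illustration, not the kernel
code: struct brbe_state and the two helpers are hypothetical, and the hardware
enable, disable and reset operations are reduced to a flag and a counter. It
follows the same shape as the diff: the buffer is reset when a task context
different from the tracked one starts, and the buffer is disabled only when
the last user stops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the per-CPU BRBE bookkeeping. */
struct brbe_state {
	unsigned int users;	/* events currently using the buffer */
	const void *context;	/* task context that owns the records */
	bool enabled;		/* stands in for the HW enable state */
	unsigned int resets;	/* counts buffer resets, for illustration */
};

/* Called when an event with a branch stack starts. */
static void brbe_user_start(struct brbe_state *s, const void *task_ctx)
{
	/* Records left by another task context must not leak into this one. */
	if (task_ctx && s->context != task_ctx) {
		s->resets++;
		s->context = task_ctx;
	}
	s->enabled = true;
	s->users++;
}

/* Called when such an event stops. */
static void brbe_user_stop(struct brbe_state *s)
{
	assert(s->users > 0);
	if (--s->users == 0) {
		s->context = NULL;
		s->enabled = false;
	}
}
```

Two events from the same task context therefore cause a single reset, and the
buffer stays enabled until both have stopped.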

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: linux-perf-users@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 arch/arm64/kernel/perf_event.c |  6 +++++
 drivers/perf/arm_pmu.c         | 40 ++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index e7013699171f..6793b25c3f21 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -874,6 +874,12 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
 		if (!armpmu_event_set_period(event))
 			continue;
 
+		if (has_branch_stack(event)) {
+			cpu_pmu->brbe_read(cpuc, event);
+			data.br_stack = &cpuc->branches->brbe_stack;
+			cpu_pmu->brbe_reset(cpuc);
+		}
+
 		/*
 		 * Perf event overflow will queue the processing of the event as
 		 * an irq_work which will be taken care of in the handling of
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 16fda9a1229f..93b36933124f 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -271,12 +271,22 @@ armpmu_stop(struct perf_event *event, int flags)
 {
 	struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
+	struct pmu_hw_events *hw_events = this_cpu_ptr(armpmu->hw_events);
 
 	/*
 	 * ARM pmu always has to update the counter, so ignore
 	 * PERF_EF_UPDATE, see comments in armpmu_start().
 	 */
 	if (!(hwc->state & PERF_HES_STOPPED)) {
+		if (has_branch_stack(event)) {
+			WARN_ON_ONCE(!hw_events->brbe_users);
+			hw_events->brbe_users--;
+			if (!hw_events->brbe_users) {
+				hw_events->brbe_context = NULL;
+				armpmu->brbe_disable(hw_events);
+			}
+		}
+
 		armpmu->disable(event);
 		armpmu_event_update(event);
 		hwc->state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
@@ -287,6 +297,7 @@ static void armpmu_start(struct perf_event *event, int flags)
 {
 	struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
+	struct pmu_hw_events *hw_events = this_cpu_ptr(armpmu->hw_events);
 
 	/*
 	 * ARM pmu always has to reprogram the period, so ignore
@@ -304,6 +315,14 @@ static void armpmu_start(struct perf_event *event, int flags)
 	 * happened since disabling.
 	 */
 	armpmu_event_set_period(event);
+	if (has_branch_stack(event)) {
+		if (event->ctx->task && hw_events->brbe_context != event->ctx) {
+			armpmu->brbe_reset(hw_events);
+			hw_events->brbe_context = event->ctx;
+		}
+		armpmu->brbe_enable(hw_events);
+		hw_events->brbe_users++;
+	}
 	armpmu->enable(event);
 }
 
@@ -349,6 +368,10 @@ armpmu_add(struct perf_event *event, int flags)
 	hw_events->events[idx] = event;
 
 	hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
+
+	if (has_branch_stack(event))
+		armpmu->brbe_filter(hw_events, event);
+
 	if (flags & PERF_EF_START)
 		armpmu_start(event, PERF_EF_RELOAD);
 
@@ -443,6 +466,7 @@ __hw_perf_event_init(struct perf_event *event)
 {
 	struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
+	struct pmu_hw_events *hw_events = this_cpu_ptr(armpmu->hw_events);
 	int mapping;
 
 	hwc->flags = 0;
@@ -492,6 +516,9 @@ __hw_perf_event_init(struct perf_event *event)
 		local64_set(&hwc->period_left, hwc->sample_period);
 	}
 
+	if (has_branch_stack(event))
+		armpmu->brbe_filter(hw_events, event);
+
 	return validate_group(event);
 }
 
@@ -520,6 +547,18 @@ static int armpmu_event_init(struct perf_event *event)
 	return __hw_perf_event_init(event);
 }
 
+static void armpmu_sched_task(struct perf_event_context *ctx, bool sched_in)
+{
+	struct arm_pmu *armpmu = to_arm_pmu(ctx->pmu);
+	struct pmu_hw_events *hw_events = this_cpu_ptr(armpmu->hw_events);
+
+	if (!hw_events->brbe_users)
+		return;
+
+	if (sched_in)
+		armpmu->brbe_reset(hw_events);
+}
+
 static void armpmu_enable(struct pmu *pmu)
 {
 	struct arm_pmu *armpmu = to_arm_pmu(pmu);
@@ -877,6 +916,7 @@ static struct arm_pmu *__armpmu_alloc(gfp_t flags)
 	}
 
 	pmu->pmu = (struct pmu) {
+		.sched_task	= armpmu_sched_task,
 		.pmu_enable	= armpmu_enable,
 		.pmu_disable	= armpmu_disable,
 		.event_init	= armpmu_event_init,
-- 
2.25.1



* [PATCH V3 6/7] arm64/perf: Add BRBE driver
  2022-09-29  7:58 [PATCH V3 0/7] arm64/perf: Enable branch stack sampling Anshuman Khandual
                   ` (4 preceding siblings ...)
  2022-09-29  7:58 ` [PATCH V3 5/7] arm64/perf: Drive BRBE from perf event states Anshuman Khandual
@ 2022-09-29  7:58 ` Anshuman Khandual
  2022-09-29  7:58 ` [PATCH V3 7/7] arm64/perf: Enable branch stack sampling Anshuman Khandual
  6 siblings, 0 replies; 18+ messages in thread
From: Anshuman Khandual @ 2022-09-29  7:58 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-arm-kernel, peterz, acme,
	mark.rutland, will, catalin.marinas
  Cc: Anshuman Khandual, Ingo Molnar

This adds a BRBE driver which implements all the helper functions required
by struct arm_pmu. The driver defines the following functions, which
configure, enable, capture, reset and disable the BRBE buffer HW as and
when requested via the perf branch stack sampling framework.

- arm64_pmu_brbe_filter()
- arm64_pmu_brbe_enable()
- arm64_pmu_brbe_disable()
- arm64_pmu_brbe_read()
- arm64_pmu_brbe_probe()
- arm64_pmu_brbe_reset()
- arm64_pmu_brbe_supported()

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-perf-users@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
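A note on the buffer layout this driver relies on: the header comment in
arm_pmu_brbe.h describes banks of 32 records each, with
select_brbe_bank_index() choosing the bank and buffer_to_brbe_idx() the
register index within it. The standalone sketch below models only that
arithmetic in user space. model_bank(), model_reg_idx() and the MODEL_*
constants are hypothetical names used for illustration; they are not part
of the patch and no system registers are touched.

```c
#include <assert.h>

#define MODEL_BANK_SIZE	32	/* records per bank, per the layout comment */
#define MODEL_MAX_RECS	64	/* two banks, as in the layout comment */

/* Returns the bank to select for a buffer index, or -1 if out of range. */
static int model_bank(int buffer_idx)
{
	if (buffer_idx < 0 || buffer_idx >= MODEL_MAX_RECS)
		return -1;

	return buffer_idx / MODEL_BANK_SIZE;
}

/* Returns the register index [0..31] inside the selected bank. */
static int model_reg_idx(int buffer_idx)
{
	assert(buffer_idx >= 0);

	return buffer_idx % MODEL_BANK_SIZE;
}
```

Under these assumptions, buffer index 33 selects bank 1 and register
index 1, matching the "Bank 1" table in the header comment.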
 arch/arm64/kernel/perf_event.c |   8 +-
 drivers/perf/Kconfig           |  11 +
 drivers/perf/Makefile          |   1 +
 drivers/perf/arm_pmu_brbe.c    | 447 +++++++++++++++++++++++++++++++++
 drivers/perf/arm_pmu_brbe.h    | 259 +++++++++++++++++++
 include/linux/perf/arm_pmu.h   |  20 ++
 6 files changed, 745 insertions(+), 1 deletion(-)
 create mode 100644 drivers/perf/arm_pmu_brbe.c
 create mode 100644 drivers/perf/arm_pmu_brbe.h

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 6793b25c3f21..6917f9b100e4 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -1033,31 +1033,37 @@ static int armv8pmu_filter_match(struct perf_event *event)
 
 static void armv8pmu_brbe_filter(struct pmu_hw_events *hw_event, struct perf_event *event)
 {
+	arm64_pmu_brbe_filter(hw_event, event);
 }
 
 static void armv8pmu_brbe_enable(struct pmu_hw_events *hw_event)
 {
+	arm64_pmu_brbe_enable(hw_event);
 }
 
 static void armv8pmu_brbe_disable(struct pmu_hw_events *hw_event)
 {
+	arm64_pmu_brbe_disable(hw_event);
 }
 
 static void armv8pmu_brbe_read(struct pmu_hw_events *hw_event, struct perf_event *event)
 {
+	arm64_pmu_brbe_read(hw_event, event);
 }
 
 static void armv8pmu_brbe_probe(struct pmu_hw_events *hw_event)
 {
+	arm64_pmu_brbe_probe(hw_event);
 }
 
 static void armv8pmu_brbe_reset(struct pmu_hw_events *hw_event)
 {
+	arm64_pmu_brbe_reset(hw_event);
 }
 
 static bool armv8pmu_brbe_supported(struct perf_event *event)
 {
-	return false;
+	return arm64_pmu_brbe_supported(event);
 }
 
 static void armv8pmu_reset(void *info)
diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
index 1e2d69453771..9fa34a1d3a23 100644
--- a/drivers/perf/Kconfig
+++ b/drivers/perf/Kconfig
@@ -183,6 +183,17 @@ config APPLE_M1_CPU_PMU
 	  Provides support for the non-architectural CPU PMUs present on
 	  the Apple M1 SoCs and derivatives.
 
+config ARM_BRBE_PMU
+	tristate "Enable support for Branch Record Buffer Extension (BRBE)"
+	depends on ARM64 && ARM_PMU
+	default y
+	help
+	  Enable perf support for Branch Record Buffer Extension (BRBE) which
+	  records all branches taken in an execution path. This supports some
+	  branch types and privilege based filtering. It also captures
+	  additional relevant information such as cycle count, misprediction,
+	  branch type and branch privilege level.
+
 source "drivers/perf/hisilicon/Kconfig"
 
 config MARVELL_CN10K_DDR_PMU
diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
index 57a279c61df5..b81fc134d95f 100644
--- a/drivers/perf/Makefile
+++ b/drivers/perf/Makefile
@@ -20,3 +20,4 @@ obj-$(CONFIG_ARM_DMC620_PMU) += arm_dmc620_pmu.o
 obj-$(CONFIG_MARVELL_CN10K_TAD_PMU) += marvell_cn10k_tad_pmu.o
 obj-$(CONFIG_MARVELL_CN10K_DDR_PMU) += marvell_cn10k_ddr_pmu.o
 obj-$(CONFIG_APPLE_M1_CPU_PMU) += apple_m1_cpu_pmu.o
+obj-$(CONFIG_ARM_BRBE_PMU) += arm_pmu_brbe.o
diff --git a/drivers/perf/arm_pmu_brbe.c b/drivers/perf/arm_pmu_brbe.c
new file mode 100644
index 000000000000..38be8f05e3d5
--- /dev/null
+++ b/drivers/perf/arm_pmu_brbe.c
@@ -0,0 +1,447 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Branch Record Buffer Extension Driver.
+ *
+ * Copyright (C) 2021 ARM Limited
+ *
+ * Author: Anshuman Khandual <anshuman.khandual@arm.com>
+ */
+#include "arm_pmu_brbe.h"
+
+#define BRBFCR_BRANCH_ALL	(BRBFCR_EL1_DIRECT | BRBFCR_EL1_INDIRECT | \
+				 BRBFCR_EL1_RTN | BRBFCR_EL1_INDCALL | \
+				 BRBFCR_EL1_DIRCALL | BRBFCR_EL1_CONDDIR)
+
+#define BRBE_FCR_MASK (BRBFCR_BRANCH_ALL)
+#define BRBE_CR_MASK  (BRBCR_EL1_EXCEPTION | BRBCR_EL1_ERTN | BRBCR_EL1_CC | \
+		       BRBCR_EL1_MPRED | BRBCR_EL1_E1BRE | BRBCR_EL1_E0BRE)
+
+static void set_brbe_disabled(struct pmu_hw_events *cpuc)
+{
+	cpuc->brbe_nr = 0;
+}
+
+static bool brbe_disabled(struct pmu_hw_events *cpuc)
+{
+	return !cpuc->brbe_nr;
+}
+
+bool arm64_pmu_brbe_supported(struct perf_event *event)
+{
+	struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
+	struct pmu_hw_events *hw_events = per_cpu_ptr(armpmu->hw_events, event->cpu);
+
+	/*
+	 * If the event does not have at least one of the privilege
+	 * branch filters as in PERF_SAMPLE_BRANCH_PLM_ALL, the core
+	 * perf will adjust its value based on perf event's existing
+	 * privilege level via attr.exclude_[user|kernel|hv].
+	 *
+	 * As event->attr.branch_sample_type might have been changed
+	 * when the event reaches here, it is not possible to figure
+	 * out whether the event originally had HV privilege request
+	 * or got added via the core perf. Just report this situation
+	 * once and continue ignoring if there are other instances.
+	 */
+	if (event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_HV)
+		pr_warn_once("does not support hypervisor privilege branch filter\n");
+
+	if (event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_ABORT_TX) {
+		pr_warn_once("does not support aborted transaction branch filter\n");
+		return false;
+	}
+
+	if (event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_NO_TX) {
+		pr_warn_once("does not support non transaction branch filter\n");
+		return false;
+	}
+
+	if (event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_IN_TX) {
+		pr_warn_once("does not support in transaction branch filter\n");
+		return false;
+	}
+	return !brbe_disabled(hw_events);
+}
+
+void arm64_pmu_brbe_probe(struct pmu_hw_events *cpuc)
+{
+	u64 aa64dfr0, brbidr;
+	unsigned int brbe, format, cpu = smp_processor_id();
+
+	aa64dfr0 = read_sysreg_s(SYS_ID_AA64DFR0_EL1);
+	brbe = cpuid_feature_extract_unsigned_field(aa64dfr0, ID_AA64DFR0_BRBE_SHIFT);
+	if (!brbe) {
+		pr_info("no implementation found on cpu %d\n", cpu);
+		set_brbe_disabled(cpuc);
+		return;
+	} else if (brbe == ID_AA64DFR0_BRBE) {
+		pr_info("implementation found on cpu %d\n", cpu);
+		cpuc->v1p1 = false;
+	} else if (brbe == ID_AA64DFR0_BRBE_V1P1) {
+		pr_info("implementation (v1p1) found on cpu %d\n", cpu);
+		cpuc->v1p1 = true;
+	}
+
+	brbidr = read_sysreg_s(SYS_BRBIDR0_EL1);
+	format = brbe_fetch_format(brbidr);
+	if (format != BRBIDR0_EL1_FORMAT_0) {
+		pr_warn("format 0 not implemented\n");
+		set_brbe_disabled(cpuc);
+		return;
+	}
+
+	cpuc->brbe_cc = brbe_fetch_cc_bits(brbidr);
+	if (cpuc->brbe_cc != BRBIDR0_EL1_CC_20_BIT) {
+		pr_warn("20-bit counter not implemented\n");
+		set_brbe_disabled(cpuc);
+		return;
+	}
+
+	cpuc->brbe_nr = brbe_fetch_numrec(brbidr);
+	if (!valid_brbe_nr(cpuc->brbe_nr)) {
+		pr_warn("invalid number of records\n");
+		set_brbe_disabled(cpuc);
+		return;
+	}
+}
+
+void arm64_pmu_brbe_enable(struct pmu_hw_events *cpuc)
+{
+	u64 brbfcr, brbcr;
+
+	if (brbe_disabled(cpuc))
+		return;
+
+	brbfcr = read_sysreg_s(SYS_BRBFCR_EL1);
+	brbfcr &= ~BRBFCR_EL1_BANK_MASK;
+	brbfcr &= ~(BRBFCR_EL1_ENL | BRBFCR_EL1_PAUSED | BRBE_FCR_MASK);
+	brbfcr |= (cpuc->brbfcr & BRBE_FCR_MASK);
+	write_sysreg_s(brbfcr, SYS_BRBFCR_EL1);
+	isb();
+
+	brbcr = read_sysreg_s(SYS_BRBCR_EL1);
+	brbcr &= ~BRBE_CR_MASK;
+	brbcr |= BRBCR_EL1_FZP;
+	brbcr |= (BRBCR_EL1_TS_PHYSICAL << BRBCR_EL1_TS_SHIFT);
+	brbcr |= (cpuc->brbcr & BRBE_CR_MASK);
+	write_sysreg_s(brbcr, SYS_BRBCR_EL1);
+	isb();
+}
+
+void arm64_pmu_brbe_disable(struct pmu_hw_events *cpuc)
+{
+	u64 brbcr;
+
+	if (brbe_disabled(cpuc))
+		return;
+
+	brbcr = read_sysreg_s(SYS_BRBCR_EL1);
+	brbcr &= ~(BRBCR_EL1_E0BRE | BRBCR_EL1_E1BRE);
+	write_sysreg_s(brbcr, SYS_BRBCR_EL1);
+	isb();
+}
+
+static void perf_branch_to_brbfcr(struct pmu_hw_events *cpuc, int branch_type)
+{
+	cpuc->brbfcr = 0;
+
+	if (branch_type & PERF_SAMPLE_BRANCH_ANY) {
+		cpuc->brbfcr |= BRBFCR_BRANCH_ALL;
+		return;
+	}
+
+	if (branch_type & PERF_SAMPLE_BRANCH_ANY_CALL)
+		cpuc->brbfcr |= (BRBFCR_EL1_INDCALL | BRBFCR_EL1_DIRCALL);
+
+	if (branch_type & PERF_SAMPLE_BRANCH_ANY_RETURN)
+		cpuc->brbfcr |= BRBFCR_EL1_RTN;
+
+	if (branch_type & PERF_SAMPLE_BRANCH_IND_CALL)
+		cpuc->brbfcr |= BRBFCR_EL1_INDCALL;
+
+	if (branch_type & PERF_SAMPLE_BRANCH_COND)
+		cpuc->brbfcr |= BRBFCR_EL1_CONDDIR;
+
+	if (branch_type & PERF_SAMPLE_BRANCH_IND_JUMP)
+		cpuc->brbfcr |= BRBFCR_EL1_INDIRECT;
+
+	if (branch_type & PERF_SAMPLE_BRANCH_CALL)
+		cpuc->brbfcr |= BRBFCR_EL1_DIRCALL;
+}
+
+static void perf_branch_to_brbcr(struct pmu_hw_events *cpuc, int branch_type)
+{
+	cpuc->brbcr = (BRBCR_EL1_CC | BRBCR_EL1_MPRED);
+
+	if (branch_type & PERF_SAMPLE_BRANCH_USER)
+		cpuc->brbcr |= BRBCR_EL1_E0BRE;
+
+	if (branch_type & PERF_SAMPLE_BRANCH_NO_CYCLES)
+		cpuc->brbcr &= ~BRBCR_EL1_CC;
+
+	if (branch_type & PERF_SAMPLE_BRANCH_NO_FLAGS)
+		cpuc->brbcr &= ~BRBCR_EL1_MPRED;
+
+	if (branch_type & PERF_SAMPLE_BRANCH_KERNEL)
+		cpuc->brbcr |= BRBCR_EL1_E1BRE;
+	else
+		return;
+
+	/*
+	 * The exception and exception return branches could be
+	 * captured only when the event has necessary privilege
+	 * indicated via branch type PERF_SAMPLE_BRANCH_KERNEL,
+	 * which has been ascertained in generic perf. Please
+	 * refer to perf_copy_attr() for more details.
+	 */
+	if (branch_type & PERF_SAMPLE_BRANCH_ANY) {
+		cpuc->brbcr |= BRBCR_EL1_EXCEPTION;
+		cpuc->brbcr |= BRBCR_EL1_ERTN;
+		return;
+	}
+
+	if (branch_type & PERF_SAMPLE_BRANCH_ANY_CALL)
+		cpuc->brbcr |= BRBCR_EL1_EXCEPTION;
+
+	if (branch_type & PERF_SAMPLE_BRANCH_ANY_RETURN)
+		cpuc->brbcr |= BRBCR_EL1_ERTN;
+}
+
+
+void arm64_pmu_brbe_filter(struct pmu_hw_events *cpuc, struct perf_event *event)
+{
+	u64 branch_type = event->attr.branch_sample_type;
+
+	if (brbe_disabled(cpuc))
+		return;
+
+	perf_branch_to_brbfcr(cpuc, branch_type);
+	perf_branch_to_brbcr(cpuc, branch_type);
+}
+
+static int brbe_fetch_perf_type(u64 brbinf, bool *new_branch_type)
+{
+	int brbe_type = brbe_fetch_type(brbinf);
+	*new_branch_type = false;
+
+	switch (brbe_type) {
+	case BRBINF_EL1_TYPE_UNCOND_DIR:
+		return PERF_BR_UNCOND;
+	case BRBINF_EL1_TYPE_INDIR:
+		return PERF_BR_IND;
+	case BRBINF_EL1_TYPE_DIR_LINK:
+		return PERF_BR_CALL;
+	case BRBINF_EL1_TYPE_INDIR_LINK:
+		return PERF_BR_IND_CALL;
+	case BRBINF_EL1_TYPE_RET_SUB:
+		return PERF_BR_RET;
+	case BRBINF_EL1_TYPE_COND_DIR:
+		return PERF_BR_COND;
+	case BRBINF_EL1_TYPE_CALL:
+		return PERF_BR_CALL;
+	case BRBINF_EL1_TYPE_TRAP:
+		return PERF_BR_SYSCALL;
+	case BRBINF_EL1_TYPE_RET_EXCPT:
+		return PERF_BR_ERET;
+	case BRBINF_EL1_TYPE_IRQ:
+		return PERF_BR_IRQ;
+	case BRBINF_EL1_TYPE_DEBUG_HALT:
+		*new_branch_type = true;
+		return PERF_BR_ARM64_DEBUG_HALT;
+	case BRBINF_EL1_TYPE_SERROR:
+		return PERF_BR_SERROR;
+	case BRBINF_EL1_TYPE_INST_DEBUG:
+		*new_branch_type = true;
+		return PERF_BR_ARM64_DEBUG_INST;
+	case BRBINF_EL1_TYPE_DATA_DEBUG:
+		*new_branch_type = true;
+		return PERF_BR_ARM64_DEBUG_DATA;
+	case BRBINF_EL1_TYPE_ALGN_FAULT:
+		*new_branch_type = true;
+		return PERF_BR_NEW_FAULT_ALGN;
+	case BRBINF_EL1_TYPE_INST_FAULT:
+		*new_branch_type = true;
+		return PERF_BR_NEW_FAULT_INST;
+	case BRBINF_EL1_TYPE_DATA_FAULT:
+		*new_branch_type = true;
+		return PERF_BR_NEW_FAULT_DATA;
+	case BRBINF_EL1_TYPE_FIQ:
+		*new_branch_type = true;
+		return PERF_BR_ARM64_FIQ;
+	case BRBINF_EL1_TYPE_DEBUG_EXIT:
+		*new_branch_type = true;
+		return PERF_BR_ARM64_DEBUG_EXIT;
+	default:
+		pr_warn("unknown branch type captured\n");
+		return PERF_BR_UNKNOWN;
+	}
+}
+
+static int brbe_fetch_perf_priv(u64 brbinf)
+{
+	int brbe_el = brbe_fetch_el(brbinf);
+
+	switch (brbe_el) {
+	case BRBINF_EL1_EL_EL0:
+		return PERF_BR_PRIV_USER;
+	case BRBINF_EL1_EL_EL1:
+		return PERF_BR_PRIV_KERNEL;
+	case BRBINF_EL1_EL_EL2:
+		if (is_kernel_in_hyp_mode())
+			return PERF_BR_PRIV_KERNEL;
+		return PERF_BR_PRIV_HV;
+	default:
+		pr_warn("unknown branch privilege captured\n");
+		return PERF_BR_PRIV_UNKNOWN;
+	}
+}
+
+static void capture_brbe_flags(struct pmu_hw_events *cpuc, struct perf_event *event,
+			       u64 brbinf, int idx)
+{
+	int branch_type, type = brbe_record_valid(brbinf);
+	bool new_branch_type;
+
+	if (!branch_sample_no_cycles(event))
+		cpuc->branches->brbe_entries[idx].cycles = brbe_fetch_cycles(brbinf);
+
+	if (branch_sample_type(event)) {
+		branch_type = brbe_fetch_perf_type(brbinf, &new_branch_type);
+		if (new_branch_type) {
+			cpuc->branches->brbe_entries[idx].type = PERF_BR_EXTEND_ABI;
+			cpuc->branches->brbe_entries[idx].new_type = branch_type;
+		} else {
+			cpuc->branches->brbe_entries[idx].type = branch_type;
+		}
+	}
+
+	if (!branch_sample_no_flags(event)) {
+		/*
+		 * BRBINF_LASTFAILED does not indicate that the last transaction
+		 * failed or aborted during the current branch record itself.
+		 * Rather, it indicates that all the branch records which were
+		 * in transaction until the current branch record have failed. So
+		 * the entire BRBE buffer needs to be processed later on to find
+		 * all branch records which might have failed.
+		 */
+		cpuc->branches->brbe_entries[idx].abort = brbinf & BRBINF_EL1_LASTFAILED;
+
+		/*
+		 * This information (i.e. transaction state and mispredicts)
+		 * is not available for target only branch records.
+		 */
+		if (type != BRBINF_EL1_VALID_TARGET) {
+			cpuc->branches->brbe_entries[idx].mispred = brbinf & BRBINF_EL1_MPRED;
+			cpuc->branches->brbe_entries[idx].predicted = !(brbinf & BRBINF_EL1_MPRED);
+			cpuc->branches->brbe_entries[idx].in_tx = brbinf & BRBINF_EL1_TX;
+		}
+	}
+
+	if (branch_sample_priv(event)) {
+		/*
+		 * This information (i.e. branch privilege level) is not
+		 * available for source only branch records.
+		 */
+		if (type != BRBINF_EL1_VALID_SOURCE)
+			cpuc->branches->brbe_entries[idx].priv = brbe_fetch_perf_priv(brbinf);
+	}
+}
+
+/*
+ * A branch record with BRBINF_EL1.LASTFAILED set implies that all
+ * preceding consecutive branch records that were in a transaction
+ * (i.e. their BRBINF_EL1.TX set) have been aborted.
+ *
+ * Similarly, BRBFCR_EL1.LASTFAILED set indicates that all preceding
+ * consecutive branch records up to the last record, which were in a
+ * transaction (i.e. their BRBINF_EL1.TX set), have been aborted.
+ *
+ * --------------------------------- -------------------
+ * | 00 | BRBSRC | BRBTGT | BRBINF | | TX = 1 | LF = 0 | [TX success]
+ * --------------------------------- -------------------
+ * | 01 | BRBSRC | BRBTGT | BRBINF | | TX = 1 | LF = 0 | [TX success]
+ * --------------------------------- -------------------
+ * | 02 | BRBSRC | BRBTGT | BRBINF | | TX = 0 | LF = 0 |
+ * --------------------------------- -------------------
+ * | 03 | BRBSRC | BRBTGT | BRBINF | | TX = 1 | LF = 0 | [TX failed]
+ * --------------------------------- -------------------
+ * | 04 | BRBSRC | BRBTGT | BRBINF | | TX = 1 | LF = 0 | [TX failed]
+ * --------------------------------- -------------------
+ * | 05 | BRBSRC | BRBTGT | BRBINF | | TX = 0 | LF = 1 |
+ * --------------------------------- -------------------
+ * | .. | BRBSRC | BRBTGT | BRBINF | | TX = 0 | LF = 0 |
+ * --------------------------------- -------------------
+ * | 61 | BRBSRC | BRBTGT | BRBINF | | TX = 1 | LF = 0 | [TX failed]
+ * --------------------------------- -------------------
+ * | 62 | BRBSRC | BRBTGT | BRBINF | | TX = 1 | LF = 0 | [TX failed]
+ * --------------------------------- -------------------
+ * | 63 | BRBSRC | BRBTGT | BRBINF | | TX = 1 | LF = 0 | [TX failed]
+ * --------------------------------- -------------------
+ *
+ * BRBFCR_EL1.LASTFAILED == 1
+ *
+ * Here BRBFCR_EL1.LASTFAILED fails all those consecutive, in
+ * transaction branches near the end of the BRBE buffer.
+ */
+static void process_branch_aborts(struct pmu_hw_events *cpuc)
+{
+	u64 brbfcr = read_sysreg_s(SYS_BRBFCR_EL1);
+	bool lastfailed = !!(brbfcr & BRBFCR_EL1_LASTFAILED);
+	int idx = cpuc->brbe_nr - 1;
+
+	do {
+		if (cpuc->branches->brbe_entries[idx].in_tx) {
+			cpuc->branches->brbe_entries[idx].abort = lastfailed;
+		} else {
+			lastfailed = cpuc->branches->brbe_entries[idx].abort;
+			cpuc->branches->brbe_entries[idx].abort = false;
+		}
+	} while (idx--, idx >= 0);
+}
+
+void arm64_pmu_brbe_read(struct pmu_hw_events *cpuc, struct perf_event *event)
+{
+	u64 brbinf;
+	int idx;
+
+	if (brbe_disabled(cpuc))
+		return;
+
+	set_brbe_paused();
+	for (idx = 0; idx < cpuc->brbe_nr; idx++) {
+		select_brbe_bank_index(idx);
+		brbinf = get_brbinf_reg(idx);
+		/*
+		 * There are no more valid entries in the buffer.
+		 * Abort the branch record processing to save some
+		 * cycles and also reduce the capture/process load
+		 * for the user space.
+		 */
+		if (brbe_invalid(brbinf))
+			break;
+
+		if (brbe_valid(brbinf)) {
+			cpuc->branches->brbe_entries[idx].from =  get_brbsrc_reg(idx);
+			cpuc->branches->brbe_entries[idx].to =  get_brbtgt_reg(idx);
+		} else if (brbe_source(brbinf)) {
+			cpuc->branches->brbe_entries[idx].from =  get_brbsrc_reg(idx);
+			cpuc->branches->brbe_entries[idx].to = 0;
+		} else if (brbe_target(brbinf)) {
+			cpuc->branches->brbe_entries[idx].from = 0;
+			cpuc->branches->brbe_entries[idx].to =  get_brbtgt_reg(idx);
+		}
+		capture_brbe_flags(cpuc, event, brbinf, idx);
+	}
+	cpuc->branches->brbe_stack.nr = idx;
+	cpuc->branches->brbe_stack.hw_idx = -1ULL;
+	process_branch_aborts(cpuc);
+}
+
+void arm64_pmu_brbe_reset(struct pmu_hw_events *cpuc)
+{
+	if (brbe_disabled(cpuc))
+		return;
+
+	asm volatile(BRB_IALL);
+	isb();
+}
diff --git a/drivers/perf/arm_pmu_brbe.h b/drivers/perf/arm_pmu_brbe.h
new file mode 100644
index 000000000000..22c4b25b1777
--- /dev/null
+++ b/drivers/perf/arm_pmu_brbe.h
@@ -0,0 +1,259 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Branch Record Buffer Extension Helpers.
+ *
+ * Copyright (C) 2021 ARM Limited
+ *
+ * Author: Anshuman Khandual <anshuman.khandual@arm.com>
+ */
+#define pr_fmt(fmt) "brbe: " fmt
+
+#include <linux/perf/arm_pmu.h>
+
+/*
+ * BRBE Instructions
+ *
+ * BRB_IALL : Invalidate the entire buffer
+ * BRB_INJ  : Inject latest branch record derived from [BRBSRCINJ, BRBTGTINJ, BRBINFINJ]
+ */
+#define BRB_IALL __emit_inst(0xD5000000 | sys_insn(1, 1, 7, 2, 4) | (0x1f))
+#define BRB_INJ  __emit_inst(0xD5000000 | sys_insn(1, 1, 7, 2, 5) | (0x1f))
+
+/*
+ * BRBE Buffer Organization
+ *
+ * BRBE buffer is arranged as multiple banks of 32 branch record
+ * entries each. An individual branch record in a given bank can
+ * be accessed, after selecting the bank in BRBFCR_EL1.BANK, by
+ * accessing the registers i.e [BRBSRC, BRBTGT, BRBINF] set with
+ * indices [0..31].
+ *
+ * Bank 0
+ *
+ *	---------------------------------	------
+ *	| 00 | BRBSRC | BRBTGT | BRBINF |	| 00 |
+ *	---------------------------------	------
+ *	| 01 | BRBSRC | BRBTGT | BRBINF |	| 01 |
+ *	---------------------------------	------
+ *	| .. | BRBSRC | BRBTGT | BRBINF |	| .. |
+ *	---------------------------------	------
+ *	| 31 | BRBSRC | BRBTGT | BRBINF |	| 31 |
+ *	---------------------------------	------
+ *
+ * Bank 1
+ *
+ *	---------------------------------	------
+ *	| 32 | BRBSRC | BRBTGT | BRBINF |	| 00 |
+ *	---------------------------------	------
+ *	| 33 | BRBSRC | BRBTGT | BRBINF |	| 01 |
+ *	---------------------------------	------
+ *	| .. | BRBSRC | BRBTGT | BRBINF |	| .. |
+ *	---------------------------------	------
+ *	| 63 | BRBSRC | BRBTGT | BRBINF |	| 31 |
+ *	---------------------------------	------
+ */
+#define BRBE_BANK0_IDX_MIN 0
+#define BRBE_BANK0_IDX_MAX 31
+#define BRBE_BANK1_IDX_MIN 32
+#define BRBE_BANK1_IDX_MAX 63
+
+#define RETURN_READ_BRBSRCN(n) \
+	read_sysreg_s(SYS_BRBSRC##n##_EL1)
+
+#define RETURN_READ_BRBTGTN(n) \
+	read_sysreg_s(SYS_BRBTGT##n##_EL1)
+
+#define RETURN_READ_BRBINFN(n) \
+	read_sysreg_s(SYS_BRBINF##n##_EL1)
+
+#define BRBE_REGN_CASE(n, case_macro) \
+	case n: return case_macro(n); break
+
+#define BRBE_REGN_SWITCH(x, case_macro)				\
+	do {							\
+		switch (x) {					\
+		BRBE_REGN_CASE(0, case_macro);			\
+		BRBE_REGN_CASE(1, case_macro);			\
+		BRBE_REGN_CASE(2, case_macro);			\
+		BRBE_REGN_CASE(3, case_macro);			\
+		BRBE_REGN_CASE(4, case_macro);			\
+		BRBE_REGN_CASE(5, case_macro);			\
+		BRBE_REGN_CASE(6, case_macro);			\
+		BRBE_REGN_CASE(7, case_macro);			\
+		BRBE_REGN_CASE(8, case_macro);			\
+		BRBE_REGN_CASE(9, case_macro);			\
+		BRBE_REGN_CASE(10, case_macro);			\
+		BRBE_REGN_CASE(11, case_macro);			\
+		BRBE_REGN_CASE(12, case_macro);			\
+		BRBE_REGN_CASE(13, case_macro);			\
+		BRBE_REGN_CASE(14, case_macro);			\
+		BRBE_REGN_CASE(15, case_macro);			\
+		BRBE_REGN_CASE(16, case_macro);			\
+		BRBE_REGN_CASE(17, case_macro);			\
+		BRBE_REGN_CASE(18, case_macro);			\
+		BRBE_REGN_CASE(19, case_macro);			\
+		BRBE_REGN_CASE(20, case_macro);			\
+		BRBE_REGN_CASE(21, case_macro);			\
+		BRBE_REGN_CASE(22, case_macro);			\
+		BRBE_REGN_CASE(23, case_macro);			\
+		BRBE_REGN_CASE(24, case_macro);			\
+		BRBE_REGN_CASE(25, case_macro);			\
+		BRBE_REGN_CASE(26, case_macro);			\
+		BRBE_REGN_CASE(27, case_macro);			\
+		BRBE_REGN_CASE(28, case_macro);			\
+		BRBE_REGN_CASE(29, case_macro);			\
+		BRBE_REGN_CASE(30, case_macro);			\
+		BRBE_REGN_CASE(31, case_macro);			\
+		default:					\
+			pr_warn("unknown register index\n");	\
+			return -1;				\
+		}						\
+	} while (0)
+
+static inline int buffer_to_brbe_idx(int buffer_idx)
+{
+	return buffer_idx % 32;
+}
+
+static inline u64 get_brbsrc_reg(int buffer_idx)
+{
+	int brbe_idx = buffer_to_brbe_idx(buffer_idx);
+
+	BRBE_REGN_SWITCH(brbe_idx, RETURN_READ_BRBSRCN);
+}
+
+static inline u64 get_brbtgt_reg(int buffer_idx)
+{
+	int brbe_idx = buffer_to_brbe_idx(buffer_idx);
+
+	BRBE_REGN_SWITCH(brbe_idx, RETURN_READ_BRBTGTN);
+}
+
+static inline u64 get_brbinf_reg(int buffer_idx)
+{
+	int brbe_idx = buffer_to_brbe_idx(buffer_idx);
+
+	BRBE_REGN_SWITCH(brbe_idx, RETURN_READ_BRBINFN);
+}
+
+static inline u64 brbe_record_valid(u64 brbinf)
+{
+	return (brbinf & BRBINF_EL1_VALID_MASK) >> BRBINF_EL1_VALID_SHIFT;
+}
+
+static inline bool brbe_invalid(u64 brbinf)
+{
+	return brbe_record_valid(brbinf) == BRBINF_EL1_VALID_NONE;
+}
+
+static inline bool brbe_valid(u64 brbinf)
+{
+	return brbe_record_valid(brbinf) == BRBINF_EL1_VALID_FULL;
+}
+
+static inline bool brbe_source(u64 brbinf)
+{
+	return brbe_record_valid(brbinf) == BRBINF_EL1_VALID_SOURCE;
+}
+
+static inline bool brbe_target(u64 brbinf)
+{
+	return brbe_record_valid(brbinf) == BRBINF_EL1_VALID_TARGET;
+}
+
+static inline int brbe_fetch_cycles(u64 brbinf)
+{
+	/*
+	 * Captured cycle count is unknown and hence
+	 * should not be passed on to the user space.
+	 */
+	if (brbinf & BRBINF_EL1_CCU)
+		return 0;
+
+	return (brbinf & BRBINF_EL1_CC_MASK) >> BRBINF_EL1_CC_SHIFT;
+}
+
+static inline int brbe_fetch_type(u64 brbinf)
+{
+	return (brbinf & BRBINF_EL1_TYPE_MASK) >> BRBINF_EL1_TYPE_SHIFT;
+}
+
+static inline int brbe_fetch_el(u64 brbinf)
+{
+	return (brbinf & BRBINF_EL1_EL_MASK) >> BRBINF_EL1_EL_SHIFT;
+}
+
+static inline int brbe_fetch_numrec(u64 brbidr)
+{
+	return (brbidr & BRBIDR0_EL1_NUMREC_MASK) >> BRBIDR0_EL1_NUMREC_SHIFT;
+}
+
+static inline int brbe_fetch_format(u64 brbidr)
+{
+	return (brbidr & BRBIDR0_EL1_FORMAT_MASK) >> BRBIDR0_EL1_FORMAT_SHIFT;
+}
+
+static inline int brbe_fetch_cc_bits(u64 brbidr)
+{
+	return (brbidr & BRBIDR0_EL1_CC_MASK) >> BRBIDR0_EL1_CC_SHIFT;
+}
+
+static inline void select_brbe_bank(int bank)
+{
+	static int brbe_current_bank = -1;
+	u64 brbfcr;
+
+	if (brbe_current_bank == bank)
+		return;
+
+	WARN_ON(bank > 1);
+	brbfcr = read_sysreg_s(SYS_BRBFCR_EL1);
+	brbfcr &= ~BRBFCR_EL1_BANK_MASK;
+	brbfcr |= ((bank << BRBFCR_EL1_BANK_SHIFT) & BRBFCR_EL1_BANK_MASK);
+	write_sysreg_s(brbfcr, SYS_BRBFCR_EL1);
+	isb();
+	brbe_current_bank = bank;
+}
+
+static inline void select_brbe_bank_index(int buffer_idx)
+{
+	switch (buffer_idx) {
+	case BRBE_BANK0_IDX_MIN ... BRBE_BANK0_IDX_MAX:
+		select_brbe_bank(0);
+		break;
+	case BRBE_BANK1_IDX_MIN ... BRBE_BANK1_IDX_MAX:
+		select_brbe_bank(1);
+		break;
+	default:
+		pr_warn("unsupported BRBE index\n");
+	}
+}
+
+static inline bool valid_brbe_nr(int brbe_nr)
+{
+	switch (brbe_nr) {
+	case BRBIDR0_EL1_NUMREC_8:
+	case BRBIDR0_EL1_NUMREC_16:
+	case BRBIDR0_EL1_NUMREC_32:
+	case BRBIDR0_EL1_NUMREC_64:
+		return true;
+	default:
+		pr_warn("unsupported BRBE entries\n");
+		return false;
+	}
+}
+
+static inline bool brbe_paused(void)
+{
+	u64 brbfcr = read_sysreg_s(SYS_BRBFCR_EL1);
+
+	return brbfcr & BRBFCR_EL1_PAUSED;
+}
+
+static inline void set_brbe_paused(void)
+{
+	u64 brbfcr = read_sysreg_s(SYS_BRBFCR_EL1);
+
+	write_sysreg_s(brbfcr | BRBFCR_EL1_PAUSED, SYS_BRBFCR_EL1);
+	isb();
+}
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index ffce43ceb670..3cd94d401f98 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -166,6 +166,26 @@ struct arm_pmu {
 	unsigned long acpi_cpuid;
 };
 
+#ifdef CONFIG_ARM_BRBE_PMU
+void arm64_pmu_brbe_filter(struct pmu_hw_events *hw_events, struct perf_event *event);
+void arm64_pmu_brbe_read(struct pmu_hw_events *cpuc, struct perf_event *event);
+void arm64_pmu_brbe_disable(struct pmu_hw_events *cpuc);
+void arm64_pmu_brbe_enable(struct pmu_hw_events *cpuc);
+void arm64_pmu_brbe_probe(struct pmu_hw_events *cpuc);
+void arm64_pmu_brbe_reset(struct pmu_hw_events *cpuc);
+bool arm64_pmu_brbe_supported(struct perf_event *event);
+#else
+static inline void arm64_pmu_brbe_filter(struct pmu_hw_events *hw_events, struct perf_event *event)
+{
+}
+static inline void arm64_pmu_brbe_read(struct pmu_hw_events *cpuc, struct perf_event *event) { }
+static inline void arm64_pmu_brbe_disable(struct pmu_hw_events *cpuc) { }
+static inline void arm64_pmu_brbe_enable(struct pmu_hw_events *cpuc) { }
+static inline void arm64_pmu_brbe_probe(struct pmu_hw_events *cpuc) { }
+static inline void arm64_pmu_brbe_reset(struct pmu_hw_events *cpuc) { }
+static inline bool arm64_pmu_brbe_supported(struct perf_event *event) { return false; }
+#endif
+
 #define to_arm_pmu(p) (container_of(p, struct arm_pmu, pmu))
 
 u64 armpmu_event_update(struct perf_event *event);
-- 
2.25.1
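
The LASTFAILED handling documented above process_branch_aborts() may be
easier to follow as a small user-space model. The sketch below mirrors the
loop as posted: it walks from the last record to the first and carries the
most recent LASTFAILED indication backwards across in-transaction records.
struct model_entry and model_process_aborts() are hypothetical stand-ins,
not the driver's types, and the BRBFCR_EL1.LASTFAILED value is passed in
as a plain bool instead of being read from a system register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one captured branch record. */
struct model_entry {
	bool in_tx;	/* models BRBINF_EL1.TX of the record */
	bool abort;	/* in: BRBINF_EL1.LASTFAILED, out: record aborted */
};

/*
 * Walk the records from last to first. An in-transaction record takes
 * the currently pending LASTFAILED state; any other record supplies a
 * new pending state from its own LASTFAILED bit and is itself cleared.
 */
static void model_process_aborts(struct model_entry *e, size_t nr,
				 bool fcr_lastfailed)
{
	bool lastfailed = fcr_lastfailed;
	size_t idx = nr;

	assert(e != NULL || nr == 0);

	while (idx-- > 0) {
		if (e[idx].in_tx) {
			e[idx].abort = lastfailed;
		} else {
			lastfailed = e[idx].abort;
			e[idx].abort = false;
		}
	}
}
```

Fed with records 00..05 from the table in the comment (TX set on 00, 01,
03 and 04, LASTFAILED set only on 05), this model marks 03 and 04 as
aborted and leaves 00 and 01 untouched.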


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V3 7/7] arm64/perf: Enable branch stack sampling
  2022-09-29  7:58 [PATCH V3 0/7] arm64/perf: Enable branch stack sampling Anshuman Khandual
                   ` (5 preceding siblings ...)
  2022-09-29  7:58 ` [PATCH V3 6/7] arm64/perf: Add BRBE driver Anshuman Khandual
@ 2022-09-29  7:58 ` Anshuman Khandual
  2022-10-10 13:55   ` James Clark
  6 siblings, 1 reply; 18+ messages in thread
From: Anshuman Khandual @ 2022-09-29  7:58 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-arm-kernel, peterz, acme,
	mark.rutland, will, catalin.marinas
  Cc: Anshuman Khandual

Now that all the required pieces are in place, enable perf branch stack
sampling support on the arm64 platform by removing the gate which blocks
it in armpmu_event_init().

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
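The order of the checks added to armpmu_event_init() can be summarised
with a small user-space model: driver not built, then kernel filter
without permission, then missing or failing brbe_supported() callback.
struct model_request and model_branch_stack_check() are hypothetical names
for illustration only; each field stands in for the kernel-side condition
named in its comment.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inputs, one per condition tested in the hunk below. */
struct model_request {
	bool driver_built;	/* models IS_ENABLED(CONFIG_ARM_BRBE_PMU) */
	bool wants_kernel;	/* models PERF_SAMPLE_BRANCH_KERNEL being set */
	bool perfmon_capable;	/* models perfmon_capable() */
	bool brbe_supported;	/* models callback present and returning true */
};

/* Returns 0 if the event would be accepted, or a negative errno. */
static int model_branch_stack_check(const struct model_request *req)
{
	assert(req != NULL);

	if (!req->driver_built)
		return -EOPNOTSUPP;

	if (req->wants_kernel && !req->perfmon_capable)
		return -EPERM;

	if (!req->brbe_supported)
		return -EOPNOTSUPP;

	return 0;
}
```

In this model an unprivileged request for kernel branches fails with
-EPERM before the BRBE capability is consulted at all.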
 drivers/perf/arm_pmu.c | 32 +++++++++++++++++++++++++++++---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 93b36933124f..2a9b988b53c2 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -537,9 +537,35 @@ static int armpmu_event_init(struct perf_event *event)
 		!cpumask_test_cpu(event->cpu, &armpmu->supported_cpus))
 		return -ENOENT;
 
-	/* does not support taken branch sampling */
-	if (has_branch_stack(event))
-		return -EOPNOTSUPP;
+	if (has_branch_stack(event)) {
+		/*
+		 * BRBE support is absent. Select CONFIG_ARM_BRBE_PMU
+		 * in the config, before branch stack sampling events
+		 * can be requested.
+		 */
+		if (!IS_ENABLED(CONFIG_ARM_BRBE_PMU)) {
+			pr_warn_once("BRBE is disabled, select CONFIG_ARM_BRBE_PMU\n");
+			return -EOPNOTSUPP;
+		}
+
+		if (event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_KERNEL) {
+			if (!perfmon_capable()) {
+				pr_warn_once("does not have permission for kernel branch filter\n");
+				return -EPERM;
+			}
+		}
+
+		/*
+		 * A branch stack sampling event cannot be supported if
+		 * either the required driver itself is absent or the
+		 * BRBE buffer is not supported. Besides, checking for
+		 * the callback prevents a crash in case it's absent.
+		 */
+		if (!armpmu->brbe_supported || !armpmu->brbe_supported(event)) {
+			pr_warn_once("BRBE is not supported\n");
+			return -EOPNOTSUPP;
+		}
+	}
 
 	if (armpmu->map_event(event) == -ENOENT)
 		return -ENOENT;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH V3 1/7] arm64/perf: Add BRBE registers and fields
  2022-09-29  7:58 ` [PATCH V3 1/7] arm64/perf: Add BRBE registers and fields Anshuman Khandual
@ 2022-09-29 11:29   ` Mark Brown
  2022-09-30  4:07     ` Anshuman Khandual
  0 siblings, 1 reply; 18+ messages in thread
From: Mark Brown @ 2022-09-29 11:29 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-kernel, linux-perf-users, linux-arm-kernel, peterz, acme,
	mark.rutland, will, catalin.marinas, Marc Zyngier


On Thu, Sep 29, 2022 at 01:28:51PM +0530, Anshuman Khandual wrote:

Thanks for doing this work - I did spot a few small issues though.

>  /* id_aa64dfr0 */
> +#define ID_AA64DFR0_BRBE_SHIFT		52
>  #define ID_AA64DFR0_MTPMU_SHIFT		48
>  #define ID_AA64DFR0_TRBE_SHIFT		44
>  #define ID_AA64DFR0_TRACE_FILT_SHIFT	40
> @@ -848,6 +952,9 @@
>  #define ID_AA64DFR0_PMSVER_8_2		0x1
>  #define ID_AA64DFR0_PMSVER_8_3		0x2
>  
> +#define ID_AA64DFR0_BRBE		0x1
> +#define ID_AA64DFR0_BRBE_V1P1		0x2
> +
>  #define ID_DFR0_PERFMON_SHIFT		24
>  
>  #define ID_DFR0_PERFMON_8_0		0x3

This is already done in -next as a result of ID_AA64DFR0_EL1 being
converted, the enumeration define comes out as
ID_AA64DFR0_EL1_BRBE_BRBE_V1P1.

> +# This is just a dummy register declaration to get all common field masks and
> +# shifts for accessing given BRBINF contents.
> +Sysreg	BRBINF_EL1	2	1	8	0	0
> +Res0	63:47
> +Field	46	CCU
> +Field	45:32	CC
> +Res0	31:18
> +Field	17	LASTFAILED
> +Field	16	TX

According to DDI0487I.a bit 16 is called T not TX.

> +Res0	15:14
> +Enum	13:8		TYPE

It's probably worth noting in the comment the issue with Enums here
that's meaning you're not using a SysregFields - I'm not sure what
people will think of this but providing a definition using the ID for
the 0th register does seem expedient.

> +Enum	7:6	EL
> +	0b00	EL0
> +	0b01	EL1
> +	0b10	EL2
> +EndEnum

According to DDI0487I.a 0b11 has the value EL3 (when FEAT_BRBEv1p1).

> +Sysreg	BRBCR_EL1	2	1	9	0	0
> +Res0	63:24
> +Field	23 	EXCEPTION
> +Field	22 	ERTN
> +Res0	21:9
> +Field	8 	FZP
> +Field	7	ZERO

According to DDI0487I.a bit 7 is Res0.

> +Field	2	ZERO1

According to DDI0487I.a bit 2 is Res0.

> +Sysreg	BRBFCR_EL1	2	1	9	0	1

> +Field	16	ENL

According to DDI0487I.a this is EnI (i.e. it ends in a capital I, not an L).

> +Sysreg	BRBINFINJ_EL1	2	1	9	1	0

> +Field	16	TX

According to DDI0487I.a this is T not TX.

> +Enum	7:6	EL
> +	0b00	EL0
> +	0b01	EL1
> +	0b10	EL2
> +EndEnum

According to DDI0487I.a 0b11 has the value EL3 (when FEAT_BRBEv1p1).

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V3 1/7] arm64/perf: Add BRBE registers and fields
  2022-09-29 11:29   ` Mark Brown
@ 2022-09-30  4:07     ` Anshuman Khandual
  0 siblings, 0 replies; 18+ messages in thread
From: Anshuman Khandual @ 2022-09-30  4:07 UTC (permalink / raw)
  To: Mark Brown
  Cc: linux-kernel, linux-perf-users, linux-arm-kernel, peterz, acme,
	mark.rutland, will, catalin.marinas, Marc Zyngier



On 9/29/22 16:59, Mark Brown wrote:
> On Thu, Sep 29, 2022 at 01:28:51PM +0530, Anshuman Khandual wrote:
> 
> Thanks for doing this work - I did spot a few small issues though.
> 
>>  /* id_aa64dfr0 */
>> +#define ID_AA64DFR0_BRBE_SHIFT		52
>>  #define ID_AA64DFR0_MTPMU_SHIFT		48
>>  #define ID_AA64DFR0_TRBE_SHIFT		44
>>  #define ID_AA64DFR0_TRACE_FILT_SHIFT	40
>> @@ -848,6 +952,9 @@
>>  #define ID_AA64DFR0_PMSVER_8_2		0x1
>>  #define ID_AA64DFR0_PMSVER_8_3		0x2
>>  
>> +#define ID_AA64DFR0_BRBE		0x1
>> +#define ID_AA64DFR0_BRBE_V1P1		0x2
>> +
>>  #define ID_DFR0_PERFMON_SHIFT		24
>>  
>>  #define ID_DFR0_PERFMON_8_0		0x3
> 
> This is already done in -next as a result of ID_AA64DFR0_EL1 being
> converted, the enumeration define comes out as
> ID_AA64DFR0_EL1_BRBE_BRBE_V1P1.

Right. I will rebase the series on the upcoming 6.1-rc1, which should have all
the current -next patches including the ID_AA64DFR0_EL1 migration into
tools/sysreg. This should just fall into place.

> 
>> +# This is just a dummy register declaration to get all common field masks and
>> +# shifts for accessing given BRBINF contents.
>> +Sysreg	BRBINF_EL1	2	1	8	0	0
>> +Res0	63:47
>> +Field	46	CCU
>> +Field	45:32	CC
>> +Res0	31:18
>> +Field	17	LASTFAILED
>> +Field	16	TX
> 
> According to DDI0487I.a bit 16 is called T not TX.

I understand :) The intention here was to keep the field name associated with
"transaction" somehow. But I guess it is more important to match the ARM ARM
than to rename the field for readability. Will change it to 'T'.

> 
>> +Res0	15:14
>> +Enum	13:8		TYPE
> 
> It's probably worth noting in the comment the issue with Enums here
> that's meaning you're not using a SysregFields - I'm not sure what

Sure, will add a comment describing the problem of using enum elements inside
SysregFields definition.

> people will think of this but providing a definition using the ID for
> the 0th register does seem expedient.

I understand your concern, but this turned out to be the better option:

- Original sysreg.h based definitions had all field masks on the right end

	- When reading (reg >> field_shift) & field_mask
	- When writing (val & field_mask) << field_shift

- tools/sysreg format creates in-place field masks

	- When reading (reg & field_mask) >> field_shift
	- When writing (val << field_shift) & field_mask

- After moving some BRBE registers into tools/sysreg, the driver code had
  to be changed to accommodate these new write/read methods

- To avoid a mix-up in the BRBE driver, all BRBINF register fields need to be
  converted into in-place masks, either in sysreg.h itself or by moving them
  into tools/sysreg
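
For illustration only, the two conventions listed above can be modelled in
plain C. This is a user-space sketch, not the driver code: the CC_* macros
and cc_*() helpers are names made up here, and only the bit positions (CC at
45:32) are taken from the quoted BRBINF_EL1 definition. It shows that both
styles agree when used consistently, and that pairing an in-place mask with
the shift-first read silently yields zero.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative field: CC occupies bits 45:32, i.e. 14 bits wide. */
#define CC_SHIFT        32
#define CC_WIDTH        14

/* Old sysreg.h style: the mask is right-aligned at bit 0. */
#define CC_MASK_RIGHT   ((UINT64_C(1) << CC_WIDTH) - 1)

/* tools/sysreg style: the mask sits in place over bits 45:32. */
#define CC_MASK_INPLACE (CC_MASK_RIGHT << CC_SHIFT)

/* Right-aligned mask: shift first, then mask (read); mask, then shift (write). */
static inline uint64_t cc_read_right(uint64_t reg)
{
	return (reg >> CC_SHIFT) & CC_MASK_RIGHT;
}

static inline uint64_t cc_write_right(uint64_t val)
{
	return (val & CC_MASK_RIGHT) << CC_SHIFT;
}

/* In-place mask: mask first, then shift (read); shift, then mask (write). */
static inline uint64_t cc_read_inplace(uint64_t reg)
{
	return (reg & CC_MASK_INPLACE) >> CC_SHIFT;
}

static inline uint64_t cc_write_inplace(uint64_t val)
{
	return (val << CC_SHIFT) & CC_MASK_INPLACE;
}

/* The mix-up: shift-first read combined with an in-place mask. */
static inline uint64_t cc_read_mixed(uint64_t reg)
{
	return (reg >> CC_SHIFT) & CC_MASK_INPLACE;
}

/* Both conventions must round-trip the same value. */
static inline void cc_selfcheck(void)
{
	uint64_t reg = cc_write_inplace(0x0abc);

	assert(reg == cc_write_right(0x0abc));
	assert(cc_read_inplace(reg) == 0x0abc);
	assert(cc_read_right(reg) == 0x0abc);
}
```

The point of the sketch is only that the mask definition and the access
idiom have to change together, which is why keeping some fields in one style
and some in the other inside one driver is error prone.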

Moving BRBE fields into tools/sysreg via a dummy BRBINF_EL1 register seems
to achieve that objective. These common fields can be used to work on any
BRBINF<N>_EL1 register value. But I might just keep them in sysreg.h, if
the proposed solution is not preferable or seems merely expedient.

Later, when enum support comes up in SysregFields and tools/sysreg supports
formula-based crm/op2 expansion, the entire BRBINF, BRBSRC and BRBTGT register
set can be moved into tools/sysreg.

> 
>> +Enum	7:6	EL
>> +	0b00	EL0
>> +	0b01	EL1
>> +	0b10	EL2
>> +EndEnum
> 
> According to DDI0487I.a 0b11 has the value EL3 (when FEAT_BRBEv1p1).

Sure, will add it.

> 
>> +Sysreg	BRBCR_EL1	2	1	9	0	0
>> +Res0	63:24
>> +Field	23 	EXCEPTION
>> +Field	22 	ERTN
>> +Res0	21:9
>> +Field	8 	FZP
>> +Field	7	ZERO
> 
> According to DDI0487I.a bit 7 is Res0.

Sure, will change.

> 
>> +Field	2	ZERO1
> 
> According to DDI0487I.a bit 2 is Res0.

Sure, will change.

> 
>> +Sysreg	BRBFCR_EL1	2	1	9	0	1
> 
>> +Field	16	ENL
> 
> According to DDI0487I.a this is EnI (ie, an L not an I).

ENL != Enl ? Do we need to match the case as well ?

> 
>> +Sysreg	BRBINFINJ_EL1	2	1	9	1	0
> 
>> +Field	16	TX
> 
> According to DDI0487I.a this is T not TX.

As mentioned, will change it to 'T'.

> 
>> +Enum	7:6	EL
>> +	0b00	EL0
>> +	0b01	EL1
>> +	0b10	EL2
>> +EndEnum
> 
> According to DDI0487I.a 0b11 has the value EL3 (when FEAT_BRBEv1p1).

Sure, will add.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V3 4/7] driver/perf/arm_pmu_platform: Add support for BRBE attributes detection
  2022-09-29  7:58 ` [PATCH V3 4/7] driver/perf/arm_pmu_platform: Add support for BRBE attributes detection Anshuman Khandual
@ 2022-10-06 13:37   ` James Clark
  2022-10-10 14:17     ` James Clark
  2022-10-11  9:16     ` Anshuman Khandual
  0 siblings, 2 replies; 18+ messages in thread
From: James Clark @ 2022-10-06 13:37 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-kernel, linux-perf-users, linux-arm-kernel, peterz, acme,
	mark.rutland, will, catalin.marinas



On 29/09/2022 08:58, Anshuman Khandual wrote:
> This adds arm pmu infrastructure to probe BRBE implementation's attributes via
> driver exported callbacks later. The actual BRBE feature detection will be
> added by the driver itself.
> 
> CPU specific BRBE entries, cycle count, format support gets detected during
> PMU init. This information gets saved in per-cpu struct pmu_hw_events which
> later helps in operating BRBE during a perf event context.
> 
> Cc: Will Deacon <will@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
>  drivers/perf/arm_pmu_platform.c | 34 +++++++++++++++++++++++++++++++++
>  1 file changed, 34 insertions(+)
> 
> diff --git a/drivers/perf/arm_pmu_platform.c b/drivers/perf/arm_pmu_platform.c
> index 933b96e243b8..acdc445081aa 100644
> --- a/drivers/perf/arm_pmu_platform.c
> +++ b/drivers/perf/arm_pmu_platform.c
> @@ -172,6 +172,36 @@ static int armpmu_request_irqs(struct arm_pmu *armpmu)
>  	return err;
>  }
>  
> +static void arm_brbe_probe_cpu(void *info)
> +{
> +	struct pmu_hw_events *hw_events;
> +	struct arm_pmu *armpmu = info;
> +
> +	/*
> +	 * Return from here, if BRBE driver has not been
> +	 * implemented for this PMU. This helps prevent
> +	 * kernel crash later when brbe_probe() will be
> +	 * called on the PMU.
> +	 */
> +	if (!armpmu->brbe_probe)
> +		return;
> +
> +	hw_events = per_cpu_ptr(armpmu->hw_events, smp_processor_id());
> +	armpmu->brbe_probe(hw_events);
> +}
> +
> +static int armpmu_request_brbe(struct arm_pmu *armpmu)
> +{
> +	int cpu, err = 0;
> +
> +	for_each_cpu(cpu, &armpmu->supported_cpus) {
> +		err = smp_call_function_single(cpu, arm_brbe_probe_cpu, armpmu, 1);

Hi Anshuman,

I have LOCKDEP on and the patchset applied to perf/core (82aad7ff7) on
git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git and I get
this:

   armv8-pmu pmu: hw perfevents: no interrupt-affinity property, guessing.
   brbe: implementation found on cpu 0

   =============================
   [ BUG: Invalid wait context ]
   6.0.0-rc7 #38 Not tainted
   -----------------------------
   kworker/u8:0/9 is trying to lock:
   ffff000800855898 (&port_lock_key){....}-{3:3}, at:
pl011_console_write+0x148/0x240
   other info that might help us debug this:
   context-{2:2}
   5 locks held by kworker/u8:0/9:
    #0: ffff00080032a138 ((wq_completion)eval_map_wq){+.+.}-{0:0}, at:
process_one_work+0x200/0x6b0
    #1: ffff80000807bde0
((work_completion)(&eval_map_work)){+.+.}-{0:0}, at:
process_one_work+0x200/0x6b0
    #2: ffff80000aa3db70 (trace_event_sem){+.+.}-{4:4}, at:
trace_event_eval_update+0x28/0x420
    #3: ffff80000a9afe58 (console_lock){+.+.}-{0:0}, at:
vprintk_emit+0x130/0x380
    #4: ffff80000a9aff78 (console_owner){-...}-{0:0}, at:
console_emit_next_record.constprop.0+0x128/0x338
   stack backtrace:
   CPU: 0 PID: 9 Comm: kworker/u8:0 Not tainted 6.0.0-rc7 #38
   Hardware name: Foundation-v8A (DT)
   Workqueue: eval_map_wq eval_map_work_func
   Call trace:
    dump_backtrace+0x114/0x120
    show_stack+0x20/0x58
    dump_stack_lvl+0x9c/0xd8
    dump_stack+0x18/0x34
    __lock_acquire+0x17cc/0x1920
    lock_acquire+0x138/0x3b8
    _raw_spin_lock+0x58/0x70
    pl011_console_write+0x148/0x240
    console_emit_next_record.constprop.0+0x194/0x338
    console_unlock+0x18c/0x208
    vprintk_emit+0x24c/0x380
    vprintk_default+0x40/0x50
    vprintk+0xd4/0xf0
    _printk+0x68/0x90
    arm64_pmu_brbe_probe+0x10c/0x128
    armv8pmu_brbe_probe+0x18/0x28
    arm_brbe_probe_cpu+0x44/0x58
    __flush_smp_call_function_queue+0x1d0/0x440
    generic_smp_call_function_single_interrupt+0x20/0x78
    ipi_handler+0x98/0x368
    handle_percpu_devid_irq+0xc0/0x3a8
    generic_handle_domain_irq+0x34/0x50
    gic_handle_irq+0x58/0x138
    call_on_irq_stack+0x2c/0x58
    do_interrupt_handler+0x88/0x90
    el1_interrupt+0x40/0x78
    el1h_64_irq_handler+0x18/0x28
    el1h_64_irq+0x64/0x68
    trace_event_eval_update+0x114/0x420
    eval_map_work_func+0x30/0x40
    process_one_work+0x298/0x6b0
    worker_thread+0x54/0x408
    kthread+0x118/0x128
    ret_from_fork+0x10/0x20
   brbe: implementation found on cpu 1
   brbe: implementation found on cpu 2
   brbe: implementation found on cpu 3

> +		if (err)
> +			return err;
> +	}
> +	return err;
> +}
> +
>  static void armpmu_free_irqs(struct arm_pmu *armpmu)
>  {
>  	int cpu;
> @@ -229,6 +259,10 @@ int arm_pmu_device_probe(struct platform_device *pdev,
>  	if (ret)
>  		goto out_free_irqs;
>  
> +	ret = armpmu_request_brbe(pmu);
> +	if (ret)
> +		goto out_free_irqs;
> +
>  	ret = armpmu_register(pmu);
>  	if (ret) {
>  		dev_err(dev, "failed to register PMU devices!\n");

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V3 7/7] arm64/perf: Enable branch stack sampling
  2022-09-29  7:58 ` [PATCH V3 7/7] arm64/perf: Enable branch stack sampling Anshuman Khandual
@ 2022-10-10 13:55   ` James Clark
  2022-10-10 15:48     ` Suzuki K Poulose
  0 siblings, 1 reply; 18+ messages in thread
From: James Clark @ 2022-10-10 13:55 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-kernel, linux-perf-users, linux-arm-kernel, peterz, acme,
	mark.rutland, will, catalin.marinas, Suzuki K Poulose



On 29/09/2022 08:58, Anshuman Khandual wrote:
> Now that all the required pieces are already in place, just enable the perf
> branch stack sampling support on arm64 platform, by removing the gate which
> blocks it in armpmu_event_init().
> 
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
>  drivers/perf/arm_pmu.c | 32 +++++++++++++++++++++++++++++---
>  1 file changed, 29 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> index 93b36933124f..2a9b988b53c2 100644
> --- a/drivers/perf/arm_pmu.c
> +++ b/drivers/perf/arm_pmu.c
> @@ -537,9 +537,35 @@ static int armpmu_event_init(struct perf_event *event)
>  		!cpumask_test_cpu(event->cpu, &armpmu->supported_cpus))
>  		return -ENOENT;
>  
> -	/* does not support taken branch sampling */
> -	if (has_branch_stack(event))
> -		return -EOPNOTSUPP;
> +	if (has_branch_stack(event)) {
> +		/*
> +		 * BRBE support is absent. Select CONFIG_ARM_BRBE_PMU
> +		 * in the config, before branch stack sampling events
> +		 * can be requested.
> +		 */
> +		if (!IS_ENABLED(CONFIG_ARM_BRBE_PMU)) {
> +			pr_warn_once("BRBE is disabled, select CONFIG_ARM_BRBE_PMU\n");
> +			return -EOPNOTSUPP;
> +		}
> +
> +		if (event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_KERNEL) {
> +			if (!perfmon_capable()) {

I'm still getting different behaviour compared to x86 when using
perf_event_paranoid because of this perfmon_capable() call here.

> +				pr_warn_once("does not have permission for kernel branch filter\n");

Also I was under the impression that this should be more like a
KERN_INFO loglevel rather than a KERN_WARNING. It's more like expected
behavior rather than unexpected behavior and as far as I know anyone who
sees something in dmesg might think something has gone wrong and try to
follow it up. It is quite a useful message but I remember getting a
review like this before and it made sense to me.

> +				return -EPERM;
> +			}
> +		}
> +
> +		/*
> +		 * Branch stack sampling event can not be supported in
> +		 * case either the required driver itself is absent or
> +		 * BRBE buffer, is not supported. Besides checking for
> +		 * the callback prevents a crash in case it's absent.
> +		 */
> +		if (!armpmu->brbe_supported || !armpmu->brbe_supported(event)) {
> +			pr_warn_once("BRBE is not supported\n");
> +			return -EOPNOTSUPP;
> +		}
> +	}
>  
>  	if (armpmu->map_event(event) == -ENOENT)
>  		return -ENOENT;

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V3 4/7] driver/perf/arm_pmu_platform: Add support for BRBE attributes detection
  2022-10-06 13:37   ` James Clark
@ 2022-10-10 14:17     ` James Clark
  2022-10-11  9:21       ` Anshuman Khandual
  2022-10-11  9:16     ` Anshuman Khandual
  1 sibling, 1 reply; 18+ messages in thread
From: James Clark @ 2022-10-10 14:17 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-kernel, linux-perf-users, linux-arm-kernel, peterz, acme,
	mark.rutland, will, catalin.marinas



On 06/10/2022 14:37, James Clark wrote:
> 
> 
> On 29/09/2022 08:58, Anshuman Khandual wrote:
>> This adds arm pmu infrastructure to probe BRBE implementation's attributes via
>> driver exported callbacks later. The actual BRBE feature detection will be
>> added by the driver itself.
>>
>> CPU specific BRBE entries, cycle count, format support gets detected during
>> PMU init. This information gets saved in per-cpu struct pmu_hw_events which
>> later helps in operating BRBE during a perf event context.
>>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: linux-arm-kernel@lists.infradead.org
>> Cc: linux-kernel@vger.kernel.org
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> ---
>>  drivers/perf/arm_pmu_platform.c | 34 +++++++++++++++++++++++++++++++++
>>  1 file changed, 34 insertions(+)
>>
>> diff --git a/drivers/perf/arm_pmu_platform.c b/drivers/perf/arm_pmu_platform.c
>> index 933b96e243b8..acdc445081aa 100644
>> --- a/drivers/perf/arm_pmu_platform.c
>> +++ b/drivers/perf/arm_pmu_platform.c
>> @@ -172,6 +172,36 @@ static int armpmu_request_irqs(struct arm_pmu *armpmu)
>>  	return err;
>>  }
>>  
>> +static void arm_brbe_probe_cpu(void *info)
>> +{
>> +	struct pmu_hw_events *hw_events;
>> +	struct arm_pmu *armpmu = info;
>> +
>> +	/*
>> +	 * Return from here, if BRBE driver has not been
>> +	 * implemented for this PMU. This helps prevent
>> +	 * kernel crash later when brbe_probe() will be
>> +	 * called on the PMU.
>> +	 */
>> +	if (!armpmu->brbe_probe)
>> +		return;
>> +
>> +	hw_events = per_cpu_ptr(armpmu->hw_events, smp_processor_id());
>> +	armpmu->brbe_probe(hw_events);
>> +}
>> +
>> +static int armpmu_request_brbe(struct arm_pmu *armpmu)
>> +{
>> +	int cpu, err = 0;
>> +
>> +	for_each_cpu(cpu, &armpmu->supported_cpus) {
>> +		err = smp_call_function_single(cpu, arm_brbe_probe_cpu, armpmu, 1);
> 
> Hi Anshuman,
> 
> I have LOCKDEP on and the patchset applied to perf/core (82aad7ff7) on
> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git and I get

Can you confirm if this is currently the correct place to apply this to?
I'm only getting 0 length branch stacks now. Seems like it could be
something to do with the layout of perf samples because I know that was
done in separate commits:

  sudo ./perf record -j any_call -- ls
  ./perf report -D | grep "branch stack"
  ... branch stack: nr:0
  ... branch stack: nr:0
  ... branch stack: nr:0
  ... branch stack: nr:0
  ...

> this:
> 
>    armv8-pmu pmu: hw perfevents: no interrupt-affinity property, guessing.
>    brbe: implementation found on cpu 0
> 
>    =============================
>    [ BUG: Invalid wait context ]
>    6.0.0-rc7 #38 Not tainted
>    -----------------------------
>    [...]
> 
>> +		if (err)
>> +			return err;
>> +	}
>> +	return err;
>> +}
>> +
>>  static void armpmu_free_irqs(struct arm_pmu *armpmu)
>>  {
>>  	int cpu;
>> @@ -229,6 +259,10 @@ int arm_pmu_device_probe(struct platform_device *pdev,
>>  	if (ret)
>>  		goto out_free_irqs;
>>  
>> +	ret = armpmu_request_brbe(pmu);
>> +	if (ret)
>> +		goto out_free_irqs;
>> +
>>  	ret = armpmu_register(pmu);
>>  	if (ret) {
>>  		dev_err(dev, "failed to register PMU devices!\n");

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V3 7/7] arm64/perf: Enable branch stack sampling
  2022-10-10 13:55   ` James Clark
@ 2022-10-10 15:48     ` Suzuki K Poulose
  2022-10-11  9:27       ` Anshuman Khandual
  0 siblings, 1 reply; 18+ messages in thread
From: Suzuki K Poulose @ 2022-10-10 15:48 UTC (permalink / raw)
  To: James Clark, Anshuman Khandual
  Cc: linux-kernel, linux-perf-users, linux-arm-kernel, peterz, acme,
	mark.rutland, will, catalin.marinas

On 10/10/2022 14:55, James Clark wrote:
> 
> 
> On 29/09/2022 08:58, Anshuman Khandual wrote:
>> Now that all the required pieces are already in place, just enable the perf
>> branch stack sampling support on arm64 platform, by removing the gate which
>> blocks it in armpmu_event_init().
>>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: linux-kernel@vger.kernel.org
>> Cc: linux-arm-kernel@lists.infradead.org
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> ---
>>   drivers/perf/arm_pmu.c | 32 +++++++++++++++++++++++++++++---
>>   1 file changed, 29 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
>> index 93b36933124f..2a9b988b53c2 100644
>> --- a/drivers/perf/arm_pmu.c
>> +++ b/drivers/perf/arm_pmu.c
>> @@ -537,9 +537,35 @@ static int armpmu_event_init(struct perf_event *event)
>>   		!cpumask_test_cpu(event->cpu, &armpmu->supported_cpus))
>>   		return -ENOENT;
>>   
>> -	/* does not support taken branch sampling */
>> -	if (has_branch_stack(event))
>> -		return -EOPNOTSUPP;
>> +	if (has_branch_stack(event)) {
>> +		/*
>> +		 * BRBE support is absent. Select CONFIG_ARM_BRBE_PMU
>> +		 * in the config, before branch stack sampling events
>> +		 * can be requested.
>> +		 */
>> +		if (!IS_ENABLED(CONFIG_ARM_BRBE_PMU)) {
>> +			pr_warn_once("BRBE is disabled, select CONFIG_ARM_BRBE_PMU\n");
>> +			return -EOPNOTSUPP;
>> +		}
>> +
>> +		if (event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_KERNEL) {
>> +			if (!perfmon_capable()) {
> 
> I'm still getting different behaviour compared to x86 when using
> perf_event_paranoid because of this perfmon_capable() call here.

Given the generic events framework already checks this for any
privileged branch samples (i.e., for both KERNEL and HV), the
individual drivers must not add additional restrictions.
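
To make the difference concrete, here is a small user-space model of the two
checks. It is a sketch under assumptions, not kernel code: struct caller and
the function names are invented for illustration, and the generic check is
modelled on my reading of perf_allow_kernel() (refuse only when
perf_event_paranoid is above 1 and the caller lacks CAP_PERFMON), leaving out
the LSM hook.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the calling task and the sysctl state. */
struct caller {
	bool perfmon_capable;	/* models perfmon_capable() */
	int paranoid;		/* models sysctl_perf_event_paranoid */
};

/* Models the generic core check for privileged branch sampling. */
static int generic_allow_kernel(const struct caller *c)
{
	if (c->paranoid > 1 && !c->perfmon_capable)
		return -EACCES;
	return 0;
}

/* Models the extra driver check from the patch: ignores paranoid. */
static int driver_extra_check(const struct caller *c)
{
	if (!c->perfmon_capable)
		return -EPERM;
	return 0;
}

/* Opening a kernel branch event, with or without the driver check. */
static int open_kernel_branch_event(const struct caller *c,
				    bool with_driver_check)
{
	int ret = generic_allow_kernel(c);

	if (ret)
		return ret;
	if (with_driver_check)
		return driver_extra_check(c);
	return 0;
}
```

With paranoid set to 1 and an unprivileged caller, the model returns 0
without the driver check and -EPERM with it, which is the kind of divergence
from x86 that James reported.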

> 
>> +				pr_warn_once("does not have permission for kernel branch filter\n");
> 
> Also I was under the impression that this should be more like a
> KERN_INFO loglevel rather than a KERN_WARNING. It's more like expected
> behavior rather than unexpected behavior and as far as I know anyone who
> sees something in dmesg might think something has gone wrong and try to
> follow it up. It is quite a useful message but I remember getting a
> review like this before and it made sense to me.

+1

Suzuki


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V3 4/7] driver/perf/arm_pmu_platform: Add support for BRBE attributes detection
  2022-10-06 13:37   ` James Clark
  2022-10-10 14:17     ` James Clark
@ 2022-10-11  9:16     ` Anshuman Khandual
  1 sibling, 0 replies; 18+ messages in thread
From: Anshuman Khandual @ 2022-10-11  9:16 UTC (permalink / raw)
  To: James Clark
  Cc: linux-kernel, linux-perf-users, linux-arm-kernel, peterz, acme,
	mark.rutland, will, catalin.marinas



On 10/6/22 19:07, James Clark wrote:
> 
> On 29/09/2022 08:58, Anshuman Khandual wrote:
>> This adds arm pmu infrastructure to probe BRBE implementation's attributes via
>> driver exported callbacks later. The actual BRBE feature detection will be
>> added by the driver itself.
>>
>> CPU specific BRBE entries, cycle count, format support gets detected during
>> PMU init. This information gets saved in per-cpu struct pmu_hw_events which
>> later helps in operating BRBE during a perf event context.
>>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: linux-arm-kernel@lists.infradead.org
>> Cc: linux-kernel@vger.kernel.org
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> ---
>>  drivers/perf/arm_pmu_platform.c | 34 +++++++++++++++++++++++++++++++++
>>  1 file changed, 34 insertions(+)
>>
>> diff --git a/drivers/perf/arm_pmu_platform.c b/drivers/perf/arm_pmu_platform.c
>> index 933b96e243b8..acdc445081aa 100644
>> --- a/drivers/perf/arm_pmu_platform.c
>> +++ b/drivers/perf/arm_pmu_platform.c
>> @@ -172,6 +172,36 @@ static int armpmu_request_irqs(struct arm_pmu *armpmu)
>>  	return err;
>>  }
>>  
>> +static void arm_brbe_probe_cpu(void *info)
>> +{
>> +	struct pmu_hw_events *hw_events;
>> +	struct arm_pmu *armpmu = info;
>> +
>> +	/*
>> +	 * Return from here, if BRBE driver has not been
>> +	 * implemented for this PMU. This helps prevent
>> +	 * kernel crash later when brbe_probe() will be
>> +	 * called on the PMU.
>> +	 */
>> +	if (!armpmu->brbe_probe)
>> +		return;
>> +
>> +	hw_events = per_cpu_ptr(armpmu->hw_events, smp_processor_id());
>> +	armpmu->brbe_probe(hw_events);
>> +}
>> +
>> +static int armpmu_request_brbe(struct arm_pmu *armpmu)
>> +{
>> +	int cpu, err = 0;
>> +
>> +	for_each_cpu(cpu, &armpmu->supported_cpus) {
>> +		err = smp_call_function_single(cpu, arm_brbe_probe_cpu, armpmu, 1);
> Hi Anshuman,
> 
> I have LOCKDEP on and the patchset applied to perf/core (82aad7ff7) on
> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git and I get
> this:
> 
>    armv8-pmu pmu: hw perfevents: no interrupt-affinity property, guessing.
>    brbe: implementation found on cpu 0
> 
>    =============================
>    [ BUG: Invalid wait context ]
>    6.0.0-rc7 #38 Not tainted
>    -----------------------------
>    [...]


The LOCKDEP warning is caused by the pr_warn()/pr_info() calls in
arm64_pmu_brbe_probe(), which runs in IPI context when invoked through
smp_call_function_single(). I will drop these prints and instead probably
record the results in struct pmu_hw_events and display them in the caller
itself.
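
Roughly, the shape would be "record in the callback, print from the caller".
The following is only a user-space model of that idea, not the planned
patch: struct hw_events_model, its fields, fake_hw_entries[] and NR_CPUS_MODEL
are all hypothetical, and a plain function call stands in for
smp_call_function_single().

```c
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS_MODEL 4	/* hypothetical CPU count for the model */

/* Hypothetical subset of per-cpu state that the probe would fill in. */
struct hw_events_model {
	bool brbe_found;
	int brbe_nr;	/* number of BRBE entries, 0 if absent */
};

/* Fake "hardware": BRBE entries per CPU, 0 meaning not implemented. */
static const int fake_hw_entries[NR_CPUS_MODEL] = { 32, 32, 0, 64 };

/* Runs on the target CPU (IPI context in the kernel): record only. */
static void probe_cpu(struct hw_events_model *ev, int cpu)
{
	ev->brbe_nr = fake_hw_entries[cpu];
	ev->brbe_found = ev->brbe_nr > 0;
}

/* Runs in the caller's context, where printing is safe. */
static int request_brbe(struct hw_events_model events[NR_CPUS_MODEL])
{
	int cpu, found = 0;

	/* Stand-in for one smp_call_function_single() per supported CPU. */
	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		probe_cpu(&events[cpu], cpu);

	/* Report afterwards, outside the cross-call. */
	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		if (!events[cpu].brbe_found)
			continue;
		printf("brbe: implementation found on cpu %d\n", cpu);
		found++;
	}
	return found;
}
```

The callback never touches the console, so no console lock is taken from
the IPI handler; the messages come out once the probing loop has finished.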

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V3 4/7] driver/perf/arm_pmu_platform: Add support for BRBE attributes detection
  2022-10-10 14:17     ` James Clark
@ 2022-10-11  9:21       ` Anshuman Khandual
  2022-10-12  7:50         ` Anshuman Khandual
  0 siblings, 1 reply; 18+ messages in thread
From: Anshuman Khandual @ 2022-10-11  9:21 UTC (permalink / raw)
  To: James Clark
  Cc: linux-kernel, linux-perf-users, linux-arm-kernel, peterz, acme,
	mark.rutland, will, catalin.marinas



On 10/10/22 19:47, James Clark wrote:
> 
> 
> On 06/10/2022 14:37, James Clark wrote:
>>
>>
>> On 29/09/2022 08:58, Anshuman Khandual wrote:
>>> This adds arm pmu infrastructure to probe BRBE implementation's attributes via
>>> driver exported callbacks later. The actual BRBE feature detection will be
>>> added by the driver itself.
>>>
>>> CPU specific BRBE entries, cycle count, format support gets detected during
>>> PMU init. This information gets saved in per-cpu struct pmu_hw_events which
>>> later helps in operating BRBE during a perf event context.
>>>
>>> Cc: Will Deacon <will@kernel.org>
>>> Cc: Mark Rutland <mark.rutland@arm.com>
>>> Cc: linux-arm-kernel@lists.infradead.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>>> ---
>>>  drivers/perf/arm_pmu_platform.c | 34 +++++++++++++++++++++++++++++++++
>>>  1 file changed, 34 insertions(+)
>>>
>>> diff --git a/drivers/perf/arm_pmu_platform.c b/drivers/perf/arm_pmu_platform.c
>>> index 933b96e243b8..acdc445081aa 100644
>>> --- a/drivers/perf/arm_pmu_platform.c
>>> +++ b/drivers/perf/arm_pmu_platform.c
>>> @@ -172,6 +172,36 @@ static int armpmu_request_irqs(struct arm_pmu *armpmu)
>>>  	return err;
>>>  }
>>>  
>>> +static void arm_brbe_probe_cpu(void *info)
>>> +{
>>> +	struct pmu_hw_events *hw_events;
>>> +	struct arm_pmu *armpmu = info;
>>> +
>>> +	/*
>>> +	 * Return from here, if BRBE driver has not been
>>> +	 * implemented for this PMU. This helps prevent
>>> +	 * kernel crash later when brbe_probe() will be
>>> +	 * called on the PMU.
>>> +	 */
>>> +	if (!armpmu->brbe_probe)
>>> +		return;
>>> +
>>> +	hw_events = per_cpu_ptr(armpmu->hw_events, smp_processor_id());
>>> +	armpmu->brbe_probe(hw_events);
>>> +}
>>> +
>>> +static int armpmu_request_brbe(struct arm_pmu *armpmu)
>>> +{
>>> +	int cpu, err = 0;
>>> +
>>> +	for_each_cpu(cpu, &armpmu->supported_cpus) {
>>> +		err = smp_call_function_single(cpu, arm_brbe_probe_cpu, armpmu, 1);
>>
>> Hi Anshuman,
>>
>> I have LOCKDEP on and the patchset applied to perf/core (82aad7ff7) on
>> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git and I get
> 
> Can you confirm if this is currently the correct place to apply this to?

This series applies on v6.0-rc5 after the perf ABI changes, both in the kernel
and in the user space tools.

> I'm only getting 0 length branch stacks now. Seems like it could be
> something to do with the layout of perf samples because I know that was
> done in separate commits:

Right, might be.

> 
>   sudo ./perf record -j any_call -- ls
>   ./perf report -D | grep "branch stack"
>   ... branch stack: nr:0
>   ... branch stack: nr:0
>   ... branch stack: nr:0
>   ... branch stack: nr:0

I am planning to respin the series on 6.1-rc1 next week, which should resolve
this problem of multiple moving parts.

>   ...
> 
>> this:
>>
>>    armv8-pmu pmu: hw perfevents: no interrupt-affinity property, guessing.
>>    brbe: implementation found on cpu 0
>>
>>    =============================
>>    [ BUG: Invalid wait context ]
>>    6.0.0-rc7 #38 Not tainted
>>    -----------------------------
>>    kworker/u8:0/9 is trying to lock:
>>    ffff000800855898 (&port_lock_key){....}-{3:3}, at:
>> pl011_console_write+0x148/0x240
>>    other info that might help us debug this:
>>    context-{2:2}
>>    5 locks held by kworker/u8:0/9:
>>     #0: ffff00080032a138 ((wq_completion)eval_map_wq){+.+.}-{0:0}, at:
>> process_one_work+0x200/0x6b0
>>     #1: ffff80000807bde0
>> ((work_completion)(&eval_map_work)){+.+.}-{0:0}, at:
>> process_one_work+0x200/0x6b0
>>     #2: ffff80000aa3db70 (trace_event_sem){+.+.}-{4:4}, at:
>> trace_event_eval_update+0x28/0x420
>>     #3: ffff80000a9afe58 (console_lock){+.+.}-{0:0}, at:
>> vprintk_emit+0x130/0x380
>>     #4: ffff80000a9aff78 (console_owner){-...}-{0:0}, at:
>> console_emit_next_record.constprop.0+0x128/0x338
>>    stack backtrace:
>>    CPU: 0 PID: 9 Comm: kworker/u8:0 Not tainted 6.0.0-rc7 #38
>>    Hardware name: Foundation-v8A (DT)
>>    Workqueue: eval_map_wq eval_map_work_func
>>    Call trace:
>>     dump_backtrace+0x114/0x120
>>     show_stack+0x20/0x58
>>     dump_stack_lvl+0x9c/0xd8
>>     dump_stack+0x18/0x34
>>     __lock_acquire+0x17cc/0x1920
>>     lock_acquire+0x138/0x3b8
>>     _raw_spin_lock+0x58/0x70
>>     pl011_console_write+0x148/0x240
>>     console_emit_next_record.constprop.0+0x194/0x338
>>     console_unlock+0x18c/0x208
>>     vprintk_emit+0x24c/0x380
>>     vprintk_default+0x40/0x50
>>     vprintk+0xd4/0xf0
>>     _printk+0x68/0x90
>>     arm64_pmu_brbe_probe+0x10c/0x128
>>     armv8pmu_brbe_probe+0x18/0x28
>>     arm_brbe_probe_cpu+0x44/0x58
>>     __flush_smp_call_function_queue+0x1d0/0x440
>>     generic_smp_call_function_single_interrupt+0x20/0x78
>>     ipi_handler+0x98/0x368
>>     handle_percpu_devid_irq+0xc0/0x3a8
>>     generic_handle_domain_irq+0x34/0x50
>>     gic_handle_irq+0x58/0x138
>>     call_on_irq_stack+0x2c/0x58
>>     do_interrupt_handler+0x88/0x90
>>     el1_interrupt+0x40/0x78
>>     el1h_64_irq_handler+0x18/0x28
>>     el1h_64_irq+0x64/0x68
>>     trace_event_eval_update+0x114/0x420
>>     eval_map_work_func+0x30/0x40
>>     process_one_work+0x298/0x6b0
>>     worker_thread+0x54/0x408
>>     kthread+0x118/0x128
>>     ret_from_fork+0x10/0x20
>>    brbe: implementation found on cpu 1
>>    brbe: implementation found on cpu 2
>>    brbe: implementation found on cpu 3
>>
>>> +		if (err)
>>> +			return err;
>>> +	}
>>> +	return err;
>>> +}
>>> +
>>>  static void armpmu_free_irqs(struct arm_pmu *armpmu)
>>>  {
>>>  	int cpu;
>>> @@ -229,6 +259,10 @@ int arm_pmu_device_probe(struct platform_device *pdev,
>>>  	if (ret)
>>>  		goto out_free_irqs;
>>>  
>>> +	ret = armpmu_request_brbe(pmu);
>>> +	if (ret)
>>> +		goto out_free_irqs;
>>> +
>>>  	ret = armpmu_register(pmu);
>>>  	if (ret) {
>>>  		dev_err(dev, "failed to register PMU devices!\n");

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: [PATCH V3 7/7] arm64/perf: Enable branch stack sampling
  2022-10-10 15:48     ` Suzuki K Poulose
@ 2022-10-11  9:27       ` Anshuman Khandual
  0 siblings, 0 replies; 18+ messages in thread
From: Anshuman Khandual @ 2022-10-11  9:27 UTC (permalink / raw)
  To: Suzuki K Poulose, James Clark
  Cc: linux-kernel, linux-perf-users, linux-arm-kernel, peterz, acme,
	mark.rutland, will, catalin.marinas



On 10/10/22 21:18, Suzuki K Poulose wrote:
> On 10/10/2022 14:55, James Clark wrote:
>>
>>
>> On 29/09/2022 08:58, Anshuman Khandual wrote:
>>> Now that all the required pieces are already in place, just enable the perf
>>> branch stack sampling support on arm64 platform, by removing the gate which
>>> blocks it in armpmu_event_init().
>>>
>>> Cc: Mark Rutland <mark.rutland@arm.com>
>>> Cc: Will Deacon <will@kernel.org>
>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>> Cc: linux-kernel@vger.kernel.org
>>> Cc: linux-arm-kernel@lists.infradead.org
>>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>>> ---
>>>   drivers/perf/arm_pmu.c | 32 +++++++++++++++++++++++++++++---
>>>   1 file changed, 29 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
>>> index 93b36933124f..2a9b988b53c2 100644
>>> --- a/drivers/perf/arm_pmu.c
>>> +++ b/drivers/perf/arm_pmu.c
>>> @@ -537,9 +537,35 @@ static int armpmu_event_init(struct perf_event *event)
>>>           !cpumask_test_cpu(event->cpu, &armpmu->supported_cpus))
>>>           return -ENOENT;
>>>   -    /* does not support taken branch sampling */
>>> -    if (has_branch_stack(event))
>>> -        return -EOPNOTSUPP;
>>> +    if (has_branch_stack(event)) {
>>> +        /*
>>> +         * BRBE support is absent. Select CONFIG_ARM_BRBE_PMU
>>> +         * in the config, before branch stack sampling events
>>> +         * can be requested.
>>> +         */
>>> +        if (!IS_ENABLED(CONFIG_ARM_BRBE_PMU)) {
>>> +            pr_warn_once("BRBE is disabled, select CONFIG_ARM_BRBE_PMU\n");
>>> +            return -EOPNOTSUPP;
>>> +        }
>>> +
>>> +        if (event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_KERNEL) {
>>> +            if (!perfmon_capable()) {
>>
>> I'm still getting different behaviour compared to x86 when using
>> perf_event_paranoid because of this perfmon_capable() call here.
> 
> Given the generic events framework already checks this for any
> privileged branch samples (i.e., for both KERNEL and HV), the
> individual drivers must not add additional restrictions.

Okay, will drop the perfmon_capable() check here along with the warning.

> 
>>
>>> +                pr_warn_once("does not have permission for kernel branch filter\n");
>>
>> Also I was under the impression that this should be more like a
>> KERN_INFO loglevel rather than a KERN_WARNING. It's more like expected
>> behavior rather than unexpected behavior and as far as I know anyone who
>> sees something in dmesg might think something has gone wrong and try to
>> follow it up. It is quite a useful message but I remember getting a
>> review like this before and it made sense to me.
> 
> +1

Sure, will change the remaining pr_warn_once() prints to pr_info() instead.



* Re: [PATCH V3 4/7] driver/perf/arm_pmu_platform: Add support for BRBE attributes detection
  2022-10-11  9:21       ` Anshuman Khandual
@ 2022-10-12  7:50         ` Anshuman Khandual
  0 siblings, 0 replies; 18+ messages in thread
From: Anshuman Khandual @ 2022-10-12  7:50 UTC (permalink / raw)
  To: James Clark
  Cc: linux-kernel, linux-perf-users, linux-arm-kernel, peterz, acme,
	mark.rutland, will, catalin.marinas



On 10/11/22 14:51, Anshuman Khandual wrote:
> 
> On 10/10/22 19:47, James Clark wrote:
>>
>> On 06/10/2022 14:37, James Clark wrote:
>>>
>>> On 29/09/2022 08:58, Anshuman Khandual wrote:
>>>> This adds arm pmu infrastructure to probe BRBE implementation's attributes via
>>>> driver exported callbacks later. The actual BRBE feature detection will be
>>>> added by the driver itself.
>>>>
>>>> CPU specific BRBE entries, cycle count, format support gets detected during
>>>> PMU init. This information gets saved in per-cpu struct pmu_hw_events which
>>>> later helps in operating BRBE during a perf event context.
>>>>
>>>> Cc: Will Deacon <will@kernel.org>
>>>> Cc: Mark Rutland <mark.rutland@arm.com>
>>>> Cc: linux-arm-kernel@lists.infradead.org
>>>> Cc: linux-kernel@vger.kernel.org
>>>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>>>> ---
>>>>  drivers/perf/arm_pmu_platform.c | 34 +++++++++++++++++++++++++++++++++
>>>>  1 file changed, 34 insertions(+)
>>>>
>>>> diff --git a/drivers/perf/arm_pmu_platform.c b/drivers/perf/arm_pmu_platform.c
>>>> index 933b96e243b8..acdc445081aa 100644
>>>> --- a/drivers/perf/arm_pmu_platform.c
>>>> +++ b/drivers/perf/arm_pmu_platform.c
>>>> @@ -172,6 +172,36 @@ static int armpmu_request_irqs(struct arm_pmu *armpmu)
>>>>  	return err;
>>>>  }
>>>>  
>>>> +static void arm_brbe_probe_cpu(void *info)
>>>> +{
>>>> +	struct pmu_hw_events *hw_events;
>>>> +	struct arm_pmu *armpmu = info;
>>>> +
>>>> +	/*
>>>> +	 * Return from here, if BRBE driver has not been
>>>> +	 * implemented for this PMU. This helps prevent
>>>> +	 * kernel crash later when brbe_probe() will be
>>>> +	 * called on the PMU.
>>>> +	 */
>>>> +	if (!armpmu->brbe_probe)
>>>> +		return;
>>>> +
>>>> +	hw_events = per_cpu_ptr(armpmu->hw_events, smp_processor_id());
>>>> +	armpmu->brbe_probe(hw_events);
>>>> +}
>>>> +
>>>> +static int armpmu_request_brbe(struct arm_pmu *armpmu)
>>>> +{
>>>> +	int cpu, err = 0;
>>>> +
>>>> +	for_each_cpu(cpu, &armpmu->supported_cpus) {
>>>> +		err = smp_call_function_single(cpu, arm_brbe_probe_cpu, armpmu, 1);
>>> Hi Anshuman,
>>>
>>> I have LOCKDEP on and the patchset applied to perf/core (82aad7ff7) on
>>> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git and I get
>> Can you confirm if this is currently the correct place to apply this to?
> This series applied on v6.0-rc5 after the perf ABI changes, both in kernel
> and in user space tools.
> 
>> I'm only getting 0 length branch stacks now. Seems like it could be
>> something to do with the layout of perf samples because I know that was
>> done in separate commits:
> Right, might be.
> 
>>   sudo ./perf record -j any_call -- ls
>>   ./perf report -D | grep "branch stack"
>>   ... branch stack: nr:0
>>   ... branch stack: nr:0
>>   ... branch stack: nr:0
>>   ... branch stack: nr:0
> I am planning to respin the series on 6.1-rc1 next week which should solve
> these multiple moving parts problem

There are some recent changes which require the PMU driver to set
data.sample_flags to indicate which kinds of records it has filled in. Here
are the commits:

a9a931e2666878343 ("perf: Use sample_flags for branch stack")
3aac580d5cc3001ca ("perf: Add sample_flags to indicate the PMU-filled sample data")

The following fix solves the problem for the BRBE driver.

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 98e9a615d3cb..85a3aaefc0fb 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -877,6 +877,7 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
                if (has_branch_stack(event)) {
                        cpu_pmu->brbe_read(cpuc, event);
                        data.br_stack = &cpuc->branches->brbe_stack;
+                       data.sample_flags |= PERF_SAMPLE_BRANCH_STACK;
                        cpu_pmu->brbe_reset(cpuc);
                }
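
Why the missing flag shows up as "nr:0" can be sketched with a simplified
user-space model, not the kernel code itself. It assumes that, after the two
commits above, the perf core only emits the branch entries when the driver
has set PERF_SAMPLE_BRANCH_STACK in data.sample_flags and writes an empty
stack otherwise. The model_* names are hypothetical; only the bit position
(1 << 11) is taken from the perf UAPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same bit position as PERF_SAMPLE_BRANCH_STACK in the perf UAPI. */
#define MODEL_SAMPLE_BRANCH_STACK (1ULL << 11)

/* Hypothetical, reduced view of a captured branch stack. */
struct model_branch_stack {
	uint64_t nr;	/* number of valid branch records */
};

/* Hypothetical, reduced view of the per-sample data. */
struct model_sample_data {
	uint64_t sample_flags;	/* fields the PMU driver filled in */
	const struct model_branch_stack *br_stack;
};

/*
 * Consumer side: the branch stack is trusted only when flagged,
 * otherwise an empty stack (nr:0) is reported.
 */
uint64_t model_output_branch_nr(const struct model_sample_data *data)
{
	assert(data != NULL);
	if ((data->sample_flags & MODEL_SAMPLE_BRANCH_STACK) && data->br_stack)
		return data->br_stack->nr;
	return 0;
}

/* Driver side, with (set_flag != 0) and without the one-line fix. */
uint64_t model_irq_sample(const struct model_branch_stack *brbe_stack,
			  int set_flag)
{
	struct model_sample_data data = { 0 };

	data.br_stack = brbe_stack;
	if (set_flag)
		data.sample_flags |= MODEL_SAMPLE_BRANCH_STACK;
	return model_output_branch_nr(&data);
}
```

In this model a stack holding 8 records is reported as 0 entries without the
flag and as 8 entries with it, matching the symptom seen in perf report -D.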
 



end of thread, other threads:[~2022-10-12  7:51 UTC | newest]

Thread overview: 18+ messages
2022-09-29  7:58 [PATCH V3 0/7] arm64/perf: Enable branch stack sampling Anshuman Khandual
2022-09-29  7:58 ` [PATCH V3 1/7] arm64/perf: Add BRBE registers and fields Anshuman Khandual
2022-09-29 11:29   ` Mark Brown
2022-09-30  4:07     ` Anshuman Khandual
2022-09-29  7:58 ` [PATCH V3 2/7] arm64/perf: Update struct arm_pmu for BRBE Anshuman Khandual
2022-09-29  7:58 ` [PATCH V3 3/7] arm64/perf: Update struct pmu_hw_events " Anshuman Khandual
2022-09-29  7:58 ` [PATCH V3 4/7] driver/perf/arm_pmu_platform: Add support for BRBE attributes detection Anshuman Khandual
2022-10-06 13:37   ` James Clark
2022-10-10 14:17     ` James Clark
2022-10-11  9:21       ` Anshuman Khandual
2022-10-12  7:50         ` Anshuman Khandual
2022-10-11  9:16     ` Anshuman Khandual
2022-09-29  7:58 ` [PATCH V3 5/7] arm64/perf: Drive BRBE from perf event states Anshuman Khandual
2022-09-29  7:58 ` [PATCH V3 6/7] arm64/perf: Add BRBE driver Anshuman Khandual
2022-09-29  7:58 ` [PATCH V3 7/7] arm64/perf: Enable branch stack sampling Anshuman Khandual
2022-10-10 13:55   ` James Clark
2022-10-10 15:48     ` Suzuki K Poulose
2022-10-11  9:27       ` Anshuman Khandual
