* [PATCH v2 0/6] arm64: ARMv8.4 Activity Monitors support
@ 2019-12-18 18:26 Ionela Voinescu
  2019-12-18 18:26 ` [PATCH v2 1/6] arm64: add support for the AMU extension v1 Ionela Voinescu
                   ` (5 more replies)
  0 siblings, 6 replies; 40+ messages in thread
From: Ionela Voinescu @ 2019-12-18 18:26 UTC (permalink / raw)
  To: catalin.marinas, will, mark.rutland, maz, suzuki.poulose,
	sudeep.holla, dietmar.eggemann, ionela.voinescu
  Cc: peterz, mingo, ggherdovich, vincent.guittot, linux-arm-kernel,
	linux-doc, linux-kernel

These patches introduce support for the Activity Monitors Unit (AMU)
CPU extension, an optional extension in ARMv8.4 CPUs. This provides
performance counters intended for system management use. Two of these
counters are then used to compute the frequency scale correction
factor needed to achieve frequency invariance.

With CONFIG_ARM64_AMU_EXTN enabled, the kernel is able to safely
run a mix of CPUs with and without support for the AMU extension.
The AMU capability is unconditionally enabled in the kernel so as to
allow any late CPU to use the feature: the cpu_enable function will
be called for all CPUs that match the criteria, including secondary
and hotplugged CPUs, marking this feature as present on that
respective CPU (through a per-cpu variable).

Note that firmware must implement AMU support when running on CPUs
that present the activity monitors extension: it must allow access to
the registers from lower exception levels, enable the counters, and
implement save and restore functionality. More details can be found
in the documentation.

Given that the activity counters expose information about activity on
the CPUs, and that not all CPUs might implement the extension, it is
best, for functional and security reasons, to disable access to the
AMU registers from userspace (EL0) and KVM guests.

In the last patch of the series, two of the AMU counters are used to
compute the frequency scale factor needed to achieve frequency
invariance of signals in the scheduler, based on an interface added
to support counter-based frequency invariance - arch_scale_freq_tick.
The interface and update point for the counter-based frequency scale
factor are based on the similar approach in the patch that introduces
frequency invariance for x86 [1].
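
As a rough illustration of the arithmetic behind a counter-based scale
factor, here is a minimal userspace sketch. It is not the code from
patch 6/6: the function name amu_freq_scale, its parameters and the
SCALE_SHIFT fixed-point convention are assumptions made for this example.
It only shows how the ratio of the core-cycle delta to the
constant-counter delta, normalised by the maximum frequency, yields a
value between 0 and 1024.

```c
#include <assert.h>
#include <stdint.h>

#define SCALE_SHIFT 10                      /* assumed: 1024 == running at max frequency */
#define SCALE_MAX   (1ULL << SCALE_SHIFT)

/*
 * Hypothetical helper: derive a frequency scale factor from the deltas of
 * the core-cycle and constant-frequency counters over one sampling window.
 *
 *   avg_freq = delta_core / delta_const * const_rate_hz
 *   scale    = avg_freq / max_freq_hz          (in 1/1024 units)
 *
 * Assumes the deltas cover a short window (e.g. one tick), so that the
 * shifted and multiplied intermediates fit in 64 bits.
 */
static uint64_t amu_freq_scale(uint64_t delta_core, uint64_t delta_const,
			       uint64_t const_rate_hz, uint64_t max_freq_hz)
{
	uint64_t scale;

	assert(max_freq_hz != 0);

	/* No constant-counter progress: no information, report full scale. */
	if (delta_const == 0)
		return SCALE_MAX;

	/* Divide in two steps to keep intermediate products small. */
	scale = (delta_core << SCALE_SHIFT) / delta_const;   /* core/const ratio */
	scale = scale * const_rate_hz / max_freq_hz;         /* normalise to max */

	/* Clamp, in case the CPU briefly ran above the nominal maximum. */
	return scale > SCALE_MAX ? SCALE_MAX : scale;
}
```

For example, with a 100 MHz constant counter and a 2 GHz maximum
frequency, a window in which the constant counter advances by 400000 and
the core counter by 4000000 (an average of 1 GHz) gives a scale of 512.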

The current series is based on linux-next 20191217.

Testing:
 - Build tested for multiple architectures and defconfigs.
 - AMU feature detection, EL0 and KVM guest access to AMU registers, and
   feature support in firmware (version 1.5 and later of the ARM
   Trusted Firmware) were tested on an Armv8-A Base Platform FVP:
   Architecture Envelope Model [2] (supports version 8.0 to 8.5),
   with the following configurations:

   cluster0.has_arm_v8-4=1
   cluster1.has_arm_v8-4=1
   cluster0.has_amu=1
   cluster1.has_amu=1

v1 -> v2:
 - v1 can be found at [3]
 - Added patches that use the counters for the scheduler's frequency
   invariance engine
 - In patch "arm64: add support for the AMU extension v1":
    - Defined an accessor function cpu_has_amu_feat to allow a read
      of amu_feat only from the current CPU, to ensure the safe use
      of the per-cpu variable for the current user (arm64 topology
      driver) and future users.
    - Modified type of amu_feat from bool to u8 to satisfy sparse
      checker's warning 'expression using sizeof _Bool [sparse]',
      as the size of bool is compiler dependent.

[1] https://lore.kernel.org/lkml/20191113124654.18122-1-ggherdovich@suse.cz/
[2] https://developer.arm.com/tools-and-software/simulation-models/fixed-virtual-platforms
[3] https://lore.kernel.org/lkml/20190917134228.5369-1-ionela.voinescu@arm.com/

Ionela Voinescu (6):
  arm64: add support for the AMU extension v1
  arm64: trap to EL1 accesses to AMU counters from EL0
  arm64/kvm: disable access to AMU registers from kvm guests
  Documentation: arm64: document support for the AMU extension
  TEMP: sched: add interface for counter-based frequency invariance
  arm64: use activity monitors for frequency invariance

 Documentation/arm64/amu.rst                   | 107 ++++++++
 Documentation/arm64/booting.rst               |  14 ++
 Documentation/arm64/cpu-feature-registers.rst |   2 +
 Documentation/arm64/index.rst                 |   1 +
 arch/arm64/Kconfig                            |  27 ++
 arch/arm64/include/asm/assembler.h            |  10 +
 arch/arm64/include/asm/cpucaps.h              |   3 +-
 arch/arm64/include/asm/cpufeature.h           |   4 +
 arch/arm64/include/asm/kvm_arm.h              |   7 +-
 arch/arm64/include/asm/sysreg.h               |  44 ++++
 arch/arm64/include/asm/topology.h             |   9 +
 arch/arm64/kernel/cpufeature.c                |  81 +++++-
 arch/arm64/kernel/topology.c                  | 233 ++++++++++++++++++
 arch/arm64/kvm/hyp/switch.c                   |  13 +-
 arch/arm64/kvm/sys_regs.c                     |  95 ++++++-
 arch/arm64/mm/proc.S                          |   3 +
 drivers/base/arch_topology.c                  |  16 ++
 kernel/sched/core.c                           |   1 +
 kernel/sched/sched.h                          |   7 +
 19 files changed, 666 insertions(+), 11 deletions(-)
 create mode 100644 Documentation/arm64/amu.rst

-- 
2.17.1



* [PATCH v2 1/6] arm64: add support for the AMU extension v1
  2019-12-18 18:26 [PATCH v2 0/6] arm64: ARMv8.4 Activity Monitors support Ionela Voinescu
@ 2019-12-18 18:26 ` Ionela Voinescu
  2020-01-23 17:04   ` Valentin Schneider
  2020-01-28 16:34   ` Suzuki Kuruppassery Poulose
  2019-12-18 18:26 ` [PATCH v2 2/6] arm64: trap to EL1 accesses to AMU counters from EL0 Ionela Voinescu
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 40+ messages in thread
From: Ionela Voinescu @ 2019-12-18 18:26 UTC (permalink / raw)
  To: catalin.marinas, will, mark.rutland, maz, suzuki.poulose,
	sudeep.holla, dietmar.eggemann, ionela.voinescu
  Cc: peterz, mingo, ggherdovich, vincent.guittot, linux-arm-kernel,
	linux-doc, linux-kernel

The activity monitors extension is an optional extension introduced
by the ARMv8.4 CPU architecture. This implements basic support for
version 1 of the activity monitors architecture, AMUv1.

This support includes:
- Extension detection on each CPU (boot, secondary, hotplugged)
- Register interface for AMU aarch64 registers
- (while here) create defines for ID_PFR0_EL1 fields when adding
  the AMU field information.
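
For readers following the register interface, the mapping from a counter
index n to the (CRm, op2) encoding fields can be sketched in plain C as
below. This mirrors the SYS_AMEVCNTR0_EL0(n)-style macros added to
sysreg.h; the struct amu_enc, enum amu_reg_kind and amu_encode() names
are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Hypothetical holder for the two encoding fields that vary per counter. */
struct amu_enc {
	unsigned int crm;
	unsigned int op2;
};

/* CRm base values from the sysreg.h comment: 0b010x, 0b011x, 0b110x, 0b111x */
enum amu_reg_kind {
	AMU_CNTR0 = 4,   /* group 0 counter */
	AMU_TYPE0 = 6,   /* group 0 type    */
	AMU_CNTR1 = 12,  /* group 1 counter */
	AMU_TYPE1 = 14,  /* group 1 type    */
};

static struct amu_enc amu_encode(enum amu_reg_kind kind, unsigned int n)
{
	struct amu_enc enc;

	assert(n < 16);              /* each group has room for 16 registers */

	enc.crm = kind + (n >> 3);   /* n<3> selects the odd CRm value */
	enc.op2 = n & 0x7;           /* n<2:0> goes into op2 */
	return enc;
}
```

With this mapping, group 1 counter 9 resolves to CRm = 13, op2 = 1, which
matches SYS_AMEVCNTR1_EL0(9) expanding to SYS_AM_EL0(13, 1).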

Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
---
 arch/arm64/Kconfig                  | 27 ++++++++++
 arch/arm64/include/asm/cpucaps.h    |  3 +-
 arch/arm64/include/asm/cpufeature.h |  4 ++
 arch/arm64/include/asm/sysreg.h     | 44 ++++++++++++++++
 arch/arm64/kernel/cpufeature.c      | 81 +++++++++++++++++++++++++++--
 5 files changed, 154 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index ac31ed6184d0..6ae7bfa5812e 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1485,6 +1485,33 @@ config ARM64_PTR_AUTH
 
 endmenu
 
+menu "ARMv8.4 architectural features"
+
+config ARM64_AMU_EXTN
+	bool "Enable support for the Activity Monitors Unit CPU extension"
+	default y
+	help
+          The activity monitors extension is an optional extension introduced
+          by the ARMv8.4 CPU architecture. This enables support for version 1
+          of the activity monitors architecture, AMUv1.
+
+          To enable the use of this extension on CPUs that implement it, say Y.
+
+          Note that for architectural reasons, firmware _must_ implement AMU
+          support when running on CPUs that present the activity monitors
+          extension. The required support is present in:
+            * Version 1.5 and later of the ARM Trusted Firmware
+
+          For kernels that have this configuration enabled but boot with broken
+          firmware, you may need to say N here until the firmware is fixed.
+          Otherwise you may experience firmware panics or lockups when
+          accessing the counter registers. Even if you are not observing these
+          symptoms, the values returned by the register reads might not
+          correctly reflect reality. Most commonly, the value read will be 0,
+          indicating that the counter is not enabled.
+
+endmenu
+
 config ARM64_SVE
 	bool "ARM Scalable Vector Extension support"
 	default y
diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index b92683871119..7dde890bde50 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -56,7 +56,8 @@
 #define ARM64_WORKAROUND_CAVIUM_TX2_219_PRFM	46
 #define ARM64_WORKAROUND_1542419		47
 #define ARM64_WORKAROUND_1319367		48
+#define ARM64_HAS_AMU_EXTN			49
 
-#define ARM64_NCAPS				49
+#define ARM64_NCAPS				50
 
 #endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 4261d55e8506..b89e799d6972 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -673,6 +673,10 @@ static inline bool cpu_has_hw_af(void)
 						ID_AA64MMFR1_HADBS_SHIFT);
 }
 
+#ifdef CONFIG_ARM64_AMU_EXTN
+extern inline bool cpu_has_amu_feat(void);
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 6e919fafb43d..bfcc87953a68 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -382,6 +382,42 @@
 #define SYS_TPIDR_EL0			sys_reg(3, 3, 13, 0, 2)
 #define SYS_TPIDRRO_EL0			sys_reg(3, 3, 13, 0, 3)
 
+/* Definitions for system register interface to AMU for ARMv8.4 onwards */
+#define SYS_AM_EL0(crm, op2)		sys_reg(3, 3, 13, crm, op2)
+#define SYS_AMCR_EL0			SYS_AM_EL0(2, 0)
+#define SYS_AMCFGR_EL0			SYS_AM_EL0(2, 1)
+#define SYS_AMCGCR_EL0			SYS_AM_EL0(2, 2)
+#define SYS_AMUSERENR_EL0		SYS_AM_EL0(2, 3)
+#define SYS_AMCNTENCLR0_EL0		SYS_AM_EL0(2, 4)
+#define SYS_AMCNTENSET0_EL0		SYS_AM_EL0(2, 5)
+#define SYS_AMCNTENCLR1_EL0		SYS_AM_EL0(3, 0)
+#define SYS_AMCNTENSET1_EL0		SYS_AM_EL0(3, 1)
+
+/*
+ * Group 0 of activity monitors (architected):
+ *                op0 CRn   op1   op2     CRm
+ * Counter:       11  1101  011   n<2:0>  010:n<3>
+ * Type:          11  1101  011   n<2:0>  011:n<3>
+ * n: 0-3
+ *
+ * Group 1 of activity monitors (auxiliary):
+ *                op0 CRn   op1   op2     CRm
+ * Counter:       11  1101  011   n<2:0>  110:n<3>
+ * Type:          11  1101  011   n<2:0>  111:n<3>
+ * n: 0-15
+ */
+
+#define SYS_AMEVCNTR0_EL0(n)            SYS_AM_EL0(4 + ((n) >> 3), (n) & 0x7)
+#define SYS_AMEVTYPE0_EL0(n)            SYS_AM_EL0(6 + ((n) >> 3), (n) & 0x7)
+#define SYS_AMEVCNTR1_EL0(n)            SYS_AM_EL0(12 + ((n) >> 3), (n) & 0x7)
+#define SYS_AMEVTYPE1_EL0(n)            SYS_AM_EL0(14 + ((n) >> 3), (n) & 0x7)
+
+/* V1: Fixed (architecturally defined) activity monitors */
+#define SYS_AMEVCNTR0_CORE_EL0          SYS_AMEVCNTR0_EL0(0)
+#define SYS_AMEVCNTR0_CONST_EL0         SYS_AMEVCNTR0_EL0(1)
+#define SYS_AMEVCNTR0_INST_RET_EL0      SYS_AMEVCNTR0_EL0(2)
+#define SYS_AMEVCNTR0_MEM_STALL         SYS_AMEVCNTR0_EL0(3)
+
 #define SYS_CNTFRQ_EL0			sys_reg(3, 3, 14, 0, 0)
 
 #define SYS_CNTP_TVAL_EL0		sys_reg(3, 3, 14, 2, 0)
@@ -577,6 +613,7 @@
 #define ID_AA64PFR0_CSV3_SHIFT		60
 #define ID_AA64PFR0_CSV2_SHIFT		56
 #define ID_AA64PFR0_DIT_SHIFT		48
+#define ID_AA64PFR0_AMU_SHIFT		44
 #define ID_AA64PFR0_SVE_SHIFT		32
 #define ID_AA64PFR0_RAS_SHIFT		28
 #define ID_AA64PFR0_GIC_SHIFT		24
@@ -587,6 +624,7 @@
 #define ID_AA64PFR0_EL1_SHIFT		4
 #define ID_AA64PFR0_EL0_SHIFT		0
 
+#define ID_AA64PFR0_AMU			0x1
 #define ID_AA64PFR0_SVE			0x1
 #define ID_AA64PFR0_RAS_V1		0x1
 #define ID_AA64PFR0_FP_NI		0xf
@@ -709,6 +747,12 @@
 #define ID_AA64MMFR0_TGRAN16_NI		0x0
 #define ID_AA64MMFR0_TGRAN16_SUPPORTED	0x1
 
+#define ID_PFR0_AMU_SHIFT		20
+#define ID_PFR0_STATE3_SHIFT		12
+#define ID_PFR0_STATE2_SHIFT		8
+#define ID_PFR0_STATE1_SHIFT		4
+#define ID_PFR0_STATE0_SHIFT		0
+
 #if defined(CONFIG_ARM64_4K_PAGES)
 #define ID_AA64MMFR0_TGRAN_SHIFT	ID_AA64MMFR0_TGRAN4_SHIFT
 #define ID_AA64MMFR0_TGRAN_SUPPORTED	ID_AA64MMFR0_TGRAN4_SUPPORTED
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 04cf64e9f0c9..c639b3e052d7 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -156,6 +156,7 @@ static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = {
 	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_CSV3_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_CSV2_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_DIT_SHIFT, 4, 0),
+	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_AMU_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SVE),
 				   FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_SVE_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_RAS_SHIFT, 4, 0),
@@ -314,10 +315,11 @@ static const struct arm64_ftr_bits ftr_id_mmfr4[] = {
 };
 
 static const struct arm64_ftr_bits ftr_id_pfr0[] = {
-	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 12, 4, 0),		/* State3 */
-	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 8, 4, 0),		/* State2 */
-	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 4, 4, 0),		/* State1 */
-	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 0, 4, 0),		/* State0 */
+	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_PFR0_AMU_SHIFT, 4, 0),
+	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_PFR0_STATE3_SHIFT, 4, 0),
+	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_PFR0_STATE2_SHIFT, 4, 0),
+	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_PFR0_STATE1_SHIFT, 4, 0),
+	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_PFR0_STATE0_SHIFT, 4, 0),
 	ARM64_FTR_END,
 };
 
@@ -1150,6 +1152,59 @@ static bool has_hw_dbm(const struct arm64_cpu_capabilities *cap,
 
 #endif
 
+#ifdef CONFIG_ARM64_AMU_EXTN
+
+/*
+ * This per cpu variable only signals that the CPU implementation supports
+ * the Activity Monitors Unit (AMU) but does not provide information
+ * regarding all the events that it supports.
+ * When this amu_feat per CPU variable is true, the user of this feature
+ * can only rely on the presence of the 4 fixed counters. But this does
+ * not guarantee that the counters are enabled or access to these counters
+ * is provided by code executed at higher exception levels.
+ *
+ * Also, to ensure the safe use of this per_cpu variable, the following
+ * accessor is defined to allow a read of amu_feat for the current cpu only
+ * from the current cpu.
+ *  - cpu_has_amu_feat()
+ */
+static DEFINE_PER_CPU_READ_MOSTLY(u8, amu_feat);
+
+inline bool cpu_has_amu_feat(void)
+{
+	return !!this_cpu_read(amu_feat);
+}
+
+static void cpu_amu_enable(struct arm64_cpu_capabilities const *cap)
+{
+	if (has_cpuid_feature(cap, SCOPE_LOCAL_CPU)) {
+		pr_info("detected CPU%d: Activity Monitors Unit (AMU)\n",
+			smp_processor_id());
+		this_cpu_write(amu_feat, 1);
+	}
+}
+
+static bool has_amu(const struct arm64_cpu_capabilities *cap,
+		       int __unused)
+{
+	/*
+	 * The AMU extension is a non-conflicting feature: the kernel can
+	 * safely run a mix of CPUs with and without support for the
+	 * activity monitors extension.
+	 * Therefore, unconditionally enable the capability to allow
+	 * any late CPU to use the feature.
+	 *
+	 * With this feature unconditionally enabled, the cpu_enable
+	 * function will be called for all CPUs that match the criteria,
+	 * including secondary and hotplugged, marking this feature as
+	 * present on that respective CPU. The enable function will also
+	 * print a detection message.
+	 */
+
+	return true;
+}
+#endif
+
 #ifdef CONFIG_ARM64_VHE
 static bool runs_at_el2(const struct arm64_cpu_capabilities *entry, int __unused)
 {
@@ -1419,6 +1474,24 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.cpu_enable = cpu_clear_disr,
 	},
 #endif /* CONFIG_ARM64_RAS_EXTN */
+#ifdef CONFIG_ARM64_AMU_EXTN
+	{
+		/*
+		 * The feature is enabled by default if CONFIG_ARM64_AMU_EXTN=y.
+		 * Therefore, don't provide .desc as we don't want the detection
+		 * message to be shown until at least one CPU is detected to
+		 * support the feature.
+		 */
+		.capability = ARM64_HAS_AMU_EXTN,
+		.type = ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE,
+		.matches = has_amu,
+		.sys_reg = SYS_ID_AA64PFR0_EL1,
+		.sign = FTR_UNSIGNED,
+		.field_pos = ID_AA64PFR0_AMU_SHIFT,
+		.min_field_value = ID_AA64PFR0_AMU,
+		.cpu_enable = cpu_amu_enable,
+	},
+#endif /* CONFIG_ARM64_AMU_EXTN */
 	{
 		.desc = "Data cache clean to the PoU not required for I/D coherence",
 		.capability = ARM64_HAS_CACHE_IDC,
-- 
2.17.1



* [PATCH v2 2/6] arm64: trap to EL1 accesses to AMU counters from EL0
  2019-12-18 18:26 [PATCH v2 0/6] arm64: ARMv8.4 Activity Monitors support Ionela Voinescu
  2019-12-18 18:26 ` [PATCH v2 1/6] arm64: add support for the AMU extension v1 Ionela Voinescu
@ 2019-12-18 18:26 ` Ionela Voinescu
  2020-01-23 17:04   ` Valentin Schneider
  2019-12-18 18:26 ` [PATCH v2 3/6] arm64/kvm: disable access to AMU registers from kvm guests Ionela Voinescu
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 40+ messages in thread
From: Ionela Voinescu @ 2019-12-18 18:26 UTC (permalink / raw)
  To: catalin.marinas, will, mark.rutland, maz, suzuki.poulose,
	sudeep.holla, dietmar.eggemann, ionela.voinescu
  Cc: peterz, mingo, ggherdovich, vincent.guittot, linux-arm-kernel,
	linux-doc, linux-kernel, Steve Capper

The activity monitors extension is an optional extension introduced
by the ARMv8.4 CPU architecture. In order to access the activity
monitors counters safely, if desired, the kernel should detect the
presence of the extension through the feature register, and mediate
the access.

Therefore, disable direct accesses to activity monitors counters
from EL0 (userspace) and trap them to EL1 (kernel).
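
The presence check performed before AMUSERENR_EL0 is written can be
modelled in C as an unsigned bitfield extract of the 4-bit AMU field. This
is only a userspace sketch of the logic: ubfx64() and id_reg_has_amu() are
hypothetical helpers, and the register value is passed in as a plain
integer rather than read with mrs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define AMU_FIELD_SHIFT 44   /* ID_AA64PFR0_AMU_SHIFT in the series */
#define AMU_FIELD_WIDTH 4

/* Equivalent of "ubfx tmp, reg, #shift, #width": unsigned bitfield extract. */
static uint64_t ubfx64(uint64_t reg, unsigned int shift, unsigned int width)
{
	assert(width > 0 && width < 64 && shift + width <= 64);
	return (reg >> shift) & ((UINT64_C(1) << width) - 1);
}

/* True when the ID register value advertises any AMU version. */
static bool id_reg_has_amu(uint64_t id_aa64pfr0)
{
	return ubfx64(id_aa64pfr0, AMU_FIELD_SHIFT, AMU_FIELD_WIDTH) != 0;
}
```

A zero field means the extension is absent, in which case the macro skips
the write to AMUSERENR_EL0, since accessing the register on a CPU without
the AMU would be undefined.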

Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Steve Capper <steve.capper@arm.com>
---
 arch/arm64/include/asm/assembler.h | 10 ++++++++++
 arch/arm64/mm/proc.S               |  3 +++
 2 files changed, 13 insertions(+)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 2cc0dd8bd9f7..83bb499e8916 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -443,6 +443,16 @@ USER(\label, ic	ivau, \tmp2)			// invalidate I line PoU
 9000:
 	.endm
 
+/*
+ * reset_amuserenr_el0 - reset AMUSERENR_EL0 if AMUv1 present
+ */
+	.macro	reset_amuserenr_el0, tmpreg
+	mrs	\tmpreg, id_aa64pfr0_el1	// Check ID_AA64PFR0_EL1
+	ubfx	\tmpreg, \tmpreg, #ID_AA64PFR0_AMU_SHIFT, #4
+	cbz	\tmpreg, 9000f			// Skip if no AMU present
+	msr_s	SYS_AMUSERENR_EL0, xzr		// Disable AMU access from EL0
+9000:
+	.endm
 /*
  * copy_page - copy src to dest using temp registers t1-t8
  */
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index a1e0592d1fbc..d8aae1152c08 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -124,6 +124,7 @@ alternative_endif
 	ubfx	x11, x11, #1, #1
 	msr	oslar_el1, x11
 	reset_pmuserenr_el0 x0			// Disable PMU access from EL0
+	reset_amuserenr_el0 x0			// Disable AMU access from EL0
 
 alternative_if ARM64_HAS_RAS_EXTN
 	msr_s	SYS_DISR_EL1, xzr
@@ -415,6 +416,8 @@ ENTRY(__cpu_setup)
 	isb					// Unmask debug exceptions now,
 	enable_dbg				// since this is per-cpu
 	reset_pmuserenr_el0 x0			// Disable PMU access from EL0
+	reset_amuserenr_el0 x0			// Disable AMU access from EL0
+
 	/*
 	 * Memory region attributes for LPAE:
 	 *
-- 
2.17.1



* [PATCH v2 3/6] arm64/kvm: disable access to AMU registers from kvm guests
  2019-12-18 18:26 [PATCH v2 0/6] arm64: ARMv8.4 Activity Monitors support Ionela Voinescu
  2019-12-18 18:26 ` [PATCH v2 1/6] arm64: add support for the AMU extension v1 Ionela Voinescu
  2019-12-18 18:26 ` [PATCH v2 2/6] arm64: trap to EL1 accesses to AMU counters from EL0 Ionela Voinescu
@ 2019-12-18 18:26 ` Ionela Voinescu
  2020-01-27 15:33   ` Valentin Schneider
  2019-12-18 18:26 ` [PATCH v2 4/6] Documentation: arm64: document support for the AMU extension Ionela Voinescu
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 40+ messages in thread
From: Ionela Voinescu @ 2019-12-18 18:26 UTC (permalink / raw)
  To: catalin.marinas, will, mark.rutland, maz, suzuki.poulose,
	sudeep.holla, dietmar.eggemann, ionela.voinescu
  Cc: peterz, mingo, ggherdovich, vincent.guittot, linux-arm-kernel,
	linux-doc, linux-kernel, James Morse, Julien Thierry

Access to the AMU counters should be disabled by default in kvm guests,
as information from the counters might reveal activity in other guests
or activity on the host.

Therefore, disable access to AMU registers from EL0 and EL1 in kvm
guests by:
 - Hiding the presence of the extension in the feature register
   (SYS_ID_AA64PFR0_EL1 and SYS_ID_PFR0_EL1) on the VCPU.
 - Disabling access to the AMU registers before switching to the guest.
 - Trapping accesses and injecting an undefined instruction into the
   guest.
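
The first step, hiding the extension from the guest, amounts to clearing
one 4-bit field in each sanitised ID register value before it is returned
to the VCPU. A minimal standalone sketch of that masking follows; the
hide_id_field() helper and the local macro names are assumptions made for
this example, while the shift values are the ones used in this series.

```c
#include <assert.h>
#include <stdint.h>

#define AA64PFR0_AMU_SHIFT 44   /* ID_AA64PFR0_AMU_SHIFT in the series */
#define PFR0_AMU_SHIFT     20   /* ID_PFR0_AMU_SHIFT in the series */

/* Clear one 4-bit ID field so the guest reads it as "not implemented". */
static uint64_t hide_id_field(uint64_t val, unsigned int shift)
{
	assert(shift <= 60);
	return val & ~(UINT64_C(0xf) << shift);
}
```

All other fields pass through unchanged, so a guest that probes the ID
registers sees a CPU without the AMU and has no reason to touch the
trapped registers.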

Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_arm.h |  7 ++-
 arch/arm64/kvm/hyp/switch.c      | 13 ++++-
 arch/arm64/kvm/sys_regs.c        | 95 +++++++++++++++++++++++++++++++-
 3 files changed, 109 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 6e5d839f42b5..dd20fb185d56 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -266,10 +266,11 @@
 #define CPTR_EL2_TFP_SHIFT 10
 
 /* Hyp Coprocessor Trap Register */
-#define CPTR_EL2_TCPAC	(1 << 31)
-#define CPTR_EL2_TTA	(1 << 20)
-#define CPTR_EL2_TFP	(1 << CPTR_EL2_TFP_SHIFT)
 #define CPTR_EL2_TZ	(1 << 8)
+#define CPTR_EL2_TFP	(1 << CPTR_EL2_TFP_SHIFT)
+#define CPTR_EL2_TTA	(1 << 20)
+#define CPTR_EL2_TAM	(1 << 30)
+#define CPTR_EL2_TCPAC	(1 << 31)
 #define CPTR_EL2_RES1	0x000032ff /* known RES1 bits in CPTR_EL2 */
 #define CPTR_EL2_DEFAULT	CPTR_EL2_RES1
 
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 72fbbd86eb5e..0bca87a2621f 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -90,6 +90,17 @@ static void activate_traps_vhe(struct kvm_vcpu *vcpu)
 	val = read_sysreg(cpacr_el1);
 	val |= CPACR_EL1_TTA;
 	val &= ~CPACR_EL1_ZEN;
+
+	/*
+	 * With VHE enabled, we have HCR_EL2.{E2H,TGE} = {1,1}. Note that in
+	 * this case CPACR_EL1 has the same bit layout as CPTR_EL2, and
+	 * CPACR_EL1 accessing instructions are redefined to access CPTR_EL2.
+	 * Therefore use CPTR_EL2.TAM bit reference to activate AMU register
+	 * traps.
+	 */
+
+	val |= CPTR_EL2_TAM;
+
 	if (update_fp_enabled(vcpu)) {
 		if (vcpu_has_sve(vcpu))
 			val |= CPACR_EL1_ZEN;
@@ -111,7 +122,7 @@ static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
 	__activate_traps_common(vcpu);
 
 	val = CPTR_EL2_DEFAULT;
-	val |= CPTR_EL2_TTA | CPTR_EL2_TZ;
+	val |= CPTR_EL2_TTA | CPTR_EL2_TZ | CPTR_EL2_TAM;
 	if (!update_fp_enabled(vcpu)) {
 		val |= CPTR_EL2_TFP;
 		__activate_traps_fpsimd32(vcpu);
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 9f2165937f7d..940ab9b4c98b 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1003,6 +1003,20 @@ static bool access_pmuserenr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 	{ SYS_DESC(SYS_PMEVTYPERn_EL0(n)),					\
 	  access_pmu_evtyper, reset_unknown, (PMEVTYPER0_EL0 + n), }
 
+static bool access_amu(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			     const struct sys_reg_desc *r)
+{
+	kvm_inject_undefined(vcpu);
+
+	return false;
+}
+
+/* Macro to expand the AMU counter and type registers */
+#define AMU_AMEVCNTR0_EL0(n) { SYS_DESC(SYS_AMEVCNTR0_EL0(n)), access_amu }
+#define AMU_AMEVTYPE0_EL0(n) { SYS_DESC(SYS_AMEVTYPE0_EL0(n)), access_amu }
+#define AMU_AMEVCNTR1_EL0(n) { SYS_DESC(SYS_AMEVCNTR1_EL0(n)), access_amu }
+#define AMU_AMEVTYPE1_EL0(n) { SYS_DESC(SYS_AMEVTYPE1_EL0(n)), access_amu }
+
 static bool trap_ptrauth(struct kvm_vcpu *vcpu,
 			 struct sys_reg_params *p,
 			 const struct sys_reg_desc *rd)
@@ -1078,8 +1092,12 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu,
 			 (u32)r->CRn, (u32)r->CRm, (u32)r->Op2);
 	u64 val = raz ? 0 : read_sanitised_ftr_reg(id);
 
-	if (id == SYS_ID_AA64PFR0_EL1 && !vcpu_has_sve(vcpu)) {
-		val &= ~(0xfUL << ID_AA64PFR0_SVE_SHIFT);
+	if (id == SYS_ID_AA64PFR0_EL1) {
+		if (!vcpu_has_sve(vcpu))
+			val &= ~(0xfUL << ID_AA64PFR0_SVE_SHIFT);
+		val &= ~(0xfUL << ID_AA64PFR0_AMU_SHIFT);
+	} else if (id == SYS_ID_PFR0_EL1) {
+		val &= ~(0xfUL << ID_PFR0_AMU_SHIFT);
 	} else if (id == SYS_ID_AA64ISAR1_EL1 && !vcpu_has_ptrauth(vcpu)) {
 		val &= ~((0xfUL << ID_AA64ISAR1_APA_SHIFT) |
 			 (0xfUL << ID_AA64ISAR1_API_SHIFT) |
@@ -1565,6 +1583,79 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_TPIDR_EL0), NULL, reset_unknown, TPIDR_EL0 },
 	{ SYS_DESC(SYS_TPIDRRO_EL0), NULL, reset_unknown, TPIDRRO_EL0 },
 
+	{ SYS_DESC(SYS_AMCR_EL0), access_amu },
+	{ SYS_DESC(SYS_AMCFGR_EL0), access_amu },
+	{ SYS_DESC(SYS_AMCGCR_EL0), access_amu },
+	{ SYS_DESC(SYS_AMUSERENR_EL0), access_amu },
+	{ SYS_DESC(SYS_AMCNTENCLR0_EL0), access_amu },
+	{ SYS_DESC(SYS_AMCNTENSET0_EL0), access_amu },
+	{ SYS_DESC(SYS_AMCNTENCLR1_EL0), access_amu },
+	{ SYS_DESC(SYS_AMCNTENSET1_EL0), access_amu },
+	AMU_AMEVCNTR0_EL0(0),
+	AMU_AMEVCNTR0_EL0(1),
+	AMU_AMEVCNTR0_EL0(2),
+	AMU_AMEVCNTR0_EL0(3),
+	AMU_AMEVCNTR0_EL0(4),
+	AMU_AMEVCNTR0_EL0(5),
+	AMU_AMEVCNTR0_EL0(6),
+	AMU_AMEVCNTR0_EL0(7),
+	AMU_AMEVCNTR0_EL0(8),
+	AMU_AMEVCNTR0_EL0(9),
+	AMU_AMEVCNTR0_EL0(10),
+	AMU_AMEVCNTR0_EL0(11),
+	AMU_AMEVCNTR0_EL0(12),
+	AMU_AMEVCNTR0_EL0(13),
+	AMU_AMEVCNTR0_EL0(14),
+	AMU_AMEVCNTR0_EL0(15),
+	AMU_AMEVTYPE0_EL0(0),
+	AMU_AMEVTYPE0_EL0(1),
+	AMU_AMEVTYPE0_EL0(2),
+	AMU_AMEVTYPE0_EL0(3),
+	AMU_AMEVTYPE0_EL0(4),
+	AMU_AMEVTYPE0_EL0(5),
+	AMU_AMEVTYPE0_EL0(6),
+	AMU_AMEVTYPE0_EL0(7),
+	AMU_AMEVTYPE0_EL0(8),
+	AMU_AMEVTYPE0_EL0(9),
+	AMU_AMEVTYPE0_EL0(10),
+	AMU_AMEVTYPE0_EL0(11),
+	AMU_AMEVTYPE0_EL0(12),
+	AMU_AMEVTYPE0_EL0(13),
+	AMU_AMEVTYPE0_EL0(14),
+	AMU_AMEVTYPE0_EL0(15),
+	AMU_AMEVCNTR1_EL0(0),
+	AMU_AMEVCNTR1_EL0(1),
+	AMU_AMEVCNTR1_EL0(2),
+	AMU_AMEVCNTR1_EL0(3),
+	AMU_AMEVCNTR1_EL0(4),
+	AMU_AMEVCNTR1_EL0(5),
+	AMU_AMEVCNTR1_EL0(6),
+	AMU_AMEVCNTR1_EL0(7),
+	AMU_AMEVCNTR1_EL0(8),
+	AMU_AMEVCNTR1_EL0(9),
+	AMU_AMEVCNTR1_EL0(10),
+	AMU_AMEVCNTR1_EL0(11),
+	AMU_AMEVCNTR1_EL0(12),
+	AMU_AMEVCNTR1_EL0(13),
+	AMU_AMEVCNTR1_EL0(14),
+	AMU_AMEVCNTR1_EL0(15),
+	AMU_AMEVTYPE1_EL0(0),
+	AMU_AMEVTYPE1_EL0(1),
+	AMU_AMEVTYPE1_EL0(2),
+	AMU_AMEVTYPE1_EL0(3),
+	AMU_AMEVTYPE1_EL0(4),
+	AMU_AMEVTYPE1_EL0(5),
+	AMU_AMEVTYPE1_EL0(6),
+	AMU_AMEVTYPE1_EL0(7),
+	AMU_AMEVTYPE1_EL0(8),
+	AMU_AMEVTYPE1_EL0(9),
+	AMU_AMEVTYPE1_EL0(10),
+	AMU_AMEVTYPE1_EL0(11),
+	AMU_AMEVTYPE1_EL0(12),
+	AMU_AMEVTYPE1_EL0(13),
+	AMU_AMEVTYPE1_EL0(14),
+	AMU_AMEVTYPE1_EL0(15),
+
 	{ SYS_DESC(SYS_CNTP_TVAL_EL0), access_arch_timer },
 	{ SYS_DESC(SYS_CNTP_CTL_EL0), access_arch_timer },
 	{ SYS_DESC(SYS_CNTP_CVAL_EL0), access_arch_timer },
-- 
2.17.1



* [PATCH v2 4/6] Documentation: arm64: document support for the AMU extension
  2019-12-18 18:26 [PATCH v2 0/6] arm64: ARMv8.4 Activity Monitors support Ionela Voinescu
                   ` (2 preceding siblings ...)
  2019-12-18 18:26 ` [PATCH v2 3/6] arm64/kvm: disable access to AMU registers from kvm guests Ionela Voinescu
@ 2019-12-18 18:26 ` Ionela Voinescu
  2020-01-27 16:47   ` Valentin Schneider
  2020-01-30 15:04   ` Suzuki Kuruppassery Poulose
  2019-12-18 18:26 ` [PATCH v2 5/6] TEMP: sched: add interface for counter-based frequency invariance Ionela Voinescu
  2019-12-18 18:26 ` [PATCH v2 6/6] arm64: use activity monitors for " Ionela Voinescu
  5 siblings, 2 replies; 40+ messages in thread
From: Ionela Voinescu @ 2019-12-18 18:26 UTC (permalink / raw)
  To: catalin.marinas, will, mark.rutland, maz, suzuki.poulose,
	sudeep.holla, dietmar.eggemann, ionela.voinescu
  Cc: peterz, mingo, ggherdovich, vincent.guittot, linux-arm-kernel,
	linux-doc, linux-kernel, Jonathan Corbet

The activity monitors extension is an optional extension introduced
by the ARMv8.4 CPU architecture.

Add initial documentation for the AMUv1 extension:
 - arm64/amu.rst: AMUv1 documentation
 - arm64/booting.rst: system registers initialisation
 - arm64/cpu-feature-registers.rst: visibility to userspace

Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/arm64/amu.rst                   | 107 ++++++++++++++++++
 Documentation/arm64/booting.rst               |  14 +++
 Documentation/arm64/cpu-feature-registers.rst |   2 +
 Documentation/arm64/index.rst                 |   1 +
 4 files changed, 124 insertions(+)
 create mode 100644 Documentation/arm64/amu.rst

diff --git a/Documentation/arm64/amu.rst b/Documentation/arm64/amu.rst
new file mode 100644
index 000000000000..62a6635522e1
--- /dev/null
+++ b/Documentation/arm64/amu.rst
@@ -0,0 +1,107 @@
+=======================================================
+Activity Monitors Unit (AMU) extension in AArch64 Linux
+=======================================================
+
+Author: Ionela Voinescu <ionela.voinescu@arm.com>
+
+Date: 2019-09-10
+
+This document briefly describes the provision of Activity Monitors Unit
+support in AArch64 Linux.
+
+
+Architecture overview
+---------------------
+
+The activity monitors extension is an optional extension introduced by the
+ARMv8.4 CPU architecture.
+
+The activity monitors unit, implemented in each CPU, provides performance
+counters intended for system management use. The AMU extension provides a
+system register interface to the counter registers and also supports an
+optional external memory-mapped interface.
+
+Version 1 of the Activity Monitors architecture implements a counter group
+of four fixed and architecturally defined 64-bit event counters.
+  - CPU cycle counter: increments at the frequency of the CPU.
+  - Constant counter: increments at the fixed frequency of the system
+    clock.
+  - Instructions retired: increments with every architecturally executed
+    instruction.
+  - Memory stall cycles: counts instruction dispatch stall cycles caused by
+    misses in the last level cache within the clock domain.
+
+When in WFI or WFE these counters do not increment.
+
+The Activity Monitors architecture provides space for up to 16 architected
+event counters. Future versions of the architecture may use this space to
+implement additional architected event counters.
+
+Additionally, version 1 implements a counter group of up to 16 auxiliary
+64-bit event counters.
+
+On cold reset all counters reset to 0.
+
+
+Basic support
+-------------
+
+The kernel can safely run a mix of CPUs with and without support for the
+activity monitors extension. Therefore, when CONFIG_ARM64_AMU_EXTN is
+selected we unconditionally enable the capability to allow any late CPU
+(secondary or hotplugged) to detect and use the feature.
+
+When the feature is detected on a CPU, a per-CPU variable (amu_feat) is
+set, but this does not guarantee the correct functionality of the
+counters, only the presence of the extension.
+
+Firmware (code running at higher exception levels, e.g. arm-tf) support is
+needed to:
+ - Enable access for lower exception levels (EL2 and EL1) to the AMU
+   registers.
+ - Enable the counters. If not enabled these will read as 0.
+ - Save the counters before the CPU enters the 'off' power state and
+   restore them after it is brought back up.
+
+When using kernels that have this configuration enabled but boot with
+broken firmware, the user may experience panics or lockups when accessing
+the counter registers. Even if these symptoms are not observed, the
+values returned by the register reads might not correctly reflect reality.
+Most commonly, the counters will read as 0, indicating that they are not
+enabled. If proper support is not provided in firmware it's best to disable
+CONFIG_ARM64_AMU_EXTN.
+
+The fixed counters of AMUv1 are accessible through the following system
+register definitions:
+ - SYS_AMEVCNTR0_CORE_EL0
+ - SYS_AMEVCNTR0_CONST_EL0
+ - SYS_AMEVCNTR0_INST_RET_EL0
+ - SYS_AMEVCNTR0_MEM_STALL_EL0
+
+Auxiliary platform specific counters can be accessed using
+SYS_AMEVCNTR1_EL0(n), where n is a value between 0 and 15.
+
+Details can be found in: arch/arm64/include/asm/sysreg.h.
+
+
+Userspace access
+----------------
+
+Currently, access from userspace to the AMU registers is disabled due to:
+ - Security reasons: they might expose information about code executed in
+   secure mode.
+ - Purpose: AMU counters are intended for system management use.
+
+Also, the presence of the feature is not visible to userspace.
+
+
+Virtualization
+--------------
+
+Currently, access from userspace (EL0) and kernelspace (EL1) on the KVM
+guest side is disabled due to:
+ - Security reasons: they might expose information about code executed
+   by other guests or the host.
+
+Any attempt to access the AMU registers will result in an UNDEFINED
+exception being injected into the guest.
diff --git a/Documentation/arm64/booting.rst b/Documentation/arm64/booting.rst
index 5d78a6f5b0ae..a3f1a47b6f1c 100644
--- a/Documentation/arm64/booting.rst
+++ b/Documentation/arm64/booting.rst
@@ -248,6 +248,20 @@ Before jumping into the kernel, the following conditions must be met:
     - HCR_EL2.APK (bit 40) must be initialised to 0b1
     - HCR_EL2.API (bit 41) must be initialised to 0b1
 
+  For CPUs with Activity Monitors Unit v1 (AMUv1) extension present:
+  - If EL3 is present:
+    CPTR_EL3.TAM (bit 30) must be initialised to 0b0
+    CPTR_EL2.TAM (bit 30) must be initialised to 0b0
+    AMCNTENSET0_EL0 must be initialised to 0b1111
+    AMCNTENSET1_EL0 must be initialised to a platform specific value
+    having 0b1 set for the corresponding bit for each of the auxiliary
+    counters present.
+  - If the kernel is entered at EL1:
+    AMCNTENSET0_EL0 must be initialised to 0b1111
+    AMCNTENSET1_EL0 must be initialised to a platform specific value
+    having 0b1 set for the corresponding bit for each of the auxiliary
+    counters present.
+
 The requirements described above for CPU mode, caches, MMUs, architected
 timers, coherency and system registers apply to all CPUs.  All CPUs must
 enter the kernel in the same exception level.
diff --git a/Documentation/arm64/cpu-feature-registers.rst b/Documentation/arm64/cpu-feature-registers.rst
index b6e44884e3ad..4770ae54032b 100644
--- a/Documentation/arm64/cpu-feature-registers.rst
+++ b/Documentation/arm64/cpu-feature-registers.rst
@@ -150,6 +150,8 @@ infrastructure:
      +------------------------------+---------+---------+
      | DIT                          | [51-48] |    y    |
      +------------------------------+---------+---------+
+     | AMU                          | [47-44] |    n    |
+     +------------------------------+---------+---------+
      | SVE                          | [35-32] |    y    |
      +------------------------------+---------+---------+
      | GIC                          | [27-24] |    n    |
diff --git a/Documentation/arm64/index.rst b/Documentation/arm64/index.rst
index 5c0c69dc58aa..09cbb4ed2237 100644
--- a/Documentation/arm64/index.rst
+++ b/Documentation/arm64/index.rst
@@ -6,6 +6,7 @@ ARM64 Architecture
     :maxdepth: 1
 
     acpi_object_usage
+    amu
     arm-acpi
     booting
     cpu-feature-registers
-- 
2.17.1



* [PATCH v2 5/6] TEMP: sched: add interface for counter-based frequency invariance
  2019-12-18 18:26 [PATCH v2 0/6] arm64: ARMv8.4 Activity Monitors support Ionela Voinescu
                   ` (3 preceding siblings ...)
  2019-12-18 18:26 ` [PATCH v2 4/6] Documentation: arm64: document support for the AMU extension Ionela Voinescu
@ 2019-12-18 18:26 ` Ionela Voinescu
  2020-01-29 19:37   ` Peter Zijlstra
  2019-12-18 18:26 ` [PATCH v2 6/6] arm64: use activity monitors for " Ionela Voinescu
  5 siblings, 1 reply; 40+ messages in thread
From: Ionela Voinescu @ 2019-12-18 18:26 UTC (permalink / raw)
  To: catalin.marinas, will, mark.rutland, maz, suzuki.poulose,
	sudeep.holla, dietmar.eggemann, ionela.voinescu
  Cc: peterz, mingo, ggherdovich, vincent.guittot, linux-arm-kernel,
	linux-doc, linux-kernel, Juri Lelli

Note that this patch is a temporary one. It introduces the
interface added by the patches at [1] to allow update of the frequency
invariance scale factor based on counters. If [1] is merged there is
no need for this patch.

For platforms that support counters (x86 - APERF/MPERF, arm64 - AMU
counters) the frequency invariance correction factor can be obtained
using a core counter and a fixed counter to get information on the
performance (frequency based only) obtained in a period of time. This
will more accurately reflect the actual current frequency of the CPU,
compared with the alternative implementation that reflects the request
of a performance level from the OS through the cpufreq framework
(arch_set_freq_scale).

Therefore, introduce an interface - arch_scale_freq_tick, to be
implemented by each architecture and called for each CPU on the tick
to update the scale factor based on the delta in the counter values,
if counter support is present on the CPU.
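
As an illustration of how the override is meant to work, here is a
minimal userspace sketch (not kernel code; all demo_* names are
hypothetical). An architecture defines arch_scale_freq_tick as a macro
naming its own function, so the generic #ifndef fallback is skipped and
the tick path calls the architecture's implementation:

```c
/* "Architecture header": provide an implementation and advertise it. */
static unsigned int demo_tick_updates;	/* hypothetical: counts updates */

static inline void demo_arch_scale_freq_tick(void)
{
	/* A real implementation would read counters and update the scale. */
	demo_tick_updates++;
}
#define arch_scale_freq_tick demo_arch_scale_freq_tick

/* "Generic header": empty fallback, used only without an override. */
#ifndef arch_scale_freq_tick
static inline void arch_scale_freq_tick(void)
{
}
#endif

/* Stand-in for the scheduler tick path calling the hook once per tick. */
static void demo_scheduler_tick(void)
{
	arch_scale_freq_tick();
}
```

Without the #define, the same demo_scheduler_tick() would compile
against the empty fallback and the call would cost nothing.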

Because reading counters can be expensive, and reading counters from
remote CPUs can be expensive or impossible, only update the
counter-based frequency scale factor on the tick for now. A tick-based
update will be necessary in any case: either it is the only point of
update for certain architectures, or it is needed to cache the counter
values for a particular CPU when a further update from that CPU is not
possible.

[1]
https://lore.kernel.org/lkml/20191113124654.18122-1-ggherdovich@suse.cz/

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
 kernel/sched/core.c  | 1 +
 kernel/sched/sched.h | 7 +++++++
 2 files changed, 8 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 90e4b00ace89..e0b70b9fb5cc 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3594,6 +3594,7 @@ void scheduler_tick(void)
 	struct task_struct *curr = rq->curr;
 	struct rq_flags rf;
 
+	arch_scale_freq_tick();
 	sched_clock_tick();
 
 	rq_lock(rq, &rf);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 280a3c735935..afdafcf7f9da 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1771,6 +1771,13 @@ static inline void set_next_task(struct rq *rq, struct task_struct *next)
 	next->sched_class->set_next_task(rq, next, false);
 }
 
+#ifndef arch_scale_freq_tick
+static __always_inline
+void arch_scale_freq_tick(void)
+{
+}
+#endif
+
 #ifdef CONFIG_SMP
 #define sched_class_highest (&stop_sched_class)
 #else
-- 
2.17.1



* [PATCH v2 6/6] arm64: use activity monitors for frequency invariance
  2019-12-18 18:26 [PATCH v2 0/6] arm64: ARMv8.4 Activity Monitors support Ionela Voinescu
                   ` (4 preceding siblings ...)
  2019-12-18 18:26 ` [PATCH v2 5/6] TEMP: sched: add interface for counter-based frequency invariance Ionela Voinescu
@ 2019-12-18 18:26 ` Ionela Voinescu
  2020-01-23 11:49   ` Lukasz Luba
  2020-01-29 17:13   ` Valentin Schneider
  5 siblings, 2 replies; 40+ messages in thread
From: Ionela Voinescu @ 2019-12-18 18:26 UTC (permalink / raw)
  To: catalin.marinas, will, mark.rutland, maz, suzuki.poulose,
	sudeep.holla, dietmar.eggemann, ionela.voinescu
  Cc: peterz, mingo, ggherdovich, vincent.guittot, linux-arm-kernel,
	linux-doc, linux-kernel

The Frequency Invariance Engine (FIE) provides a frequency
scaling correction factor that helps achieve more accurate
load-tracking.

So far, for arm and arm64 platforms, this scale factor has been
obtained based on the ratio between the current frequency and the
maximum supported frequency recorded by the cpufreq policy. The
setting of this scale factor is triggered from cpufreq drivers by
calling arch_set_freq_scale. The current frequency used in computation
is the frequency requested by a governor, but it may not be the
frequency that was implemented by the platform.
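
For reference, the cpufreq-based factor described above is a
fixed-point ratio of the requested frequency to the maximum frequency.
A minimal userspace sketch, assuming SCHED_CAPACITY_SHIFT is 10 as in
the scheduler (the helper name demo_cpufreq_freq_scale is hypothetical):

```c
#include <assert.h>

#define SCHED_CAPACITY_SHIFT	10
#define SCHED_CAPACITY_SCALE	(1UL << SCHED_CAPACITY_SHIFT)

/*
 * Hypothetical helper mirroring the arithmetic in arch_set_freq_scale():
 * both frequencies are in the same unit (kHz in cpufreq), and the result
 * is in the range [0, SCHED_CAPACITY_SCALE] when cur_freq <= max_freq.
 */
static unsigned long demo_cpufreq_freq_scale(unsigned long cur_freq,
					     unsigned long max_freq)
{
	assert(max_freq != 0);	/* a zero maximum would divide by zero */
	return (cur_freq << SCHED_CAPACITY_SHIFT) / max_freq;
}
```

A CPU requested to run at half of its maximum frequency gets a scale of
512 out of 1024, whether or not the platform actually delivered that
frequency.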

This correction factor can also be obtained using a core counter and a
constant counter to get information on the performance (frequency based
only) obtained in a period of time. This will more accurately reflect
the actual current frequency of the CPU, compared with the alternative
implementation that reflects the request of a performance level from
the OS.

Therefore, implement arch_scale_freq_tick to use activity monitors, if
present, for the computation of the frequency scale factor.
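
The counter-based computation can be sketched in userspace as follows.
This mirrors the arithmetic used in this patch but is only an
illustration: the demo_* helpers are hypothetical, and the 125 MHz
constant counter rate used below is an assumption picked to make the
numbers exact.

```c
#include <assert.h>
#include <stdint.h>

#define SCHED_CAPACITY_SHIFT	10
#define SCHED_CAPACITY_SCALE	(1ULL << SCHED_CAPACITY_SHIFT)

/*
 * Fixed ratio between the constant counter rate (Hz) and the maximum
 * CPU frequency (given in kHz), shifted left by 2 * SCHED_CAPACITY_SHIFT
 * to keep precision in integer arithmetic.
 */
static uint64_t demo_max_freq_ratio(uint64_t const_rate_hz,
				    uint64_t max_freq_khz)
{
	assert(max_freq_khz != 0);
	return (const_rate_hz << (2 * SCHED_CAPACITY_SHIFT)) /
	       (max_freq_khz * 1000);
}

/*
 * scale = (delta_core / delta_const) * (const_rate / max_freq), in
 * SCHED_CAPACITY_SCALE units, clamped to SCHED_CAPACITY_SCALE.
 */
static uint64_t demo_counter_freq_scale(uint64_t delta_core,
					uint64_t delta_const,
					uint64_t ratio)
{
	uint64_t scale;

	assert(delta_const != 0);	/* caller skips the update otherwise */
	scale = (delta_core * ratio) >> SCHED_CAPACITY_SHIFT;
	scale /= delta_const;

	return scale < SCHED_CAPACITY_SCALE ? scale : SCHED_CAPACITY_SCALE;
}
```

With a 125 MHz constant counter and a 2 GHz maximum frequency the ratio
is 65536. Over a 4 ms tick the constant counter advances by 500000; a
CPU that actually ran at 1 GHz advances its core counter by 4000000,
which yields a scale of 512.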

The use of AMU counters depends on:
 - CONFIG_ARM64_AMU_EXTN - depends on the AMU extension being present
 - CONFIG_CPU_FREQ - the current frequency obtained using counter
   information is divided by the maximum frequency obtained from the
   cpufreq policy.

While it is possible to have a combination of CPUs in the system with
and without support for activity monitors, the use of counters for
frequency invariance is only enabled for a CPU if all related CPUs
(CPUs in the same frequency domain) support and have enabled the core
and constant activity monitor counters. In this way, there is a clear
separation between the policies for which arch_set_freq_scale
(cpufreq based FIE) is used, and the policies for which
arch_scale_freq_tick (counter based FIE) is used to set the frequency
scale factor. For this purpose, a cpufreq notifier is registered to
trigger validation work for CPUs and policies at policy creation that
will enable or disable the use of AMU counters for frequency invariance.
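
The policy-level rule reduces to an all-or-nothing check over the CPUs
of a frequency domain. A minimal userspace sketch, using plain 64-bit
masks in place of struct cpumask (the helper name and the mask
representation are assumptions made for illustration):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: bit n of policy_cpus is set if CPU n belongs to
 * the policy, and bit n of valid_cpus is set if CPU n passed the per-CPU
 * checks (AMU present, counters enabled, valid maximum frequency).
 * Counters are used for FIE only if every policy CPU is valid.
 */
static bool demo_policy_uses_counters(uint64_t policy_cpus,
				      uint64_t valid_cpus)
{
	if (!policy_cpus)
		return false;	/* an empty policy enables nothing */

	return (policy_cpus & ~valid_cpus) == 0;
}
```

A single CPU without working counters is therefore enough to keep the
whole policy on the cpufreq-based scale factor.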

Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
---
 arch/arm64/include/asm/topology.h |   9 ++
 arch/arm64/kernel/topology.c      | 233 ++++++++++++++++++++++++++++++
 drivers/base/arch_topology.c      |  16 ++
 3 files changed, 258 insertions(+)

diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
index a4d945db95a2..98412dd27565 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -19,6 +19,15 @@ int pcibus_to_node(struct pci_bus *bus);
 /* Replace task scheduler's default frequency-invariant accounting */
 #define arch_scale_freq_capacity topology_get_freq_scale
 
+#if defined(CONFIG_ARM64_AMU_EXTN) && defined(CONFIG_CPU_FREQ)
+void topology_scale_freq_tick(void);
+/*
+ * Replace task scheduler's default counter-based frequency-invariance
+ * scale factor setting.
+ */
+#define arch_scale_freq_tick topology_scale_freq_tick
+#endif
+
 /* Replace task scheduler's default cpu-invariant accounting */
 #define arch_scale_cpu_capacity topology_get_cpu_scale
 
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index fa9528dfd0ce..61f8264afec9 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -14,6 +14,7 @@
 #include <linux/acpi.h>
 #include <linux/arch_topology.h>
 #include <linux/cacheinfo.h>
+#include <linux/cpufreq.h>
 #include <linux/init.h>
 #include <linux/percpu.h>
 
@@ -120,4 +121,236 @@ int __init parse_acpi_topology(void)
 }
 #endif
 
+#if defined(CONFIG_ARM64_AMU_EXTN) && defined(CONFIG_CPU_FREQ)
 
+#undef pr_fmt
+#define pr_fmt(fmt) "AMU: " fmt
+
+static void init_fie_counters_done_workfn(struct work_struct *work);
+static DECLARE_WORK(init_fie_counters_done_work,
+		    init_fie_counters_done_workfn);
+
+static struct workqueue_struct *policy_amu_fie_init_wq;
+static struct workqueue_struct *cpu_amu_fie_init_wq;
+
+struct cpu_amu_work {
+	struct work_struct cpu_work;
+	struct work_struct policy_work;
+	unsigned int cpuinfo_max_freq;
+	struct cpumask policy_cpus;
+	bool cpu_amu_fie;
+};
+static struct cpu_amu_work __percpu *works;
+static cpumask_var_t cpus_to_visit;
+
+static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale);
+static DEFINE_PER_CPU(u64, arch_const_cycles_prev);
+static DEFINE_PER_CPU(u64, arch_core_cycles_prev);
+DECLARE_PER_CPU(u8, amu_scale_freq);
+
+static void cpu_amu_fie_init_workfn(struct work_struct *work)
+{
+	u64 core_cnt, const_cnt, ratio;
+	struct cpu_amu_work *amu_work;
+	int cpu = smp_processor_id();
+
+	if (!cpu_has_amu_feat()) {
+		pr_debug("CPU%d: counters are not supported.\n", cpu);
+		return;
+	}
+
+	core_cnt = read_sysreg_s(SYS_AMEVCNTR0_CORE_EL0);
+	const_cnt = read_sysreg_s(SYS_AMEVCNTR0_CONST_EL0);
+
+	if (unlikely(!core_cnt || !const_cnt)) {
+		pr_err("CPU%d: cycle counters are not enabled.\n", cpu);
+		return;
+	}
+
+	amu_work = container_of(work, struct cpu_amu_work, cpu_work);
+	if (unlikely(!(amu_work->cpuinfo_max_freq))) {
+		pr_err("CPU%d: invalid maximum frequency.\n", cpu);
+		return;
+	}
+
+	/*
+	 * Pre-compute the fixed ratio between the frequency of the
+	 * constant counter and the maximum frequency of the CPU (Hz).
+	 */
+	ratio = (u64)arch_timer_get_rate() << (2 * SCHED_CAPACITY_SHIFT);
+	ratio = div64_u64(ratio, (u64)amu_work->cpuinfo_max_freq * 1000);
+	this_cpu_write(arch_max_freq_scale, (unsigned long)ratio);
+
+	this_cpu_write(arch_core_cycles_prev, core_cnt);
+	this_cpu_write(arch_const_cycles_prev, const_cnt);
+	amu_work->cpu_amu_fie = true;
+}
+
+static void policy_amu_fie_init_workfn(struct work_struct *work)
+{
+	struct cpu_amu_work *amu_work;
+	u8 enable;
+	int cpu;
+
+	amu_work = container_of(work, struct cpu_amu_work, policy_work);
+
+	flush_workqueue(cpu_amu_fie_init_wq);
+
+	for_each_cpu(cpu, &amu_work->policy_cpus)
+		if (!(per_cpu_ptr(works, cpu)->cpu_amu_fie))
+			break;
+
+	enable = (cpu >= nr_cpu_ids) ? 1 : 0;
+
+	for_each_cpu(cpu, &amu_work->policy_cpus)
+		per_cpu(amu_scale_freq, cpu) = enable;
+
+	pr_info("CPUs[%*pbl]: counters %s be used for FIE.\n",
+		cpumask_pr_args(&amu_work->policy_cpus),
+		enable ? "will" : "WON'T");
+}
+
+static int init_fie_counters_callback(struct notifier_block *nb,
+				      unsigned long val,
+				      void *data)
+{
+	struct cpufreq_policy *policy = data;
+	struct cpu_amu_work *work;
+	int cpu;
+
+	if (val != CPUFREQ_CREATE_POLICY)
+		return 0;
+
+	/* Return if not all related CPUs are online */
+	if (!cpumask_equal(policy->cpus, policy->related_cpus)) {
+		pr_info("CPUs[%*pbl]: counters WON'T be used for FIE.\n",
+			cpumask_pr_args(policy->related_cpus));
+		return 0;
+	}
+
+	/*
+	 * Queue functions on all online CPUs from policy to:
+	 *  - Check support and enablement for AMU counters
+	 *  - Store system freq to max freq ratio per cpu
+	 *  - Flag CPU as valid for use of counters for FIE
+	 */
+	for_each_cpu(cpu, policy->cpus) {
+		work = per_cpu_ptr(works, cpu);
+		work->cpuinfo_max_freq = policy->cpuinfo.max_freq;
+		work->cpu_amu_fie = false;
+		INIT_WORK(&work->cpu_work, cpu_amu_fie_init_workfn);
+		queue_work_on(cpu, cpu_amu_fie_init_wq, &work->cpu_work);
+	}
+
+	/*
+	 * Queue function to validate support at policy level:
+	 *  - Flush all work on online policy CPUs
+	 *  - Verify that all online policy CPUs are flagged as
+	 *    valid for use of counters for FIE
+	 *  - Enable or disable use of counters for FIE on CPUs
+	 */
+	work = per_cpu_ptr(works, cpumask_first(policy->cpus));
+	cpumask_copy(&work->policy_cpus, policy->cpus);
+	INIT_WORK(&work->policy_work, policy_amu_fie_init_workfn);
+	queue_work(policy_amu_fie_init_wq, &work->policy_work);
+
+	cpumask_andnot(cpus_to_visit, cpus_to_visit, policy->cpus);
+	if (cpumask_empty(cpus_to_visit))
+		schedule_work(&init_fie_counters_done_work);
+
+	return 0;
+}
+
+static struct notifier_block init_fie_counters_notifier = {
+	.notifier_call = init_fie_counters_callback,
+};
+
+static void init_fie_counters_done_workfn(struct work_struct *work)
+{
+	cpufreq_unregister_notifier(&init_fie_counters_notifier,
+				    CPUFREQ_POLICY_NOTIFIER);
+
+	/*
+	 * Destroy policy_amu_fie_init_wq first to ensure all policy
+	 * work is finished, which includes flushing of the per-CPU
+	 * work, before cpu_amu_fie_init_wq is destroyed.
+	 */
+	destroy_workqueue(policy_amu_fie_init_wq);
+	destroy_workqueue(cpu_amu_fie_init_wq);
+
+	free_percpu(works);
+	free_cpumask_var(cpus_to_visit);
+}
+
+static int __init register_fie_counters_cpufreq_notifier(void)
+{
+	int ret = -ENOMEM;
+
+	if (!alloc_cpumask_var(&cpus_to_visit, GFP_KERNEL))
+		goto out;
+
+	cpumask_copy(cpus_to_visit, cpu_possible_mask);
+
+	cpu_amu_fie_init_wq = create_workqueue("cpu_amu_fie_init_wq");
+	if (!cpu_amu_fie_init_wq)
+		goto free_cpumask;
+
+	policy_amu_fie_init_wq = create_workqueue("policy_amu_fie_init_wq");
+	if (!policy_amu_fie_init_wq)
+		goto free_cpu_wq;
+
+	works = alloc_percpu(struct cpu_amu_work);
+	if (!works)
+		goto free_policy_wq;
+
+	ret = cpufreq_register_notifier(&init_fie_counters_notifier,
+					CPUFREQ_POLICY_NOTIFIER);
+	if (ret)
+		goto free_works;
+
+	return 0;
+
+free_works:
+	free_percpu(works);
+free_policy_wq:
+	destroy_workqueue(policy_amu_fie_init_wq);
+free_cpu_wq:
+	destroy_workqueue(cpu_amu_fie_init_wq);
+free_cpumask:
+	free_cpumask_var(cpus_to_visit);
+out:
+	return ret;
+}
+core_initcall(register_fie_counters_cpufreq_notifier);
+
+void topology_scale_freq_tick(void)
+{
+	u64 prev_core_cnt, prev_const_cnt;
+	u64 core_cnt, const_cnt, scale;
+
+	if (!this_cpu_read(amu_scale_freq))
+		return;
+
+	const_cnt = read_sysreg_s(SYS_AMEVCNTR0_CONST_EL0);
+	core_cnt = read_sysreg_s(SYS_AMEVCNTR0_CORE_EL0);
+	prev_const_cnt = this_cpu_read(arch_const_cycles_prev);
+	prev_core_cnt = this_cpu_read(arch_core_cycles_prev);
+
+	if (unlikely(core_cnt <= prev_core_cnt ||
+		     const_cnt <= prev_const_cnt))
+		goto store_and_exit;
+
+	scale = core_cnt - prev_core_cnt;
+	scale *= this_cpu_read(arch_max_freq_scale);
+	scale = div64_u64(scale >> SCHED_CAPACITY_SHIFT,
+			  const_cnt - prev_const_cnt);
+
+	scale = min_t(unsigned long, scale, SCHED_CAPACITY_SCALE);
+	this_cpu_write(freq_scale, (unsigned long)scale);
+
+store_and_exit:
+	this_cpu_write(arch_core_cycles_prev, core_cnt);
+	this_cpu_write(arch_const_cycles_prev, const_cnt);
+}
+
+#endif
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 1eb81f113786..3ae6091d845e 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -23,12 +23,28 @@
 
 DEFINE_PER_CPU(unsigned long, freq_scale) = SCHED_CAPACITY_SCALE;
 
+#if defined(CONFIG_ARM64_AMU_EXTN) && defined(CONFIG_CPU_FREQ)
+DEFINE_PER_CPU_READ_MOSTLY(u8, amu_scale_freq);
+#endif
+
 void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq,
 			 unsigned long max_freq)
 {
 	unsigned long scale;
 	int i;
 
+#if defined(CONFIG_ARM64_AMU_EXTN) && defined(CONFIG_CPU_FREQ)
+	/*
+	 * This function will only be called from CPUFREQ drivers.
+	 * If the first CPU in the mask has been validated for the use
+	 * of counters for FIE, all CPUs in the policy have been, so
+	 * return without updating the scale factor with information
+	 * from CPUFREQ. In this case the scale factor will be updated
+	 * from arch_scale_freq_tick.
+	 */
+	if (per_cpu(amu_scale_freq, cpumask_first(cpus)))
+		return;
+#endif
 	scale = (cur_freq << SCHED_CAPACITY_SHIFT) / max_freq;
 
 	for_each_cpu(i, cpus)
-- 
2.17.1



* Re: [PATCH v2 6/6] arm64: use activity monitors for frequency invariance
  2019-12-18 18:26 ` [PATCH v2 6/6] arm64: use activity monitors for " Ionela Voinescu
@ 2020-01-23 11:49   ` Lukasz Luba
  2020-01-23 17:07     ` Ionela Voinescu
  2020-01-29 17:13   ` Valentin Schneider
  1 sibling, 1 reply; 40+ messages in thread
From: Lukasz Luba @ 2020-01-23 11:49 UTC (permalink / raw)
  To: Ionela Voinescu, catalin.marinas, will, mark.rutland, maz,
	suzuki.poulose, sudeep.holla, dietmar.eggemann
  Cc: peterz, mingo, ggherdovich, vincent.guittot, linux-arm-kernel,
	linux-doc, linux-kernel

Hi Ionela,

Please find my few comments below.

On 12/18/19 6:26 PM, Ionela Voinescu wrote:
> The Frequency Invariance Engine (FIE) is providing a frequency
> scaling correction factor that helps achieve more accurate
> load-tracking.
> 
> So far, for arm and arm64 platforms, this scale factor has been
> obtained based on the ratio between the current frequency and the
> maximum supported frequency recorded by the cpufreq policy. The
> setting of this scale factor is triggered from cpufreq drivers by
> calling arch_set_freq_scale. The current frequency used in computation
> is the frequency requested by a governor, but it may not be the
> frequency that was implemented by the platform.
> 
> This correction factor can also be obtained using a core counter and a
> constant counter to get information on the performance (frequency based
> only) obtained in a period of time. This will more accurately reflect
> the actual current frequency of the CPU, compared with the alternative
> implementation that reflects the request of a performance level from
> the OS.
> 
> Therefore, implement arch_scale_freq_tick to use activity monitors, if
> present, for the computation of the frequency scale factor.
> 
> The use of AMU counters depends on:
>   - CONFIG_ARM64_AMU_EXTN - depends on the AMU extension being present
>   - CONFIG_CPU_FREQ - the current frequency obtained using counter
>     information is divided by the maximum frequency obtained from the
>     cpufreq policy.
> 
> While it is possible to have a combination of CPUs in the system with
> and without support for activity monitors, the use of counters for
> frequency invariance is only enabled for a CPU, if all related CPUs
> (CPUs in the same frequency domain) support and have enabled the core

This looks like an edge case scenario, for which we are designing the
whole machinery with workqueues. AFAIU we cannot run the code in
arch_set_freq_scale() and you want to check all CPUs upfront.

Maybe you can just wait till all CPUs boot and then set the proper
flags and finish initialization. Something like:
per_cpu(s8, amu_feat) /* from patch 1/6 */
OR
per_cpu(u8, amu_scale_freq) /* from this patch */
with maybe some values:
0 - not checked yet
1 - checked and present
-1 - checked and not available
-2 - checked but in conflict with others in the freq domain
-3..-k - other odd configurations

This could potentially eliminate the need for workqueues.

Then, if we could trigger this from e.g. late_initcall, the CPUs
should be online and you can validate them.
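
To make the idea concrete, here is a rough userspace sketch of the
state values above and of a one-pass validation per frequency domain.
It is only an illustration under assumptions: the enum names, the
demo_state array standing in for the per-CPU variable, and
demo_resolve_domain() are all hypothetical, not existing kernel
interfaces.

```c
/* Hypothetical per-CPU states, following the values listed above. */
enum demo_amu_fie_state {
	DEMO_AMU_UNCHECKED	=  0,	/* not checked yet */
	DEMO_AMU_PRESENT	=  1,	/* checked and present */
	DEMO_AMU_UNAVAILABLE	= -1,	/* checked and not available */
	DEMO_AMU_CONFLICT	= -2,	/* conflicts with its freq domain */
};

#define DEMO_NR_CPUS	8

/* Stand-in for a per-CPU s8 variable; zero means "not checked yet". */
static signed char demo_state[DEMO_NR_CPUS];

/*
 * Validate one frequency domain once all of its CPUs have been checked.
 * Returns 1 if every CPU is PRESENT. Otherwise the PRESENT CPUs are
 * downgraded to CONFLICT and 0 is returned.
 */
static int demo_resolve_domain(const int *cpus, int nr)
{
	int i, ok = nr > 0;

	for (i = 0; i < nr; i++)
		if (demo_state[cpus[i]] != DEMO_AMU_PRESENT)
			ok = 0;

	if (ok)
		return 1;

	for (i = 0; i < nr; i++)
		if (demo_state[cpus[i]] == DEMO_AMU_PRESENT)
			demo_state[cpus[i]] = DEMO_AMU_CONFLICT;

	return 0;
}
```

Something along these lines could run once per policy from a late
initcall, assuming all CPUs of interest are online by then.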

> and constant activity monitor counters. In this way, there is a clear
> separation between the policies for which arch_set_freq_scale
> (cpufreq based FIE) is used, and the policies for which
> arch_scale_freq_tick (counter based FIE) is used to set the frequency
> scale factor. For this purpose, a cpufreq notifier is registered to
> trigger validation work for CPUs and policies at policy creation that
> will enable or disable the use of AMU counters for frequency invariance.
> 
> Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Sudeep Holla <sudeep.holla@arm.com>
> ---
>   arch/arm64/include/asm/topology.h |   9 ++
>   arch/arm64/kernel/topology.c      | 233 ++++++++++++++++++++++++++++++
>   drivers/base/arch_topology.c      |  16 ++
>   3 files changed, 258 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
> index a4d945db95a2..98412dd27565 100644
> --- a/arch/arm64/include/asm/topology.h
> +++ b/arch/arm64/include/asm/topology.h
> @@ -19,6 +19,15 @@ int pcibus_to_node(struct pci_bus *bus);
>   /* Replace task scheduler's default frequency-invariant accounting */
>   #define arch_scale_freq_capacity topology_get_freq_scale
>   
> +#if defined(CONFIG_ARM64_AMU_EXTN) && defined(CONFIG_CPU_FREQ)
> +void topology_scale_freq_tick(void);
> +/*
> + * Replace task scheduler's default counter-based frequency-invariance
> + * scale factor setting.
> + */
> +#define arch_scale_freq_tick topology_scale_freq_tick
> +#endif
> +
>   /* Replace task scheduler's default cpu-invariant accounting */
>   #define arch_scale_cpu_capacity topology_get_cpu_scale
>   
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index fa9528dfd0ce..61f8264afec9 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -14,6 +14,7 @@
>   #include <linux/acpi.h>
>   #include <linux/arch_topology.h>
>   #include <linux/cacheinfo.h>
> +#include <linux/cpufreq.h>
>   #include <linux/init.h>
>   #include <linux/percpu.h>
>   
> @@ -120,4 +121,236 @@ int __init parse_acpi_topology(void)
>   }
>   #endif
>   
> +#if defined(CONFIG_ARM64_AMU_EXTN) && defined(CONFIG_CPU_FREQ)
>   
> +#undef pr_fmt
> +#define pr_fmt(fmt) "AMU: " fmt
> +
> +static void init_fie_counters_done_workfn(struct work_struct *work);
> +static DECLARE_WORK(init_fie_counters_done_work,
> +		    init_fie_counters_done_workfn);
> +
> +static struct workqueue_struct *policy_amu_fie_init_wq;
> +static struct workqueue_struct *cpu_amu_fie_init_wq;
> +
> +struct cpu_amu_work {
> +	struct work_struct cpu_work;
> +	struct work_struct policy_work;
> +	unsigned int cpuinfo_max_freq;
> +	struct cpumask policy_cpus;
> +	bool cpu_amu_fie;
> +};
> +static struct cpu_amu_work __percpu *works;
> +static cpumask_var_t cpus_to_visit;
> +
> +static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale);
> +static DEFINE_PER_CPU(u64, arch_const_cycles_prev);
> +static DEFINE_PER_CPU(u64, arch_core_cycles_prev);
> +DECLARE_PER_CPU(u8, amu_scale_freq);
> +
> +static void cpu_amu_fie_init_workfn(struct work_struct *work)
> +{
> +	u64 core_cnt, const_cnt, ratio;
> +	struct cpu_amu_work *amu_work;
> +	int cpu = smp_processor_id();
> +
> +	if (!cpu_has_amu_feat()) {
> +		pr_debug("CPU%d: counters are not supported.\n", cpu);
> +		return;
> +	}
> +
> +	core_cnt = read_sysreg_s(SYS_AMEVCNTR0_CORE_EL0);
> +	const_cnt = read_sysreg_s(SYS_AMEVCNTR0_CONST_EL0);
> +
> +	if (unlikely(!core_cnt || !const_cnt)) {
> +		pr_err("CPU%d: cycle counters are not enabled.\n", cpu);
> +		return;
> +	}
> +
> +	amu_work = container_of(work, struct cpu_amu_work, cpu_work);
> +	if (unlikely(!(amu_work->cpuinfo_max_freq))) {
> +		pr_err("CPU%d: invalid maximum frequency.\n", cpu);
> +		return;
> +	}
> +
> +	/*
> +	 * Pre-compute the fixed ratio between the frequency of the
> +	 * constant counter and the maximum frequency of the CPU (hz).
> +	 */
> +	ratio = (u64)arch_timer_get_rate() << (2 * SCHED_CAPACITY_SHIFT);
> +	ratio = div64_u64(ratio, amu_work->cpuinfo_max_freq * 1000);
> +	this_cpu_write(arch_max_freq_scale, (unsigned long)ratio);
> +
> +	this_cpu_write(arch_core_cycles_prev, core_cnt);
> +	this_cpu_write(arch_const_cycles_prev, const_cnt);
> +	amu_work->cpu_amu_fie = true;
> +}
> +
> +static void policy_amu_fie_init_workfn(struct work_struct *work)
> +{
> +	struct cpu_amu_work *amu_work;
> +	u8 enable;
> +	int cpu;
> +
> +	amu_work = container_of(work, struct cpu_amu_work, policy_work);
> +
> +	flush_workqueue(cpu_amu_fie_init_wq);
> +
> +	for_each_cpu(cpu, &amu_work->policy_cpus)
> +		if (!(per_cpu_ptr(works, cpu)->cpu_amu_fie))
> +			break;
> +
> +	enable = (cpu >= nr_cpu_ids) ? 1 : 0;
> +
> +	for_each_cpu(cpu, &amu_work->policy_cpus)
> +		per_cpu(amu_scale_freq, cpu) = enable;
> +
> +	pr_info("CPUs[%*pbl]: counters %s be used for FIE.",
> +		cpumask_pr_args(&amu_work->policy_cpus),
> +		enable ? "will" : "WON'T");
> +}
> +
> +static int init_fie_counters_callback(struct notifier_block *nb,
> +				      unsigned long val,
> +				      void *data)
> +{
> +	struct cpufreq_policy *policy = data;
> +	struct cpu_amu_work *work;
> +	int cpu;
> +
> +	if (val != CPUFREQ_CREATE_POLICY)
> +		return 0;
> +
> +	/* Return if not all related CPUs are online */
> +	if (!cpumask_equal(policy->cpus, policy->related_cpus)) {
> +		pr_info("CPUs[%*pbl]: counters WON'T be used for FIE.",
> +			cpumask_pr_args(policy->related_cpus));
> +		return 0;
> +	}
> +
> +	/*
> +	 * Queue functions on all online CPUs from policy to:
> +	 *  - Check support and enablement for AMU counters
> +	 *  - Store system freq to max freq ratio per cpu
> +	 *  - Flag CPU as valid for use of counters for FIE
> +	 */
> +	for_each_cpu(cpu, policy->cpus) {
> +		work = per_cpu_ptr(works, cpu);
> +		work->cpuinfo_max_freq = policy->cpuinfo.max_freq;
> +		work->cpu_amu_fie = false;
> +		INIT_WORK(&work->cpu_work, cpu_amu_fie_init_workfn);
> +		queue_work_on(cpu, cpu_amu_fie_init_wq, &work->cpu_work);
> +	}
> +
> +	/*
> +	 * Queue function to validate support at policy level:
> +	 *  - Flush all work on online policy CPUs
> +	 *  - Verify that all online policy CPUs are flagged as
> +	 *    valid for use of counters for FIE
> +	 *  - Enable or disable use of counters for FIE on CPUs
> +	 */
> +	work = per_cpu_ptr(works, cpumask_first(policy->cpus));
> +	cpumask_copy(&work->policy_cpus, policy->cpus);
> +	INIT_WORK(&work->policy_work, policy_amu_fie_init_workfn);
> +	queue_work(policy_amu_fie_init_wq, &work->policy_work);
> +
> +	cpumask_andnot(cpus_to_visit, cpus_to_visit, policy->cpus);
> +	if (cpumask_empty(cpus_to_visit))
> +		schedule_work(&init_fie_counters_done_work);
> +
> +	return 0;
> +}
> +
> +static struct notifier_block init_fie_counters_notifier = {
> +	.notifier_call = init_fie_counters_callback,
> +};
> +
> +static void init_fie_counters_done_workfn(struct work_struct *work)
> +{
> +	cpufreq_unregister_notifier(&init_fie_counters_notifier,
> +				    CPUFREQ_POLICY_NOTIFIER);
> +
> +	/*
> +	 * Destroy policy_amu_fie_init_wq first to ensure all policy
> +	 * work is finished, which includes flushing of the per-CPU
> +	 * work, before cpu_amu_fie_init_wq is destroyed.
> +	 */
> +	destroy_workqueue(policy_amu_fie_init_wq);
> +	destroy_workqueue(cpu_amu_fie_init_wq);
> +
> +	free_percpu(works);
> +	free_cpumask_var(cpus_to_visit);
> +}
> +
> +static int __init register_fie_counters_cpufreq_notifier(void)
> +{
> +	int ret = -ENOMEM;
> +
> +	if (!alloc_cpumask_var(&cpus_to_visit, GFP_KERNEL))
> +		goto out;
> +
> +	cpumask_copy(cpus_to_visit, cpu_possible_mask);
> +
> +	cpu_amu_fie_init_wq = create_workqueue("cpu_amu_fie_init_wq");
> +	if (!cpu_amu_fie_init_wq)
> +		goto free_cpumask;
> +
> +	policy_amu_fie_init_wq = create_workqueue("policy_amu_fie_init_wq");
> +	if (!policy_amu_fie_init_wq)
> +		goto free_cpu_wq;
> +
> +	works = alloc_percpu(struct cpu_amu_work);
> +	if (!works)
> +		goto free_policy_wq;
> +
> +	ret = cpufreq_register_notifier(&init_fie_counters_notifier,
> +					CPUFREQ_POLICY_NOTIFIER);
> +	if (ret)
> +		goto free_works;
> +
> +	return 0;
> +
> +free_works:
> +	free_percpu(works);
> +free_policy_wq:
> +	destroy_workqueue(policy_amu_fie_init_wq);
> +free_cpu_wq:
> +	destroy_workqueue(cpu_amu_fie_init_wq);
> +free_cpumask:
> +	free_cpumask_var(cpus_to_visit);
> +out:
> +	return ret;
> +}
> +core_initcall(register_fie_counters_cpufreq_notifier);

If we moved this to a slightly later stage, it might solve the
not-all-CPUs-online issue. Is it needed this early?
Would device_initcall or late_initcall be an option for it?


> +
> +void topology_scale_freq_tick(void)
> +{
> +	u64 prev_core_cnt, prev_const_cnt;
> +	u64 core_cnt, const_cnt, scale;
> +
> +	if (!this_cpu_read(amu_scale_freq))
> +		return;
> +
> +	const_cnt = read_sysreg_s(SYS_AMEVCNTR0_CONST_EL0);
> +	core_cnt = read_sysreg_s(SYS_AMEVCNTR0_CORE_EL0);
> +	prev_const_cnt = this_cpu_read(arch_const_cycles_prev);
> +	prev_core_cnt = this_cpu_read(arch_core_cycles_prev);
> +
> +	if (unlikely(core_cnt <= prev_core_cnt ||
> +		     const_cnt <= prev_const_cnt))
> +		goto store_and_exit;
> +
> +	scale = core_cnt - prev_core_cnt;
> +	scale *= this_cpu_read(arch_max_freq_scale);
> +	scale = div64_u64(scale >> SCHED_CAPACITY_SHIFT,
> +			  const_cnt - prev_const_cnt);
> +
> +	scale = min_t(unsigned long, scale, SCHED_CAPACITY_SCALE);
> +	this_cpu_write(freq_scale, (unsigned long)scale);
> +
> +store_and_exit:
> +	this_cpu_write(arch_core_cycles_prev, core_cnt);
> +	this_cpu_write(arch_const_cycles_prev, const_cnt);
> +}
> +
> +#endif
> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> index 1eb81f113786..3ae6091d845e 100644
> --- a/drivers/base/arch_topology.c
> +++ b/drivers/base/arch_topology.c
> @@ -23,12 +23,28 @@
>   
>   DEFINE_PER_CPU(unsigned long, freq_scale) = SCHED_CAPACITY_SCALE;
>   
> +#if defined(CONFIG_ARM64_AMU_EXTN) && defined(CONFIG_CPU_FREQ)
> +DEFINE_PER_CPU_READ_MOSTLY(u8, amu_scale_freq);
> +#endif
> +
>   void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq,
>   			 unsigned long max_freq)
>   {
>   	unsigned long scale;
>   	int i;
>   
> +#if defined(CONFIG_ARM64_AMU_EXTN) && defined(CONFIG_CPU_FREQ)

This kind of #ifdef is probably not the best option inside drivers/base/.
The function is called from cpufreq drivers; could we react earlier
and keep this function untouched?


> +	/*
> +	 * This function will only be called from CPUFREQ drivers.
> +	 * If the use of counters for FIE is enabled, establish if a CPU,
> +	 * the first one, supports counters and if they are valid. If they
> +	 * are, just return as we don't want to update with information
> +	 * from CPUFREQ. In this case the scale factor will be updated
> +	 * from arch_scale_freq_tick.
> +	 */
> +	if (per_cpu(amu_scale_freq, cpumask_first(cpus)))
> +		return;
> +#endif
>   	scale = (cur_freq << SCHED_CAPACITY_SHIFT) / max_freq;
>   
>   	for_each_cpu(i, cpus)
> 


Regards,
Lukasz

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 1/6] arm64: add support for the AMU extension v1
  2019-12-18 18:26 ` [PATCH v2 1/6] arm64: add support for the AMU extension v1 Ionela Voinescu
@ 2020-01-23 17:04   ` Valentin Schneider
  2020-01-23 18:32     ` Ionela Voinescu
  2020-01-28 16:34   ` Suzuki Kuruppassery Poulose
  1 sibling, 1 reply; 40+ messages in thread
From: Valentin Schneider @ 2020-01-23 17:04 UTC (permalink / raw)
  To: Ionela Voinescu, catalin.marinas, will, mark.rutland, maz,
	suzuki.poulose, sudeep.holla, dietmar.eggemann
  Cc: peterz, mingo, ggherdovich, vincent.guittot, linux-arm-kernel,
	linux-doc, linux-kernel

Hi Ionela,

On 18/12/2019 18:26, Ionela Voinescu wrote:
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -382,6 +382,42 @@
>  #define SYS_TPIDR_EL0			sys_reg(3, 3, 13, 0, 2)
>  #define SYS_TPIDRRO_EL0			sys_reg(3, 3, 13, 0, 3)
>  
> +/* Definitions for system register interface to AMU for ARMv8.4 onwards */
> +#define SYS_AM_EL0(crm, op2)		sys_reg(3, 3, 13, crm, op2)
> +#define SYS_AMCR_EL0			SYS_AM_EL0(2, 0)
> +#define SYS_AMCFGR_EL0			SYS_AM_EL0(2, 1)
> +#define SYS_AMCGCR_EL0			SYS_AM_EL0(2, 2)
> +#define SYS_AMUSERENR_EL0		SYS_AM_EL0(2, 3)
> +#define SYS_AMCNTENCLR0_EL0		SYS_AM_EL0(2, 4)
> +#define SYS_AMCNTENSET0_EL0		SYS_AM_EL0(2, 5)
> +#define SYS_AMCNTENCLR1_EL0		SYS_AM_EL0(3, 0)
> +#define SYS_AMCNTENSET1_EL0		SYS_AM_EL0(3, 1)
> +
> +/*
> + * Group 0 of activity monitors (architected):
> + *                op0 CRn   op1   op2     CRm
> + * Counter:       11  1101  011   n<2:0>  010:n<3>

Nit: any reason for picking a different order than the encoding one? e.g.
                     op0  op1  CRn   CRm       op2
                     11   011  1101  010:<n3>  n<2:0>

> + * Type:          11  1101  011   n<2:0>  011:n<3>
> + * n: 0-3

My Arm ARM (DDI 0487E.a) says n can be in the [0, 15] range, despite there
being only 4 architected counters ATM. Shouldn't matter too much now, but
when more architected counters are added we'll have to assert 'n' against
something (some revision #?).

> + *
> + * Group 1 of activity monitors (auxiliary):
> + *                op0 CRn   op1   op2     CRm
> + * Counter:       11  1101  011   n<2:0>  110:n<3>
> + * Type:          11  1101  011   n<2:0>  111:n<3>
> + * n: 0-15
> + */
> +
> +#define SYS_AMEVCNTR0_EL0(n)            SYS_AM_EL0(4 + ((n) >> 3), (n) & 0x7)
                                                                          /^^^^
If you want to be fancy, you could use GENMASK(2, 0) --------------------/

> +#define SYS_AMEVTYPE0_EL0(n)            SYS_AM_EL0(6 + ((n) >> 3), (n) & 0x7)
> +#define SYS_AMEVCNTR1_EL0(n)            SYS_AM_EL0(12 + ((n) >> 3), (n) & 0x7)
> +#define SYS_AMEVTYPE1_EL0(n)            SYS_AM_EL0(14 + ((n) >> 3), (n) & 0x7)
> +
> +/* V1: Fixed (architecturally defined) activity monitors */
> +#define SYS_AMEVCNTR0_CORE_EL0          SYS_AMEVCNTR0_EL0(0)
> +#define SYS_AMEVCNTR0_CONST_EL0         SYS_AMEVCNTR0_EL0(1)
> +#define SYS_AMEVCNTR0_INST_RET_EL0      SYS_AMEVCNTR0_EL0(2)
> +#define SYS_AMEVCNTR0_MEM_STALL         SYS_AMEVCNTR0_EL0(3)
> +
>  #define SYS_CNTFRQ_EL0			sys_reg(3, 3, 14, 0, 0)
>  
>  #define SYS_CNTP_TVAL_EL0		sys_reg(3, 3, 14, 2, 0)

> @@ -1150,6 +1152,59 @@ static bool has_hw_dbm(const struct arm64_cpu_capabilities *cap,
>  
>  #endif
>  
> +#ifdef CONFIG_ARM64_AMU_EXTN
> +
> +/*
> + * This per cpu variable only signals that the CPU implementation supports
> + * the Activity Monitors Unit (AMU) but does not provide information
> + * regarding all the events that it supports.
> + * When this amu_feat per CPU variable is true, the user of this feature
> + * can only rely on the presence of the 4 fixed counters. But this does
> + * not guarantee that the counters are enabled or access to these counters
> + * is provided by code executed at higher exception levels.
> + *
> + * Also, to ensure the safe use of this per_cpu variable, the following
> + * accessor is defined to allow a read of amu_feat for the current cpu only
> + * from the current cpu.
> + *  - cpu_has_amu_feat()
> + */
> +static DEFINE_PER_CPU_READ_MOSTLY(u8, amu_feat);
> +

Why not bool?

> +inline bool cpu_has_amu_feat(void)
> +{
> +	return !!this_cpu_read(amu_feat);
> +}
> +
> +static void cpu_amu_enable(struct arm64_cpu_capabilities const *cap)
> +{
> +	if (has_cpuid_feature(cap, SCOPE_LOCAL_CPU)) {
> +		pr_info("detected CPU%d: Activity Monitors Unit (AMU)\n",
> +			smp_processor_id());
> +		this_cpu_write(amu_feat, 1);
> +	}
> +}

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/6] arm64: trap to EL1 accesses to AMU counters from EL0
  2019-12-18 18:26 ` [PATCH v2 2/6] arm64: trap to EL1 accesses to AMU counters from EL0 Ionela Voinescu
@ 2020-01-23 17:04   ` Valentin Schneider
  2020-01-23 17:34     ` Ionela Voinescu
  0 siblings, 1 reply; 40+ messages in thread
From: Valentin Schneider @ 2020-01-23 17:04 UTC (permalink / raw)
  To: Ionela Voinescu, catalin.marinas, will, mark.rutland, maz,
	suzuki.poulose, sudeep.holla, dietmar.eggemann
  Cc: peterz, mingo, ggherdovich, vincent.guittot, linux-arm-kernel,
	linux-doc, linux-kernel, Steve Capper

On 18/12/2019 18:26, Ionela Voinescu wrote:
> +/*
> + * reset_amuserenr_el0 - reset AMUSERENR_EL0 if AMUv1 present
> + */
> +	.macro	reset_amuserenr_el0, tmpreg
> +	mrs	\tmpreg, id_aa64pfr0_el1	// Check ID_AA64PFR0_EL1
> +	ubfx	\tmpreg, \tmpreg, #ID_AA64PFR0_AMU_SHIFT, #4
> +	cbz	\tmpreg, 9000f			// Skip if no AMU present
> +	msr_s	SYS_AMUSERENR_EL0, xzr		// Disable AMU access from EL0
> +9000:

AIUI you can steer away from the obscure numbering scheme and define the
label using the macro counter:

	cbz \tmpreg, .Lskip_\@
	[...]
.Lskip_\@:
	.endm


> +	.endm

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 6/6] arm64: use activity monitors for frequency invariance
  2020-01-23 11:49   ` Lukasz Luba
@ 2020-01-23 17:07     ` Ionela Voinescu
  2020-01-24  1:19       ` Lukasz Luba
  0 siblings, 1 reply; 40+ messages in thread
From: Ionela Voinescu @ 2020-01-23 17:07 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: catalin.marinas, will, mark.rutland, maz, suzuki.poulose,
	sudeep.holla, dietmar.eggemann, peterz, mingo, ggherdovich,
	vincent.guittot, linux-arm-kernel, linux-doc, linux-kernel

Hi Lukasz,

Thank you for taking a look over the patches.

On Thursday 23 Jan 2020 at 11:49:29 (+0000), Lukasz Luba wrote:
> Hi Ionela,
> 
> Please find my few comments below.
> 
> On 12/18/19 6:26 PM, Ionela Voinescu wrote:
> > The Frequency Invariance Engine (FIE) is providing a frequency
> > scaling correction factor that helps achieve more accurate
> > load-tracking.
> > 
> > So far, for arm and arm64 platforms, this scale factor has been
> > obtained based on the ratio between the current frequency and the
> > maximum supported frequency recorded by the cpufreq policy. The
> > setting of this scale factor is triggered from cpufreq drivers by
> > calling arch_set_freq_scale. The current frequency used in computation
> > is the frequency requested by a governor, but it may not be the
> > frequency that was implemented by the platform.
> > 
> > This correction factor can also be obtained using a core counter and a
> > constant counter to get information on the performance (frequency based
> > only) obtained in a period of time. This will more accurately reflect
> > the actual current frequency of the CPU, compared with the alternative
> > implementation that reflects the request of a performance level from
> > the OS.
> > 
> > Therefore, implement arch_scale_freq_tick to use activity monitors, if
> > present, for the computation of the frequency scale factor.
> > 
> > The use of AMU counters depends on:
> > >   - CONFIG_ARM64_AMU_EXTN - depends on the AMU extension being present
> >   - CONFIG_CPU_FREQ - the current frequency obtained using counter
> >     information is divided by the maximum frequency obtained from the
> >     cpufreq policy.
> > 
> > While it is possible to have a combination of CPUs in the system with
> > and without support for activity monitors, the use of counters for
> > frequency invariance is only enabled for a CPU, if all related CPUs
> > (CPUs in the same frequency domain) support and have enabled the core
> 
> This looks like an edge case scenario, for which we are designing the
> whole machinery with workqueues. AFAIU we cannot run the code in
> arch_set_freq_scale() and you want to check all CPUs upfront.
> 

Unfortunately, I don't believe it to be an edge case. Given that this
is an optional feature, I do believe that people might skip
implementing it on some CPUs (LITTLEs) while keeping it for CPUs (bigs)
where power and thermal mitigation is more likely to happen in firmware.
This is the main reason to be conservative in the validation of CPUs and
cpufreq policies.

With regard to arch_set_freq_scale, I want to be able to tell, when that
function is called, whether I should set a scale factor based on cpufreq
for the current policy. If activity monitors are usable for all the CPUs
in the policy, then I bail out and let the AMU FIE machinery
set the scale factor. Unfortunately this works at policy granularity.
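
To make the bail-out concrete, here is a minimal userspace sketch (not
kernel code) of the decision described above. The arrays standing in for
the per-CPU variables, the NR_CPUS value and the set_freq_scale() name are
assumptions made for illustration only; the real function is void and
takes a cpumask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SCHED_CAPACITY_SHIFT	10
#define NR_CPUS			8	/* hypothetical CPU count for this sketch */

/* Stand-ins for the per-CPU variables used in the patch. */
static unsigned char amu_scale_freq[NR_CPUS];
static unsigned long freq_scale[NR_CPUS];

/*
 * 'cpus' lists the CPUs of one cpufreq policy. Returns true if the
 * cpufreq-based scale was written, false if the update was left to the
 * counter-based path.
 */
static bool set_freq_scale(const int *cpus, size_t ncpus,
			   unsigned long cur_freq, unsigned long max_freq)
{
	unsigned long scale;
	size_t i;

	assert(ncpus > 0 && max_freq > 0);

	/*
	 * The whole policy is assumed to have been validated beforehand,
	 * so looking at its first CPU is enough.
	 */
	if (amu_scale_freq[cpus[0]])
		return false;

	scale = (cur_freq << SCHED_CAPACITY_SHIFT) / max_freq;
	for (i = 0; i < ncpus; i++)
		freq_scale[cpus[i]] = scale;

	return true;
}
```

With a policy whose flag is clear, a request for half the maximum
frequency writes 512 for every CPU in it; with the flag set, the values
are left for the tick-driven counter path to maintain.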

This could be done in a nicer way by setting the scale factor per CPU,
and not for all CPUs in a policy, in this arch_set_freq_scale function.
But this would require some rewriting of the full frequency invariance
support in drivers, which we've talked about for a while but which was
not the purpose of this patch set. It would, however, eliminate the
policy verification I do with the second workqueue.

> Maybe you can just wait till all CPUs boot and then set the proper
> flags and finish initialization. Something like:
> per_cpu(s8, amu_feat) /* from the patch 1/6 */
> OR
> per_cpu(u8, amu_scale_freq) /* from this patch */
> with maybe some values:
> 0 - not checked yet
> 1 - checked and present
> -1 - checked and not available
> -2 - checked but in conflict with others in the freq domain
> -3..-k - other odd configurations
> 
> could potentially eliminate the need of workqueues.
> 
> Then, if we could trigger this from i.e. late_initcall, the CPUs
> should be online and you can validate them.
> 

I did initially give such a state machine a try but it proved to be
quite messy. A big reason for this is that the activity monitors unit
has multiple counters that can be used for different purposes.

The amu_feat per_cpu variable only flags that you have the AMU present
for potential users (in this case FIE) to validate the counters they
need for their respective usecase. For this reason I don't want to
overload the meaning of amu_feat. For the same reason I'm not doing the
validation of the counters in a generic way, but I'm tying it to the
usecase for particular counters. For example, it would not matter if
the instructions retired counter is not enabled from firmware for the
usecase of FIE. For frequency invariance we only need the core and
constant cycle counters and I'm making it the job of the user (arm64
topology code) to do the checking.

Secondly, for amu_scale_freq I could have added such a state machine,
but I did not think it was useful. The only thing it would change is
that I would not have to use the cpu_amu_fie variable in the data
structure that gets passed to the work functions. The only way I could
eliminate the second workqueue would be to drop the check of all CPUs
in a policy, as described above, and rewrite frequency invariance to
work at CPU granularity and not policy granularity. This would eliminate
the dependency on cpufreq policy altogether, so it would be worth
doing for this reason alone :).

But even in that case, it's probably not necessary to have more than two
states for amu_scale_freq.

What do you think?

Thank you,
Ionela.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/6] arm64: trap to EL1 accesses to AMU counters from EL0
  2020-01-23 17:04   ` Valentin Schneider
@ 2020-01-23 17:34     ` Ionela Voinescu
  0 siblings, 0 replies; 40+ messages in thread
From: Ionela Voinescu @ 2020-01-23 17:34 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: catalin.marinas, will, mark.rutland, maz, suzuki.poulose,
	sudeep.holla, dietmar.eggemann, peterz, mingo, ggherdovich,
	vincent.guittot, linux-arm-kernel, linux-doc, linux-kernel,
	Steve Capper

On Thursday 23 Jan 2020 at 17:04:32 (+0000), Valentin Schneider wrote:
> On 18/12/2019 18:26, Ionela Voinescu wrote:
> > +/*
> > + * reset_amuserenr_el0 - reset AMUSERENR_EL0 if AMUv1 present
> > + */
> > +	.macro	reset_amuserenr_el0, tmpreg
> > +	mrs	\tmpreg, id_aa64pfr0_el1	// Check ID_AA64PFR0_EL1
> > +	ubfx	\tmpreg, \tmpreg, #ID_AA64PFR0_AMU_SHIFT, #4
> > +	cbz	\tmpreg, 9000f			// Skip if no AMU present
> > +	msr_s	SYS_AMUSERENR_EL0, xzr		// Disable AMU access from EL0
> > +9000:
> 
> AIUI you can steer away from the obscure numbering scheme and define the
> label using the macro counter:
> 
> 	cbz \tmpreg, .Lskip_\@
> 	[...]
> .Lskip_\@:
> 	.endm
> 

Cool, good to know! Although calling it "obscure numbering scheme" does
make it more appealing to use.

Thanks, I'll change it in the next version :).

Ionela.

> 
> > +	.endm

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 1/6] arm64: add support for the AMU extension v1
  2020-01-23 17:04   ` Valentin Schneider
@ 2020-01-23 18:32     ` Ionela Voinescu
  2020-01-24 12:00       ` Valentin Schneider
  0 siblings, 1 reply; 40+ messages in thread
From: Ionela Voinescu @ 2020-01-23 18:32 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: catalin.marinas, will, mark.rutland, maz, suzuki.poulose,
	sudeep.holla, dietmar.eggemann, peterz, mingo, ggherdovich,
	vincent.guittot, linux-arm-kernel, linux-doc, linux-kernel

On Thursday 23 Jan 2020 at 17:04:07 (+0000), Valentin Schneider wrote:
> Hi Ionela,
> 
> On 18/12/2019 18:26, Ionela Voinescu wrote:
> > --- a/arch/arm64/include/asm/sysreg.h
> > +++ b/arch/arm64/include/asm/sysreg.h
> > @@ -382,6 +382,42 @@
> >  #define SYS_TPIDR_EL0			sys_reg(3, 3, 13, 0, 2)
> >  #define SYS_TPIDRRO_EL0			sys_reg(3, 3, 13, 0, 3)
> >  
> > +/* Definitions for system register interface to AMU for ARMv8.4 onwards */
> > +#define SYS_AM_EL0(crm, op2)		sys_reg(3, 3, 13, crm, op2)
> > +#define SYS_AMCR_EL0			SYS_AM_EL0(2, 0)
> > +#define SYS_AMCFGR_EL0			SYS_AM_EL0(2, 1)
> > +#define SYS_AMCGCR_EL0			SYS_AM_EL0(2, 2)
> > +#define SYS_AMUSERENR_EL0		SYS_AM_EL0(2, 3)
> > +#define SYS_AMCNTENCLR0_EL0		SYS_AM_EL0(2, 4)
> > +#define SYS_AMCNTENSET0_EL0		SYS_AM_EL0(2, 5)
> > +#define SYS_AMCNTENCLR1_EL0		SYS_AM_EL0(3, 0)
> > +#define SYS_AMCNTENSET1_EL0		SYS_AM_EL0(3, 1)
> > +
> > +/*
> > + * Group 0 of activity monitors (architected):
> > + *                op0 CRn   op1   op2     CRm
> > + * Counter:       11  1101  011   n<2:0>  010:n<3>
> 
> Nit: any reason for picking a different order than the encoding one? e.g.
>                      op0  op1  CRn   CRm       op2
>                      11   011  1101  010:<n3>  n<2:0>


I followed the format in the documentation at the time: DDI 0487D.a.
But you are correct: I should have used the encoding order.


> 
> > + * Type:          11  1101  011   n<2:0>  011:n<3>
> > + * n: 0-3
> 
> My Arm ARM (DDI 0487E.a) says n can be in the [0, 15] range, despite there
> being only 4 architected counters ATM. Shouldn't matter too much now, but
> when more architected counters are added we'll have to assert 'n' against
> something (some revision #?).
> 

You are correct, that interval for the values of n should change. I
probably mapped my brain to the current architected counters.

But the way I've defined SYS_AMEVCNTR0_EL0 will allow access to the full
range of 16 counters, for future versions of the AMU. I am hoping that
we won't have to directly use information in the feature register
regarding the version of AMU. These first 4 architected counters should
be present in all future versions, and later we can use information in
AMCGCR_EL0 to get the number of architected counters (n) and
AMEVTYPER0<n>_EL0 to find out the type. The same logic would apply to
the auxiliary counters.
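
As a quick check of that claim, here is a small userspace sketch of how
the macros split a counter index n into CRm and op2. The CRm base values
are taken from the SYS_AMEVCNTR0_EL0/SYS_AMEVCNTR1_EL0 definitions in the
patch; the struct and the am_counter_encoding() helper are hypothetical
names used only for this example.

```c
#include <assert.h>

/* CRm base values from the SYS_AMEVCNTR{0,1}_EL0 macros in the patch. */
#define AMEVCNTR0_CRM_BASE	4	/* 0b010:n<3> */
#define AMEVCNTR1_CRM_BASE	12	/* 0b110:n<3> */

struct am_encoding {
	unsigned int crm;
	unsigned int op2;
};

/* Split counter index n (0-15) into the CRm and op2 fields. */
static struct am_encoding am_counter_encoding(unsigned int crm_base,
					      unsigned int n)
{
	struct am_encoding enc;

	assert(n < 16);

	enc.crm = crm_base + (n >> 3);	/* n<3> selects the second CRm value */
	enc.op2 = n & 0x7;		/* n<2:0> goes into op2 */

	return enc;
}
```

Indices 0-7 stay on the base CRm value and 8-15 move to the next one, so
the same macro covers all 16 possible counters of a group without change.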

> > + *
> > + * Group 1 of activity monitors (auxiliary):
> > + *                op0 CRn   op1   op2     CRm
> > + * Counter:       11  1101  011   n<2:0>  110:n<3>
> > + * Type:          11  1101  011   n<2:0>  111:n<3>
> > + * n: 0-15
> > + */
> > +
> > +#define SYS_AMEVCNTR0_EL0(n)            SYS_AM_EL0(4 + ((n) >> 3), (n) & 0x7)
>                                                                           /^^^^
> If you want to be fancy, you could use GENMASK(2, 0) --------------------/
> 

I'll be fancy!
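
For reference, a userspace sketch of what that mask amounts to. The
GENMASK definition here is a simplified stand-in written for this
example, not copied from the kernel headers, and amevcntr_op2() is a
hypothetical helper.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)

/* Simplified stand-in for GENMASK(h, l): bits l..h set, others clear. */
#define GENMASK(h, l) \
	(((~0UL) << (l)) & (~0UL >> (BITS_PER_LONG - 1 - (h))))

/* op2 field for counter index n, i.e. n<2:0>. */
static unsigned int amevcntr_op2(unsigned int n)
{
	assert(n < 16);

	return n & GENMASK(2, 0);
}
```

GENMASK(2, 0) evaluates to 0x7, so the generated encoding is the same as
with the literal mask; only the intent (bits 2 down to 0) reads more
directly.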

> > +#define SYS_AMEVTYPE0_EL0(n)            SYS_AM_EL0(6 + ((n) >> 3), (n) & 0x7)
> > +#define SYS_AMEVCNTR1_EL0(n)            SYS_AM_EL0(12 + ((n) >> 3), (n) & 0x7)
> > +#define SYS_AMEVTYPE1_EL0(n)            SYS_AM_EL0(14 + ((n) >> 3), (n) & 0x7)
> > +
> > +/* V1: Fixed (architecturally defined) activity monitors */
> > +#define SYS_AMEVCNTR0_CORE_EL0          SYS_AMEVCNTR0_EL0(0)
> > +#define SYS_AMEVCNTR0_CONST_EL0         SYS_AMEVCNTR0_EL0(1)
> > +#define SYS_AMEVCNTR0_INST_RET_EL0      SYS_AMEVCNTR0_EL0(2)
> > +#define SYS_AMEVCNTR0_MEM_STALL         SYS_AMEVCNTR0_EL0(3)
> > +
> >  #define SYS_CNTFRQ_EL0			sys_reg(3, 3, 14, 0, 0)
> >  
> >  #define SYS_CNTP_TVAL_EL0		sys_reg(3, 3, 14, 2, 0)
> 
> > @@ -1150,6 +1152,59 @@ static bool has_hw_dbm(const struct arm64_cpu_capabilities *cap,
> >  
> >  #endif
> >  
> > +#ifdef CONFIG_ARM64_AMU_EXTN
> > +
> > +/*
> > + * This per cpu variable only signals that the CPU implementation supports
> > + * the Activity Monitors Unit (AMU) but does not provide information
> > + * regarding all the events that it supports.
> > + * When this amu_feat per CPU variable is true, the user of this feature
> > + * can only rely on the presence of the 4 fixed counters. But this does
> > + * not guarantee that the counters are enabled or access to these counters
> > + * is provided by code executed at higher exception levels.
> > + *
> > + * Also, to ensure the safe use of this per_cpu variable, the following
> > + * accessor is defined to allow a read of amu_feat for the current cpu only
> > + * from the current cpu.
> > + *  - cpu_has_amu_feat()
> > + */
> > +static DEFINE_PER_CPU_READ_MOSTLY(u8, amu_feat);
> > +
> 
> Why not bool?
> 

I changed it from bool after a sparse warning about an expression using
sizeof(bool), which I found is due to sizeof(bool) being compiler
dependent. It does not change anything, but I thought it might be a good
idea to define it as 8-bit unsigned and rely on a fixed size.

Thank you for the review,
Ionela.

> > +inline bool cpu_has_amu_feat(void)
> > +{
> > +	return !!this_cpu_read(amu_feat);
> > +}
> > +
> > +static void cpu_amu_enable(struct arm64_cpu_capabilities const *cap)
> > +{
> > +	if (has_cpuid_feature(cap, SCOPE_LOCAL_CPU)) {
> > +		pr_info("detected CPU%d: Activity Monitors Unit (AMU)\n",
> > +			smp_processor_id());
> > +		this_cpu_write(amu_feat, 1);
> > +	}
> > +}

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 6/6] arm64: use activity monitors for frequency invariance
  2020-01-23 17:07     ` Ionela Voinescu
@ 2020-01-24  1:19       ` Lukasz Luba
  2020-01-24 13:12         ` Ionela Voinescu
  0 siblings, 1 reply; 40+ messages in thread
From: Lukasz Luba @ 2020-01-24  1:19 UTC (permalink / raw)
  To: Ionela Voinescu
  Cc: catalin.marinas, will, mark.rutland, maz, suzuki.poulose,
	sudeep.holla, dietmar.eggemann, peterz, mingo, ggherdovich,
	vincent.guittot, linux-arm-kernel, linux-doc, linux-kernel



On 1/23/20 5:07 PM, Ionela Voinescu wrote:
> Hi Lukasz,
> 
> Thank you for taking a look over the patches.
> 
> On Thursday 23 Jan 2020 at 11:49:29 (+0000), Lukasz Luba wrote:
>> Hi Ionela,
>>
>> Please find my few comments below.
>>
>> On 12/18/19 6:26 PM, Ionela Voinescu wrote:
>>> The Frequency Invariance Engine (FIE) is providing a frequency
>>> scaling correction factor that helps achieve more accurate
>>> load-tracking.
>>>
>>> So far, for arm and arm64 platforms, this scale factor has been
>>> obtained based on the ratio between the current frequency and the
>>> maximum supported frequency recorded by the cpufreq policy. The
>>> setting of this scale factor is triggered from cpufreq drivers by
>>> calling arch_set_freq_scale. The current frequency used in computation
>>> is the frequency requested by a governor, but it may not be the
>>> frequency that was implemented by the platform.
>>>
>>> This correction factor can also be obtained using a core counter and a
>>> constant counter to get information on the performance (frequency based
>>> only) obtained in a period of time. This will more accurately reflect
>>> the actual current frequency of the CPU, compared with the alternative
>>> implementation that reflects the request of a performance level from
>>> the OS.
>>>
>>> Therefore, implement arch_scale_freq_tick to use activity monitors, if
>>> present, for the computation of the frequency scale factor.
>>>
>>> The use of AMU counters depends on:
>>>    - CONFIG_ARM64_AMU_EXTN - depends on the AMU extension being present
>>>    - CONFIG_CPU_FREQ - the current frequency obtained using counter
>>>      information is divided by the maximum frequency obtained from the
>>>      cpufreq policy.
>>>
>>> While it is possible to have a combination of CPUs in the system with
>>> and without support for activity monitors, the use of counters for
>>> frequency invariance is only enabled for a CPU, if all related CPUs
>>> (CPUs in the same frequency domain) support and have enabled the core
>>
>> This looks like an edge case scenario, for which we are designing the
>> whole machinery with workqueues. AFAIU we cannot run the code in
>> arch_set_freq_scale() and you want to check all CPUs upfront.
>>
> 
> Unfortunately, I don't believe it to be be an edge-case. Given that this
> is an optional feature, I do believe that people might skip on
> implementing it on some CPUs(LITTLEs) while keeping it for CPUs(bigs)
> where power and thermal mitigation is more probable to happen in firmware.
> This is the main reason to be conservative in the validation of CPUs and
> cpufreq policies.
> 
> In regards to arch_set_freq_scale, I want to be able to tell, when that
> function is called, if I should return a scale factor based on cpufreq
> for the current policy. If activity monitors are useable for the CPUs in
> the full policy, than I'm bailing out and leave the AMU FIE machinery
> set the scale factor. Unfortunately this works at policy granularity.
> 
> This could  be done in a nicer way by setting the scale factor per cpu
> and not for all CPUs in a policy in this arch_set_freq_scale function.
> But this would require some rewriting for the full frequency invariance
> support in drivers which we've talked about for a while but it was not
> the purpose of this patch set. But it would eliminate the policy
> verification I do with the second workqueue.
> 
>> Maybe you can just wait till all CPUs boot and then set the proper
>> flags and finish initialization. Something like:
>> per_cpu(s8, amu_feat) /* from the patch 1/6 */
>> OR
>> per_cpu(u8, amu_scale_freq) /* from this patch */
>> with maybe some values:
>> 0 - not checked yet
>> 1 - checked and present
>> -1 - checked and not available
>> -2 - checked but in conflict with others in the freq domain
>> -3..-k - other odd configurations
>>
>> could potentially eliminate the need of workqueues.
>>
>> Then, if we could trigger this from i.e. late_initcall, the CPUs
>> should be online and you can validate them.
>>
> 
> I did initially give such a state machine a try but it proved to be
> quite messy. A big reason for this is that the activity monitors unit
> has multiple counters that can be used for different purposes.
> 
> The amu_feat per_cpu variable only flags that you have the AMU present
> for potential users (in this case FIE) to validate the counters they
> need for their respective usecase. For this reason I don't want to
> overload the meaning of amu_feat. For the same reason I'm not doing the
> validation of the counters in a generic way, but I'm tying it to the
> usecase for particular counters. For example, it would not matter if
> the instructions retired counter is not enabled from firmware for the
> usecase of FIE. For frequency invariance we only need the core and
> constant cycle counters and I'm making it the job of the user (arm64
> topology code) to do the checking.
> 
> Secondly, for amu_scale_freq I could have added such a state machine,
> but I did not think it was useful. The only thing it would change is
> that I would not have to use the cpu_amu_fie variable in the data
> structure that gets passed to the work functions. The only way I would
> eliminate the second workqueue was if I did not do a check of all CPUs
> in a policy, as described above, and rewrite frequency invariance to
> work at CPU granularity and not policy granularity. This would eliminate
> the dependency on cpufreq policy all-together, so it would be worth
> doing if only for this reason alone :).
> 
> But even in that case, it's probably not needed to have more than two
> states for amu_freq_scale.
> 
> What do you think?

I think currently we are the only users for this AMU and if there will
be another in the future, then we can start thinking about his proposed
changes. Let's cross that bridge when we come to it.

Regarding the code, in arch/arm64/kernel/cpufeature.c you can already
read the cycle registers. All the CPUs go through that code
during start. If you use this fact in a late_initcall(), all CPUs
should be checked and you can just ask for the cpufreq policy, calculate
the max_freq ratio, and set the per-CPU config value to the 'ready' state.

Something like the code below, which is on top of your patch set.

------------------------>8-------------------------------------


diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index c639b3e052d7..837ea46d8867 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1168,19 +1168,26 @@ static bool has_hw_dbm(const struct arm64_cpu_capabilities *cap,
   * from the current cpu.
   *  - cpu_has_amu_feat()
   */
-static DEFINE_PER_CPU_READ_MOSTLY(u8, amu_feat);
-
-inline bool cpu_has_amu_feat(void)
-{
-	return !!this_cpu_read(amu_feat);
-}
+DECLARE_PER_CPU(u64, arch_const_cycles_prev);
+DECLARE_PER_CPU(u64, arch_core_cycles_prev);
+DECLARE_PER_CPU(u8, amu_scale_freq);

  static void cpu_amu_enable(struct arm64_cpu_capabilities const *cap)
  {
+	u64 core_cnt, const_cnt;
+
  	if (has_cpuid_feature(cap, SCOPE_LOCAL_CPU)) {
  		pr_info("detected CPU%d: Activity Monitors Unit (AMU)\n",
  			smp_processor_id());
-		this_cpu_write(amu_feat, 1);
+		core_cnt = read_sysreg_s(SYS_AMEVCNTR0_CORE_EL0);
+		const_cnt = read_sysreg_s(SYS_AMEVCNTR0_CONST_EL0);
+
+		this_cpu_write(arch_core_cycles_prev, core_cnt);
+		this_cpu_write(arch_const_cycles_prev, const_cnt);
+
+		this_cpu_write(amu_scale_freq, 1);
+	} else {
+		this_cpu_write(amu_scale_freq, 2);
  	}
  }

diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 61f8264afec9..95b34085ae64 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -144,8 +144,8 @@ static struct cpu_amu_work __percpu *works;
  static cpumask_var_t cpus_to_visit;

  static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale);
-static DEFINE_PER_CPU(u64, arch_const_cycles_prev);
-static DEFINE_PER_CPU(u64, arch_core_cycles_prev);
+DEFINE_PER_CPU(u64, arch_const_cycles_prev);
+DEFINE_PER_CPU(u64, arch_core_cycles_prev);
  DECLARE_PER_CPU(u8, amu_scale_freq);

  static void cpu_amu_fie_init_workfn(struct work_struct *work)
@@ -323,12 +323,64 @@ static int __init register_fie_counters_cpufreq_notifier(void)
  }
  core_initcall(register_fie_counters_cpufreq_notifier);

+static int __init init_amu_feature(void)
+{
+	struct cpufreq_policy *policy;
+	struct cpumask *checked_cpus;
+	int count, total;
+	int cpu, i;
+	s8 amu_config;
+	u64 ratio;
+
+	checked_cpus = kzalloc(cpumask_size(), GFP_KERNEL);
+	if (!checked_cpus)
+		return -ENOMEM;
+
+	for_each_possible_cpu(cpu) {
+		if (cpumask_test_cpu(cpu, checked_cpus))
+			continue;
+
+		policy = cpufreq_cpu_get(cpu);
+		if (!policy) {
+			pr_warn("No cpufreq policy found for CPU%d\n", cpu);
+			continue;
+		}
+
+		count = total = 0;
+
+		for_each_cpu(i, policy->related_cpus) {
+			amu_config = per_cpu(amu_scale_freq, i);
+			if (amu_config == 1)
+				count++;
+			total++;
+		}
+
+		amu_config = (total == count) ? 3 : 4;
+
+		ratio = (u64)arch_timer_get_rate() << (2 * SCHED_CAPACITY_SHIFT);
+		ratio = div64_u64(ratio, policy->cpuinfo.max_freq * 1000);
+
+		for_each_cpu(i, policy->related_cpus) {
+			per_cpu(arch_max_freq_scale, i) = (unsigned long)ratio;
+			per_cpu(amu_scale_freq, i) = amu_config;
+			cpumask_set_cpu(i, checked_cpus);
+		}
+
+		cpufreq_cpu_put(policy);
+	}
+
+	kfree(checked_cpus);
+
+	return 0;
+}
+late_initcall(init_amu_feature);
+
  void topology_scale_freq_tick(void)
  {
  	u64 prev_core_cnt, prev_const_cnt;
  	u64 core_cnt, const_cnt, scale;

-	if (!this_cpu_read(amu_scale_freq))
+	if (this_cpu_read(amu_scale_freq) != 3)
  		return;

  	const_cnt = read_sysreg_s(SYS_AMEVCNTR0_CONST_EL0);


-------------------------8<------------------------------------

Regards,
Lukasz

> 
> Thank you,
> Ionela.
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 1/6] arm64: add support for the AMU extension v1
  2020-01-23 18:32     ` Ionela Voinescu
@ 2020-01-24 12:00       ` Valentin Schneider
  2020-01-28 11:00         ` Ionela Voinescu
  0 siblings, 1 reply; 40+ messages in thread
From: Valentin Schneider @ 2020-01-24 12:00 UTC (permalink / raw)
  To: Ionela Voinescu
  Cc: catalin.marinas, will, mark.rutland, maz, suzuki.poulose,
	sudeep.holla, dietmar.eggemann, peterz, mingo, ggherdovich,
	vincent.guittot, linux-arm-kernel, linux-doc, linux-kernel

On 23/01/2020 18:32, Ionela Voinescu wrote:
[...]
> and later we can use information in
> AMCGCR_EL0 to get the number of architected counters (n) and
> AMEVTYPER0<n>_EL0 to find out the type. The same logic would apply to
> the auxiliary counters.
> 

Good, I think that's all we'll really need. I've not gone through the whole
series (yet!) so I might've missed AMCGCR being used.

>>> @@ -1150,6 +1152,59 @@ static bool has_hw_dbm(const struct arm64_cpu_capabilities *cap,
>>>  
>>>  #endif
>>>  
>>> +#ifdef CONFIG_ARM64_AMU_EXTN
>>> +
>>> +/*
>>> + * This per cpu variable only signals that the CPU implementation supports
>>> + * the Activity Monitors Unit (AMU) but does not provide information
>>> + * regarding all the events that it supports.
>>> + * When this amu_feat per CPU variable is true, the user of this feature
>>> + * can only rely on the presence of the 4 fixed counters. But this does
>>> + * not guarantee that the counters are enabled or access to these counters
>>> + * is provided by code executed at higher exception levels.
>>> + *
>>> + * Also, to ensure the safe use of this per_cpu variable, the following
>>> + * accessor is defined to allow a read of amu_feat for the current cpu only
>>> + * from the current cpu.
>>> + *  - cpu_has_amu_feat()
>>> + */
>>> +static DEFINE_PER_CPU_READ_MOSTLY(u8, amu_feat);
>>> +
>>
>> Why not bool?
>>
> 
> I've changed it from bool after a sparse warning about expression using
> sizeof(bool) and found this is due to sizeof(bool) being compiler
> dependent. It does not change anything but I thought it might be a good
> idea to define it as 8-bit unsigned and rely on fixed size.
> 

I believe conveying the intent (a truth value) is more important than the
underlying storage size in this case. It mostly matters when dealing with
aggregates, but here it's just a free-standing variable.

We already have a few per-CPU boolean variables in arm64/kernel/fpsimd.c
and the commits aren't even a year old, so I'd go for ignoring sparse this
time around.
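
To illustrate the difference, here is a small userspace sketch (not kernel
code; the demo_* helpers are made up for this example). A u8 flag keeps
whatever low byte is written to it, so readers need the !! normalisation,
while a bool collapses any non-zero value to true on assignment:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: store an int into an 8-bit flag, as with u8. */
static uint8_t demo_store_u8(int v)
{
	return (uint8_t)v;	/* keeps only the low 8 bits */
}

/* Hypothetical helper: store an int into a bool flag. */
static bool demo_store_bool(int v)
{
	return v;		/* any non-zero value converts to true */
}
```

For instance, demo_store_u8(256) ends up as 0, whereas demo_store_bool(256)
is true. With a bool, the !! in cpu_has_amu_feat() would become redundant.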

> Thank you for the review,
> Ionela.
> 
>>> +inline bool cpu_has_amu_feat(void)
>>> +{
>>> +	return !!this_cpu_read(amu_feat);
>>> +}
>>> +
>>> +static void cpu_amu_enable(struct arm64_cpu_capabilities const *cap)
>>> +{
>>> +	if (has_cpuid_feature(cap, SCOPE_LOCAL_CPU)) {
>>> +		pr_info("detected CPU%d: Activity Monitors Unit (AMU)\n",
>>> +			smp_processor_id());
>>> +		this_cpu_write(amu_feat, 1);
>>> +	}
>>> +}

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 6/6] arm64: use activity monitors for frequency invariance
  2020-01-24  1:19       ` Lukasz Luba
@ 2020-01-24 13:12         ` Ionela Voinescu
  2020-01-24 15:17           ` Lukasz Luba
  0 siblings, 1 reply; 40+ messages in thread
From: Ionela Voinescu @ 2020-01-24 13:12 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: catalin.marinas, will, mark.rutland, maz, suzuki.poulose,
	sudeep.holla, dietmar.eggemann, peterz, mingo, ggherdovich,
	vincent.guittot, linux-arm-kernel, linux-doc, linux-kernel

Hi Lukasz,

On Friday 24 Jan 2020 at 01:19:31 (+0000), Lukasz Luba wrote:
> 
> 
> On 1/23/20 5:07 PM, Ionela Voinescu wrote:
> > Hi Lukasz,
> > 
> > Thank you for taking a look over the patches.
> > 
> > On Thursday 23 Jan 2020 at 11:49:29 (+0000), Lukasz Luba wrote:
> > > Hi Ionela,
> > > 
> > > Please find my few comments below.
> > > 
> > > On 12/18/19 6:26 PM, Ionela Voinescu wrote:
> > > > The Frequency Invariance Engine (FIE) is providing a frequency
> > > > scaling correction factor that helps achieve more accurate
> > > > load-tracking.
> > > > 
> > > > So far, for arm and arm64 platforms, this scale factor has been
> > > > obtained based on the ratio between the current frequency and the
> > > > maximum supported frequency recorded by the cpufreq policy. The
> > > > setting of this scale factor is triggered from cpufreq drivers by
> > > > calling arch_set_freq_scale. The current frequency used in computation
> > > > is the frequency requested by a governor, but it may not be the
> > > > frequency that was implemented by the platform.
> > > > 
> > > > This correction factor can also be obtained using a core counter and a
> > > > constant counter to get information on the performance (frequency based
> > > > only) obtained in a period of time. This will more accurately reflect
> > > > the actual current frequency of the CPU, compared with the alternative
> > > > implementation that reflects the request of a performance level from
> > > > the OS.
> > > > 
> > > > Therefore, implement arch_scale_freq_tick to use activity monitors, if
> > > > present, for the computation of the frequency scale factor.
> > > > 
> > > > The use of AMU counters depends on:
> > > >    - CONFIG_ARM64_AMU_EXTN - depends on the AMU extension being present
> > > >    - CONFIG_CPU_FREQ - the current frequency obtained using counter
> > > >      information is divided by the maximum frequency obtained from the
> > > >      cpufreq policy.
> > > > 
> > > > While it is possible to have a combination of CPUs in the system with
> > > > and without support for activity monitors, the use of counters for
> > > > frequency invariance is only enabled for a CPU, if all related CPUs
> > > > (CPUs in the same frequency domain) support and have enabled the core
> > > 
> > > This looks like an edge case scenario, for which we are designing the
> > > whole machinery with workqueues. AFAIU we cannot run the code in
> > > > arch_set_freq_scale() and you want to check all CPUs upfront.
> > > 
> > 
> > > Unfortunately, I don't believe it to be an edge-case. Given that this
> > is an optional feature, I do believe that people might skip on
> > implementing it on some CPUs(LITTLEs) while keeping it for CPUs(bigs)
> > where power and thermal mitigation is more probable to happen in firmware.
> > This is the main reason to be conservative in the validation of CPUs and
> > cpufreq policies.
> > 
> > In regards to arch_set_freq_scale, I want to be able to tell, when that
> > function is called, if I should return a scale factor based on cpufreq
> > for the current policy. If activity monitors are useable for the CPUs in
> > > the full policy, then I'm bailing out and leave the AMU FIE machinery
> > set the scale factor. Unfortunately this works at policy granularity.
> > 
> > This could  be done in a nicer way by setting the scale factor per cpu
> > and not for all CPUs in a policy in this arch_set_freq_scale function.
> > But this would require some rewriting for the full frequency invariance
> > support in drivers which we've talked about for a while but it was not
> > the purpose of this patch set. But it would eliminate the policy
> > verification I do with the second workqueue.
> > 
> > > Maybe you can just wait till all CPUs boot and then set the proper
> > > flags and finish initialization. Something like:
> > > > per_cpu(s8, amu_feat) /* from the patch 1/6 */
> > > OR
> > > per_cpu(u8, amu_scale_freq) /* from this patch */
> > > with maybe some values:
> > > 0 - not checked yet
> > > 1 - checked and present
> > > -1 - checked and not available
> > > -2 - checked but in conflict with others in the freq domain
> > > -3..-k - other odd configurations
> > > 
> > > could potentially eliminate the need of workqueues.
> > > 
> > > Then, if we could trigger this from i.e. late_initcall, the CPUs
> > > should be online and you can validate them.
> > > 
> > 
> > I did initially give such a state machine a try but it proved to be
> > quite messy. A big reason for this is that the activity monitors unit
> > has multiple counters that can be used for different purposes.
> > 
> > The amu_feat per_cpu variable only flags that you have the AMU present
> > for potential users (in this case FIE) to validate the counters they
> > need for their respective usecase. For this reason I don't want to
> > overload the meaning of amu_feat. For the same reason I'm not doing the
> > validation of the counters in a generic way, but I'm tying it to the
> > usecase for particular counters. For example, it would not matter if
> > the instructions retired counter is not enabled from firmware for the
> > usecase of FIE. For frequency invariance we only need the core and
> > constant cycle counters and I'm making it the job of the user (arm64
> > topology code) to do the checking.
> > 
> > Secondly, for amu_scale_freq I could have added such a state machine,
> > but I did not think it was useful. The only thing it would change is
> > that I would not have to use the cpu_amu_fie variable in the data
> > structure that gets passed to the work functions. The only way I would
> > eliminate the second workqueue was if I did not do a check of all CPUs
> > in a policy, as described above, and rewrite frequency invariance to
> > work at CPU granularity and not policy granularity. This would eliminate
> > the dependency on cpufreq policy all-together, so it would be worth
> > doing if only for this reason alone :).
> > 
> > But even in that case, it's probably not needed to have more than two
> > > states for amu_scale_freq.
> > 
> > What do you think?
> 
> I think currently we are the only users for this AMU and if there will
> be another in the future, then we can start thinking about his proposed
> changes. Let's cross that bridge when we come to it.
> 
> Regarding the code, in the arch/arm64/cpufeature.c you can already
> read the cycle registers. All the CPUs are going through that code
> during start. If you use this fact in the late_initcall() all CPUs
> should be checked and you can just ask for cpufreq policy, calculate the
> max_freq ratio, set the per cpu config value to 'ready' state.
> 
> Something like in the code below, it is on top of your patch set.
> 
> ------------------------>8-------------------------------------
> 
> 
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index c639b3e052d7..837ea46d8867 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -1168,19 +1168,26 @@ static bool has_hw_dbm(const struct arm64_cpu_capabilities *cap,
>   * from the current cpu.
>   *  - cpu_has_amu_feat()
>   */
> -static DEFINE_PER_CPU_READ_MOSTLY(u8, amu_feat);
> -
> -inline bool cpu_has_amu_feat(void)
> -{
> -	return !!this_cpu_read(amu_feat);
> -}
> +DECLARE_PER_CPU(u64, arch_const_cycles_prev);
> +DECLARE_PER_CPU(u64, arch_core_cycles_prev);
> +DECLARE_PER_CPU(u8, amu_scale_freq);
> 
>  static void cpu_amu_enable(struct arm64_cpu_capabilities const *cap)
>  {
> +	u64 core_cnt, const_cnt;
> +
>  	if (has_cpuid_feature(cap, SCOPE_LOCAL_CPU)) {
>  		pr_info("detected CPU%d: Activity Monitors Unit (AMU)\n",
>  			smp_processor_id());
> -		this_cpu_write(amu_feat, 1);
> +		core_cnt = read_sysreg_s(SYS_AMEVCNTR0_CORE_EL0);
> +		const_cnt = read_sysreg_s(SYS_AMEVCNTR0_CONST_EL0);
> +
> +		this_cpu_write(arch_core_cycles_prev, core_cnt);
> +		this_cpu_write(arch_const_cycles_prev, const_cnt);
> +
> +		this_cpu_write(amu_scale_freq, 1);
> +	} else {
> +		this_cpu_write(amu_scale_freq, 2);
>  	}
>  }


Yes, functionally this can be done here (it would need some extra checks
on the initial values of core_cnt and const_cnt), but what I was saying
in my previous comment is that I don't want to mix generic feature
detection, which should happen here, with counter validation for
frequency invariance. As you see, this would already bring here per-cpu
variables for counters and amu_scale_freq flag, and I only see this
getting more messy with the future use of more counters. I don't believe
this code belongs here.

Looking a bit more over the code and checking against the new frequency
invariance code for x86, there is a case for either doing this CPU
validation in smp_prepare_cpus (separately for arm64 and x86) or calling
an arch_init_freq_invariance() maybe in sched_init_smp to be defined with
the proper frequency invariance counter initialisation code separately
for x86 and arm64. I'll have to look more over the details to make sure
this is feasible.

> 
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index 61f8264afec9..95b34085ae64 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -144,8 +144,8 @@ static struct cpu_amu_work __percpu *works;
>  static cpumask_var_t cpus_to_visit;
> 
>  static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale);
> -static DEFINE_PER_CPU(u64, arch_const_cycles_prev);
> -static DEFINE_PER_CPU(u64, arch_core_cycles_prev);
> +DEFINE_PER_CPU(u64, arch_const_cycles_prev);
> +DEFINE_PER_CPU(u64, arch_core_cycles_prev);
>  DECLARE_PER_CPU(u8, amu_scale_freq);
> 
>  static void cpu_amu_fie_init_workfn(struct work_struct *work)
> @@ -323,12 +323,64 @@ static int __init register_fie_counters_cpufreq_notifier(void)
>  }
>  core_initcall(register_fie_counters_cpufreq_notifier);
> 
> +static int __init init_amu_feature(void)
> +{
> +	struct cpufreq_policy *policy;
> +	struct cpumask *checked_cpus;
> +	int count, total;
> +	int cpu, i;
> +	s8 amu_config;
> +	u64 ratio;
> +
> +	checked_cpus = kzalloc(cpumask_size(), GFP_KERNEL);
> +	if (!checked_cpus)
> +		return -ENOMEM;
> +
> +	for_each_possible_cpu(cpu) {
> +		if (cpumask_test_cpu(cpu, checked_cpus))
> +			continue;
> +
> +		policy = cpufreq_cpu_get(cpu);
> +		if (!policy) {
> +			pr_warn("No cpufreq policy found for CPU%d\n", cpu);
> +			continue;
> +		}
> +
> +		count = total = 0;
> +
> +		for_each_cpu(i, policy->related_cpus) {
> +			amu_config = per_cpu(amu_scale_freq, i);
> +			if (amu_config == 1)
> +				count++;
> +			total++;
> +		}
> +
> +		amu_config = (total == count) ? 3 : 4;
> +
> +		ratio = (u64)arch_timer_get_rate() << (2 * SCHED_CAPACITY_SHIFT);
> +		ratio = div64_u64(ratio, policy->cpuinfo.max_freq * 1000);
> +
> +		for_each_cpu(i, policy->related_cpus) {
> +			per_cpu(arch_max_freq_scale, i) = (unsigned long)ratio;
> +			per_cpu(amu_scale_freq, i) = amu_config;
> +			cpumask_set_cpu(i, checked_cpus);
> +		}
> +
> +		cpufreq_cpu_put(policy);
> +	}
> +
> +	kfree(checked_cpus);
> +
> +	return 0;
> +}
> +late_initcall(init_amu_feature);
> +

Yes, with the design I mentioned above, this CPU policy validation could
move to a late_initcall and I could drop the workqueues and the extra
data structure. Thanks for this!
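
As a sanity check of the ratio arithmetic in init_amu_feature() above, the
following userspace sketch reproduces it with plain 64-bit integers. The
demo_* names and the input values are hypothetical, and demo_tick_scale()
is only my assumption of how the tick would consume the ratio (core cycles
delta times ratio, shifted down once, divided by the constant cycles delta):

```c
#include <assert.h>
#include <stdint.h>

#define SCHED_CAPACITY_SHIFT	10

/* ratio = (timer rate << 20) / max CPU frequency in Hz */
static uint64_t demo_max_freq_ratio(uint64_t timer_rate_hz,
				    uint64_t max_freq_khz)
{
	assert(max_freq_khz != 0);
	return (timer_rate_hz << (2 * SCHED_CAPACITY_SHIFT)) /
	       (max_freq_khz * 1000);
}

/* Assumed tick-side use: scale in the range [0, 1024] at or below max freq */
static uint64_t demo_tick_scale(uint64_t delta_core, uint64_t delta_const,
				uint64_t ratio)
{
	assert(delta_const != 0);
	return ((delta_core * ratio) >> SCHED_CAPACITY_SHIFT) / delta_const;
}
```

With a 25 MHz constant counter and a 1.6 GHz maximum frequency the ratio is
16384; over a 4 ms window (100000 constant cycles), 6400000 core cycles give
a scale of 1024 and 3200000 core cycles give 512. Note that the sketch does
the "* 1000" in 64 bits, while in the diff it is evaluated in the type of
policy->cpuinfo.max_freq, which would be worth double-checking.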

Let me know what you think!

Thank you,
Ionela.

>  void topology_scale_freq_tick(void)
>  {
>  	u64 prev_core_cnt, prev_const_cnt;
>  	u64 core_cnt, const_cnt, scale;
> 
> -	if (!this_cpu_read(amu_scale_freq))
> +	if (this_cpu_read(amu_scale_freq) != 3)
>  		return;
> 
>  	const_cnt = read_sysreg_s(SYS_AMEVCNTR0_CONST_EL0);
> 
> 
> -------------------------8<------------------------------------
> 
> Regards,
> Lukasz
> 
> > 
> > Thank you,
> > Ionela.
> > 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 6/6] arm64: use activity monitors for frequency invariance
  2020-01-24 13:12         ` Ionela Voinescu
@ 2020-01-24 15:17           ` Lukasz Luba
  2020-01-28 17:36             ` Ionela Voinescu
  0 siblings, 1 reply; 40+ messages in thread
From: Lukasz Luba @ 2020-01-24 15:17 UTC (permalink / raw)
  To: Ionela Voinescu
  Cc: catalin.marinas, will, mark.rutland, maz, suzuki.poulose,
	sudeep.holla, dietmar.eggemann, peterz, mingo, ggherdovich,
	vincent.guittot, linux-arm-kernel, linux-doc, linux-kernel



On 1/24/20 1:12 PM, Ionela Voinescu wrote:
> Hi Lukasz,
> 
> On Friday 24 Jan 2020 at 01:19:31 (+0000), Lukasz Luba wrote:
>>
>>
>> On 1/23/20 5:07 PM, Ionela Voinescu wrote:
>>> Hi Lukasz,
>>>
>>> Thank you for taking a look over the patches.
>>>
>>> On Thursday 23 Jan 2020 at 11:49:29 (+0000), Lukasz Luba wrote:
>>>> Hi Ionela,
>>>>
>>>> Please find my few comments below.
>>>>
>>>> On 12/18/19 6:26 PM, Ionela Voinescu wrote:
>>>>> The Frequency Invariance Engine (FIE) is providing a frequency
>>>>> scaling correction factor that helps achieve more accurate
>>>>> load-tracking.
>>>>>
>>>>> So far, for arm and arm64 platforms, this scale factor has been
>>>>> obtained based on the ratio between the current frequency and the
>>>>> maximum supported frequency recorded by the cpufreq policy. The
>>>>> setting of this scale factor is triggered from cpufreq drivers by
>>>>> calling arch_set_freq_scale. The current frequency used in computation
>>>>> is the frequency requested by a governor, but it may not be the
>>>>> frequency that was implemented by the platform.
>>>>>
>>>>> This correction factor can also be obtained using a core counter and a
>>>>> constant counter to get information on the performance (frequency based
>>>>> only) obtained in a period of time. This will more accurately reflect
>>>>> the actual current frequency of the CPU, compared with the alternative
>>>>> implementation that reflects the request of a performance level from
>>>>> the OS.
>>>>>
>>>>> Therefore, implement arch_scale_freq_tick to use activity monitors, if
>>>>> present, for the computation of the frequency scale factor.
>>>>>
>>>>> The use of AMU counters depends on:
>>>>>     - CONFIG_ARM64_AMU_EXTN - depends on the AMU extension being present
>>>>>     - CONFIG_CPU_FREQ - the current frequency obtained using counter
>>>>>       information is divided by the maximum frequency obtained from the
>>>>>       cpufreq policy.
>>>>>
>>>>> While it is possible to have a combination of CPUs in the system with
>>>>> and without support for activity monitors, the use of counters for
>>>>> frequency invariance is only enabled for a CPU, if all related CPUs
>>>>> (CPUs in the same frequency domain) support and have enabled the core
>>>>
>>>> This looks like an edge case scenario, for which we are designing the
>>>> whole machinery with workqueues. AFAIU we cannot run the code in
>>>> arch_set_freq_scale() and you want to check all CPUs upfront.
>>>>
>>>
>>> Unfortunately, I don't believe it to be an edge-case. Given that this
>>> is an optional feature, I do believe that people might skip on
>>> implementing it on some CPUs(LITTLEs) while keeping it for CPUs(bigs)
>>> where power and thermal mitigation is more probable to happen in firmware.
>>> This is the main reason to be conservative in the validation of CPUs and
>>> cpufreq policies.
>>>
>>> In regards to arch_set_freq_scale, I want to be able to tell, when that
>>> function is called, if I should return a scale factor based on cpufreq
>>> for the current policy. If activity monitors are useable for the CPUs in
>>> the full policy, then I'm bailing out and leave the AMU FIE machinery
>>> set the scale factor. Unfortunately this works at policy granularity.
>>>
>>> This could  be done in a nicer way by setting the scale factor per cpu
>>> and not for all CPUs in a policy in this arch_set_freq_scale function.
>>> But this would require some rewriting for the full frequency invariance
>>> support in drivers which we've talked about for a while but it was not
>>> the purpose of this patch set. But it would eliminate the policy
>>> verification I do with the second workqueue.
>>>
>>>> Maybe you can just wait till all CPUs boot and then set the proper
>>>> flags and finish initialization. Something like:
>>>> per_cpu(s8, amu_feat) /* from the patch 1/6 */
>>>> OR
>>>> per_cpu(u8, amu_scale_freq) /* from this patch */
>>>> with maybe some values:
>>>> 0 - not checked yet
>>>> 1 - checked and present
>>>> -1 - checked and not available
>>>> -2 - checked but in conflict with others in the freq domain
>>>> -3..-k - other odd configurations
>>>>
>>>> could potentially eliminate the need of workqueues.
>>>>
>>>> Then, if we could trigger this from i.e. late_initcall, the CPUs
>>>> should be online and you can validate them.
>>>>
>>>
>>> I did initially give such a state machine a try but it proved to be
>>> quite messy. A big reason for this is that the activity monitors unit
>>> has multiple counters that can be used for different purposes.
>>>
>>> The amu_feat per_cpu variable only flags that you have the AMU present
>>> for potential users (in this case FIE) to validate the counters they
>>> need for their respective usecase. For this reason I don't want to
>>> overload the meaning of amu_feat. For the same reason I'm not doing the
>>> validation of the counters in a generic way, but I'm tying it to the
>>> usecase for particular counters. For example, it would not matter if
>>> the instructions retired counter is not enabled from firmware for the
>>> usecase of FIE. For frequency invariance we only need the core and
>>> constant cycle counters and I'm making it the job of the user (arm64
>>> topology code) to do the checking.
>>>
>>> Secondly, for amu_scale_freq I could have added such a state machine,
>>> but I did not think it was useful. The only thing it would change is
>>> that I would not have to use the cpu_amu_fie variable in the data
>>> structure that gets passed to the work functions. The only way I would
>>> eliminate the second workqueue was if I did not do a check of all CPUs
>>> in a policy, as described above, and rewrite frequency invariance to
>>> work at CPU granularity and not policy granularity. This would eliminate
>>> the dependency on cpufreq policy all-together, so it would be worth
>>> doing if only for this reason alone :).
>>>
>>> But even in that case, it's probably not needed to have more than two
>>> states for amu_scale_freq.
>>>
>>> What do you think?
>>
>> I think currently we are the only users for this AMU and if there will
>> be another in the future, then we can start thinking about his proposed
>> changes. Let's cross that bridge when we come to it.
>>
>> Regarding the code, in the arch/arm64/cpufeature.c you can already
>> read the cycle registers. All the CPUs are going through that code
>> during start. If you use this fact in the late_initcall() all CPUs
>> should be checked and you can just ask for cpufreq policy, calculate the
>> max_freq ratio, set the per cpu config value to 'ready' state.
>>
>> Something like in the code below, it is on top of your patch set.
>>
>> ------------------------>8-------------------------------------
>>
>>
>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
>> index c639b3e052d7..837ea46d8867 100644
>> --- a/arch/arm64/kernel/cpufeature.c
>> +++ b/arch/arm64/kernel/cpufeature.c
>> @@ -1168,19 +1168,26 @@ static bool has_hw_dbm(const struct arm64_cpu_capabilities *cap,
>>    * from the current cpu.
>>    *  - cpu_has_amu_feat()
>>    */
>> -static DEFINE_PER_CPU_READ_MOSTLY(u8, amu_feat);
>> -
>> -inline bool cpu_has_amu_feat(void)
>> -{
>> -	return !!this_cpu_read(amu_feat);
>> -}
>> +DECLARE_PER_CPU(u64, arch_const_cycles_prev);
>> +DECLARE_PER_CPU(u64, arch_core_cycles_prev);
>> +DECLARE_PER_CPU(u8, amu_scale_freq);
>>
>>   static void cpu_amu_enable(struct arm64_cpu_capabilities const *cap)
>>   {
>> +	u64 core_cnt, const_cnt;
>> +
>>   	if (has_cpuid_feature(cap, SCOPE_LOCAL_CPU)) {
>>   		pr_info("detected CPU%d: Activity Monitors Unit (AMU)\n",
>>   			smp_processor_id());
>> -		this_cpu_write(amu_feat, 1);
>> +		core_cnt = read_sysreg_s(SYS_AMEVCNTR0_CORE_EL0);
>> +		const_cnt = read_sysreg_s(SYS_AMEVCNTR0_CONST_EL0);
>> +
>> +		this_cpu_write(arch_core_cycles_prev, core_cnt);
>> +		this_cpu_write(arch_const_cycles_prev, const_cnt);
>> +
>> +		this_cpu_write(amu_scale_freq, 1);
>> +	} else {
>> +		this_cpu_write(amu_scale_freq, 2);
>>   	}
>>   }
> 
> 
> Yes, functionally this can be done here (it would need some extra checks
> on the initial values of core_cnt and const_cnt), but what I was saying
> in my previous comment is that I don't want to mix generic feature
> detection, which should happen here, with counter validation for
> frequency invariance. As you see, this would already bring here per-cpu
> variables for counters and amu_scale_freq flag, and I only see this
> getting more messy with the future use of more counters. I don't believe
> this code belongs here.
> 
> Looking a bit more over the code and checking against the new frequency
> invariance code for x86, there is a case for either doing this CPU
> validation in smp_prepare_cpus (separately for arm64 and x86) or calling
> an arch_init_freq_invariance() maybe in sched_init_smp to be defined with
> the proper frequency invariance counter initialisation code separately
> for x86 and arm64. I'll have to look more over the details to make sure
> this is feasible.

I have found that we could simply draw on Mark's solution to a
similar problem, in commit:

commit df857416a13734ed9356f6e4f0152d55e4fb748a
Author: Mark Rutland <mark.rutland@arm.com>
Date:   Wed Jul 16 16:32:44 2014 +0100

     arm64: cpuinfo: record cpu system register values

     Several kernel subsystems need to know details about CPU system register
     values, sometimes for CPUs other than that they are executing on. Rather
     than hard-coding system register accesses and cross-calls for these
     cases, this patch adds logic to record various system register values at
     boot-time. This may be used for feature reporting, firmware bug
     detection, etc.

     Separate hooks are added for the boot and hotplug paths to enable
     one-time intialisation and cold/warm boot value mismatch detection in
     later patches.

     Signed-off-by: Mark Rutland <mark.rutland@arm.com>
     Reviewed-by: Will Deacon <will.deacon@arm.com>
     Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
     Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


He added a cpuinfo_store_cpu() call in secondary_start_kernel()
[in arm64 smp.c]. Please check the file:
arch/arm64/kernel/cpuinfo.c

We can probably add our read-amu-regs-and-setup-invariance call
just below his cpuinfo_store_cpu.

Then the arm64 cpufeature.c would stay clean, we would be called for
each CPU, and a late_initcall() would finish the setup with the edge-case
policy check, as in the init_amu_feature() code below.
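
To make the two-step flow concrete, here is a userspace sketch of it (plain
arrays instead of per-cpu variables; all demo_* and DEMO_* names are
hypothetical). The values 1..4 are the ones used in the diff, the names
attached to them are only my reading of their meaning: each CPU records 1 or
2 when it boots, and the late step promotes a whole frequency domain to 3
only if every CPU in it reported 1, otherwise to 4.

```c
#include <assert.h>
#include <stddef.h>

enum demo_amu_state {
	DEMO_AMU_UNCHECKED	= 0,	/* CPU has not booted yet */
	DEMO_AMU_PRESENT	= 1,	/* set by the CPU at boot */
	DEMO_AMU_ABSENT		= 2,	/* set by the CPU at boot */
	DEMO_AMU_READY		= 3,	/* whole policy can use counters */
	DEMO_AMU_UNUSABLE	= 4,	/* some CPU in the policy cannot */
};

/*
 * Late step for one policy: 'cpus' lists the n related CPUs, 'state' is
 * indexed by CPU number. Returns the state written to all of them.
 */
static int demo_validate_policy(unsigned char *state, const int *cpus,
				size_t n)
{
	size_t i, present = 0;
	unsigned char result;

	assert(state && cpus && n > 0);

	for (i = 0; i < n; i++)
		if (state[cpus[i]] == DEMO_AMU_PRESENT)
			present++;

	result = (present == n) ? DEMO_AMU_READY : DEMO_AMU_UNUSABLE;

	for (i = 0; i < n; i++)
		state[cpus[i]] = result;

	return result;
}
```

On a system where only the big CPUs implement the AMU, the big policy would
end up in state 3 and the LITTLE policy in state 4, so the tick would use
counters only on the bigs.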


> 
>>
>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>> index 61f8264afec9..95b34085ae64 100644
>> --- a/arch/arm64/kernel/topology.c
>> +++ b/arch/arm64/kernel/topology.c
>> @@ -144,8 +144,8 @@ static struct cpu_amu_work __percpu *works;
>>   static cpumask_var_t cpus_to_visit;
>>
>>   static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale);
>> -static DEFINE_PER_CPU(u64, arch_const_cycles_prev);
>> -static DEFINE_PER_CPU(u64, arch_core_cycles_prev);
>> +DEFINE_PER_CPU(u64, arch_const_cycles_prev);
>> +DEFINE_PER_CPU(u64, arch_core_cycles_prev);
>>   DECLARE_PER_CPU(u8, amu_scale_freq);
>>
>>   static void cpu_amu_fie_init_workfn(struct work_struct *work)
>> @@ -323,12 +323,64 @@ static int __init register_fie_counters_cpufreq_notifier(void)
>>   }
>>   core_initcall(register_fie_counters_cpufreq_notifier);
>>
>> +static int __init init_amu_feature(void)
>> +{
>> +	struct cpufreq_policy *policy;
>> +	struct cpumask *checked_cpus;
>> +	int count, total;
>> +	int cpu, i;
>> +	s8 amu_config;
>> +	u64 ratio;
>> +
>> +	checked_cpus = kzalloc(cpumask_size(), GFP_KERNEL);
>> +	if (!checked_cpus)
>> +		return -ENOMEM;
>> +
>> +	for_each_possible_cpu(cpu) {
>> +		if (cpumask_test_cpu(cpu, checked_cpus))
>> +			continue;
>> +
>> +		policy = cpufreq_cpu_get(cpu);
>> +		if (!policy) {
>> +			pr_warn("No cpufreq policy found for CPU%d\n", cpu);
>> +			continue;
>> +		}
>> +
>> +		count = total = 0;
>> +
>> +		for_each_cpu(i, policy->related_cpus) {
>> +			amu_config = per_cpu(amu_scale_freq, i);
>> +			if (amu_config == 1)
>> +				count++;
>> +			total++;
>> +		}
>> +
>> +		amu_config = (total == count) ? 3 : 4;
>> +
>> +		ratio = (u64)arch_timer_get_rate() << (2 * SCHED_CAPACITY_SHIFT);
>> +		ratio = div64_u64(ratio, policy->cpuinfo.max_freq * 1000);
>> +
>> +		for_each_cpu(i, policy->related_cpus) {
>> +			per_cpu(arch_max_freq_scale, i) = (unsigned long)ratio;
>> +			per_cpu(amu_scale_freq, i) = amu_config;
>> +			cpumask_set_cpu(i, checked_cpus);
>> +		}
>> +
>> +		cpufreq_cpu_put(policy);
>> +	}
>> +
>> +	kfree(checked_cpus);
>> +
>> +	return 0;
>> +}
>> +late_initcall(init_amu_feature);
>> +
> 
> Yes, with the design I mentioned above, this CPU policy validation could
> move to a late_initcall and I could drop the workqueues and the extra
> data structure. Thanks for this!
> 
> Let me know what you think!
> 

One thing is still open, the file drivers/base/arch_topology.c and
#ifdef in function arch_set_freq_scale().

Generally, if there is such a need, it's better to put such stuff into a
header and provide a dual implementation, rather than polluting generic code with:
#if defined(CONFIG_ARM64_XZY)
#endif
#if defined(CONFIG_POWERPC_ABC)
#endif
#if defined(CONFIG_x86_QAZ)
#endif
...


In our case we would need i.e. linux/topology.h because it includes
asm/topology.h, which might provide a needed symbol. At the end of
linux/topology.h we can have:

#ifndef arch_cpu_auto_scaling
static __always_inline
bool arch_cpu_auto_scaling(void) { return false; }
#endif

Then, when the symbol is missing and we get the default one,
the call should be easily optimized away by the compiler.

We could have a much cleaner function arch_set_freq_scale()
in drivers/base/ and all architectures will deal with specific
#ifdef CONFIG in their <asm/topology.h> implementations or
use the default.

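Example:
void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq,
			 unsigned long max_freq)
{
	unsigned long scale;
	int i;

	/* the arch sets the scale by itself (e.g. from counters) */
	if (arch_cpu_auto_scaling())
		return;

	scale = (cur_freq << SCHED_CAPACITY_SHIFT) / max_freq;
	for_each_cpu(i, cpus)
		per_cpu(freq_scale, i) = scale;
}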
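
The pattern can be tried out in userspace with the sketch below. It is not
kernel code: the demo_* names and DEMO_ARCH_HAS_COUNTERS are made up for
illustration, and a plain array stands in for the per-cpu freq_scale. An
architecture that wants to override the default defines the symbol before
the generic fallback is seen; everyone else gets the inline "return false".

```c
#include <stdbool.h>

/* What an arch header might provide (hypothetical opt-in). */
#ifdef DEMO_ARCH_HAS_COUNTERS
static inline bool demo_arch_counters_scaling(void) { return true; }
#define arch_cpu_auto_scaling demo_arch_counters_scaling
#endif

/* Generic fallback, used only when the arch did not define the symbol. */
#ifndef arch_cpu_auto_scaling
static inline bool arch_cpu_auto_scaling(void) { return false; }
#endif

#define SCHED_CAPACITY_SHIFT	10
#define DEMO_NR_CPUS		4

static unsigned long demo_freq_scale[DEMO_NR_CPUS];

/* Generic code: no #ifdef needed here, the hook decides. */
static void demo_set_freq_scale(const int *cpus, int n,
				unsigned long cur_freq,
				unsigned long max_freq)
{
	unsigned long scale;
	int i;

	if (arch_cpu_auto_scaling())
		return;

	scale = (cur_freq << SCHED_CAPACITY_SHIFT) / max_freq;
	for (i = 0; i < n; i++)
		demo_freq_scale[cpus[i]] = scale;
}
```

Built without -DDEMO_ARCH_HAS_COUNTERS, a call with cur_freq = 1000000 and
max_freq = 2000000 stores 512 for the listed CPUs; built with it, the
function returns early and the array is left untouched.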

Regards,
Lukasz







^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 3/6] arm64/kvm: disable access to AMU registers from kvm guests
  2019-12-18 18:26 ` [PATCH v2 3/6] arm64/kvm: disable access to AMU registers from kvm guests Ionela Voinescu
@ 2020-01-27 15:33   ` Valentin Schneider
  2020-01-28 15:48     ` Ionela Voinescu
  2020-01-28 17:26     ` Suzuki Kuruppassery Poulose
  0 siblings, 2 replies; 40+ messages in thread
From: Valentin Schneider @ 2020-01-27 15:33 UTC (permalink / raw)
  To: Ionela Voinescu, catalin.marinas, will, mark.rutland, maz,
	suzuki.poulose, sudeep.holla, dietmar.eggemann
  Cc: peterz, mingo, ggherdovich, vincent.guittot, linux-arm-kernel,
	linux-doc, linux-kernel, James Morse, Julien Thierry

On 18/12/2019 18:26, Ionela Voinescu wrote:
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index 6e5d839f42b5..dd20fb185d56 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -266,10 +266,11 @@
>  #define CPTR_EL2_TFP_SHIFT 10
>  
>  /* Hyp Coprocessor Trap Register */
> -#define CPTR_EL2_TCPAC	(1 << 31)
> -#define CPTR_EL2_TTA	(1 << 20)
> -#define CPTR_EL2_TFP	(1 << CPTR_EL2_TFP_SHIFT)
>  #define CPTR_EL2_TZ	(1 << 8)
> +#define CPTR_EL2_TFP	(1 << CPTR_EL2_TFP_SHIFT)
> +#define CPTR_EL2_TTA	(1 << 20)
> +#define CPTR_EL2_TAM	(1 << 30)
> +#define CPTR_EL2_TCPAC	(1 << 31)

Nit: why the #define movement? Couldn't that just be added beneath
CPTR_EL2_TCPAC?

>  #define CPTR_EL2_RES1	0x000032ff /* known RES1 bits in CPTR_EL2 */
>  #define CPTR_EL2_DEFAULT	CPTR_EL2_RES1
>  
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 72fbbd86eb5e..0bca87a2621f 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -90,6 +90,17 @@ static void activate_traps_vhe(struct kvm_vcpu *vcpu)
>  	val = read_sysreg(cpacr_el1);
>  	val |= CPACR_EL1_TTA;
>  	val &= ~CPACR_EL1_ZEN;
> +
> +	/*
> +	 * With VHE enabled, we have HCR_EL2.{E2H,TGE} = {1,1}. Note that in
> +	 * this case CPACR_EL1 has the same bit layout as CPTR_EL2, and
> +	 * CPACR_EL1 accessing instructions are redefined to access CPTR_EL2.
> +	 * Therefore use CPTR_EL2.TAM bit reference to activate AMU register
> +	 * traps.
> +	 */
> +
> +	val |= CPTR_EL2_TAM;
> +

Hmm so this is a bit confusing for me, I've rewritten that part of the
email too many times (didn't help that I'm far from being a virt guru).
Rectifications are most welcome.


First, AFAICT we *don't* have HCR_EL2.TGE set anymore at this point, it's
cleared just a bit earlier in __activate_traps().


Then, your comment suggests that when we're running this code, CPACR_EL1
accesses are rerouted to CPTR_EL2. Annoyingly this isn't mentioned in
the doc of CPACR_EL1, but D5.6.3 does say

"""
When ARMv8.1-VHE is implemented, and HCR_EL2.E2H is set to 1, when executing
at EL2, some EL1 System register access instructions are redefined to access
the equivalent EL2 register.
"""

And CPACR_EL1 is part of these, so far so good. Now, the thing is
the doc for CPACR_EL1 *doesn't* mention any TAM bit - but CPTR_EL2 does.
I believe what we *do* want here is to set CPTR_EL2.TAM (which IIUC we end
up doing via the rerouting).

So, providing I didn't get completely lost on the way, I have to ask:
why do we use CPACR_EL1 here? Couldn't we use CPTR_EL2 directly?


>  	if (update_fp_enabled(vcpu)) {
>  		if (vcpu_has_sve(vcpu))
>  			val |= CPACR_EL1_ZEN;
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 9f2165937f7d..940ab9b4c98b 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1003,6 +1003,20 @@ static bool access_pmuserenr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
>  	{ SYS_DESC(SYS_PMEVTYPERn_EL0(n)),					\
>  	  access_pmu_evtyper, reset_unknown, (PMEVTYPER0_EL0 + n), }
>  
> +static bool access_amu(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +			     const struct sys_reg_desc *r)
> +{
> +	kvm_inject_undefined(vcpu);
> +
> +	return false;
> +}
> +
> +/* Macro to expand the AMU counter and type registers*/
> +#define AMU_AMEVCNTR0_EL0(n) { SYS_DESC(SYS_AMEVCNTR0_EL0(n)), access_amu }
> +#define AMU_AMEVTYPE0_EL0(n) { SYS_DESC(SYS_AMEVTYPE0_EL0(n)), access_amu }
> +#define AMU_AMEVCNTR1_EL0(n) { SYS_DESC(SYS_AMEVCNTR1_EL0(n)), access_amu }
> +#define AMU_AMEVTYPE1_EL0(n) { SYS_DESC(SYS_AMEVTYPE1_EL0(n)), access_amu }
> +

You could save a *whopping* two lines with something like:

#define AMU_AMEVCNTR_EL0(group, n) { SYS_DESC(SYS_AMEVCNTR##group##_EL0(n)), access_amu }
#define AMU_AMEVTYPE_EL0(group, n) { SYS_DESC(SYS_AMEVTYPE##group##_EL0(n)), access_amu }

Though it doesn't help shortening the big register list below.
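
For reference, a stand-alone sketch of how the token pasting in the two
macros above resolves. The sys_reg() encoding and the SYS_AM* defines are
simplified copies of the ones in the patch; struct amu_desc and the
argument-less access_amu() are stand-ins I made up so the snippet builds
outside the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified copy of the arm64 sys_reg() encoding (op0/op1/CRn/CRm/op2). */
#define sys_reg(op0, op1, crn, crm, op2) \
	(((uint32_t)(op0) << 19) | ((uint32_t)(op1) << 16) | \
	 ((uint32_t)(crn) << 12) | ((uint32_t)(crm) << 8) | \
	 ((uint32_t)(op2) << 5))

#define SYS_AM_EL0(crm, op2)	sys_reg(3, 3, 13, crm, op2)
#define SYS_AMEVCNTR0_EL0(n)	SYS_AM_EL0(4 + ((n) >> 3), (n) & 0x7)
#define SYS_AMEVCNTR1_EL0(n)	SYS_AM_EL0(12 + ((n) >> 3), (n) & 0x7)

/* Stand-in for struct sys_reg_desc: only the encoding and the handler. */
struct amu_desc {
	uint32_t encoding;
	int (*access)(void);
};

/* Stand-in handler; the real one injects an UNDEF into the guest. */
static int access_amu(void)
{
	return 0;
}

/* 'group' is pasted into the name: 0 -> AMEVCNTR0, 1 -> AMEVCNTR1. */
#define AMU_AMEVCNTR_EL0(group, n) \
	{ SYS_AMEVCNTR##group##_EL0(n), access_amu }

/* Compile-time check of the n<3> split between CRm and op2. */
static_assert(SYS_AMEVCNTR1_EL0(15) == SYS_AM_EL0(13, 7),
	      "n = 15 must land in CRm 13, op2 7");

static const struct amu_desc amu_descs[] = {
	AMU_AMEVCNTR_EL0(0, 0),
	AMU_AMEVCNTR_EL0(0, 3),
	AMU_AMEVCNTR_EL0(1, 0),
	AMU_AMEVCNTR_EL0(1, 15),
};
```

The group argument is pasted as a literal token, so it has to be written
as 0 or 1 at the call site rather than passed through another macro.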

>  static bool trap_ptrauth(struct kvm_vcpu *vcpu,
>  			 struct sys_reg_params *p,
>  			 const struct sys_reg_desc *rd)
> @@ -1078,8 +1092,12 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu,
>  			 (u32)r->CRn, (u32)r->CRm, (u32)r->Op2);
>  	u64 val = raz ? 0 : read_sanitised_ftr_reg(id);
>  
> -	if (id == SYS_ID_AA64PFR0_EL1 && !vcpu_has_sve(vcpu)) {
> -		val &= ~(0xfUL << ID_AA64PFR0_SVE_SHIFT);
> +	if (id == SYS_ID_AA64PFR0_EL1) {
> +		if (!vcpu_has_sve(vcpu))
> +			val &= ~(0xfUL << ID_AA64PFR0_SVE_SHIFT);
> +		val &= ~(0xfUL << ID_AA64PFR0_AMU_SHIFT);
> +	} else if (id == SYS_ID_PFR0_EL1) {
> +		val &= ~(0xfUL << ID_PFR0_AMU_SHIFT);
>  	} else if (id == SYS_ID_AA64ISAR1_EL1 && !vcpu_has_ptrauth(vcpu)) {
>  		val &= ~((0xfUL << ID_AA64ISAR1_APA_SHIFT) |
>  			 (0xfUL << ID_AA64ISAR1_API_SHIFT) |

Could almost turn the thing into a switch case at this point.
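
A rough sketch of what the switch form could look like, limited to the
two PFR0 cases visible in the hunk above. The enum values and
mask_id_reg() are placeholders of mine (the kernel compares sys_reg()
encodings and calls vcpu_has_sve()); the shifts are the ones from the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder register IDs; the kernel uses sys_reg() encodings instead. */
enum id_reg { ID_AA64PFR0_EL1, ID_PFR0_EL1, ID_OTHER };

/* Field shifts as defined in the patch. */
#define ID_AA64PFR0_SVE_SHIFT	32
#define ID_AA64PFR0_AMU_SHIFT	44
#define ID_PFR0_AMU_SHIFT	20

static_assert(ID_AA64PFR0_AMU_SHIFT + 4 <= 64,
	      "a 4-bit field must fit in a 64-bit ID register");

/* Hypothetical helper: hide the features a guest must not see. */
static uint64_t mask_id_reg(enum id_reg id, uint64_t val, bool has_sve)
{
	switch (id) {
	case ID_AA64PFR0_EL1:
		if (!has_sve)
			val &= ~(UINT64_C(0xf) << ID_AA64PFR0_SVE_SHIFT);
		/* AMU is always hidden from guests. */
		val &= ~(UINT64_C(0xf) << ID_AA64PFR0_AMU_SHIFT);
		break;
	case ID_PFR0_EL1:
		val &= ~(UINT64_C(0xf) << ID_PFR0_AMU_SHIFT);
		break;
	default:
		break;
	}

	return val;
}
```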

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 4/6] Documentation: arm64: document support for the AMU extension
  2019-12-18 18:26 ` [PATCH v2 4/6] Documentation: arm64: document support for the AMU extension Ionela Voinescu
@ 2020-01-27 16:47   ` Valentin Schneider
  2020-01-28 16:53     ` Ionela Voinescu
  2020-01-30 15:04   ` Suzuki Kuruppassery Poulose
  1 sibling, 1 reply; 40+ messages in thread
From: Valentin Schneider @ 2020-01-27 16:47 UTC (permalink / raw)
  To: Ionela Voinescu, catalin.marinas, will, mark.rutland, maz,
	suzuki.poulose, sudeep.holla, dietmar.eggemann
  Cc: peterz, mingo, ggherdovich, vincent.guittot, linux-arm-kernel,
	linux-doc, linux-kernel, Jonathan Corbet

On 18/12/2019 18:26, Ionela Voinescu wrote:
> +Basic support
> +-------------
> +
> +The kernel can safely run a mix of CPUs with and without support for the
> +activity monitors extension. Therefore, when CONFIG_ARM64_AMU_EXTN is
> +selected we unconditionally enable the capability to allow any late CPU
> +(secondary or hotplugged) to detect and use the feature.
> +
> +When the feature is detected on a CPU, a per-CPU variable (amu_feat) is
> +set, but this does not guarantee the correct functionality of the
> +counters, only the presence of the extension.
> +
> +Firmware (code running at higher exception levels, e.g. arm-tf) support is
> +needed to:
> + - Enable access for lower exception levels (EL2 and EL1) to the AMU
> +   registers.
> + - Enable the counters. If not enabled these will read as 0.

Just to make sure I understand - if AMUs are physically present but not
enabled by FW, we'll still
- see them as implemented in ID_AA64PFR0_EL1.AMU
- see some counters as available with e.g. AMCGCR_EL0.CG0NC > 0

But reading some AMEVCNTR<g><n> will return 0?

> + - Save/restore the counters before/after the CPU is being put/brought up
> +   from the 'off' power state.
> +
> +When using kernels that have this configuration enabled but boot with
> +broken firmware the user may experience panics or lockups when accessing
> +the counter registers.

Yikes

> Even if these symptoms are not observed, the
> +values returned by the register reads might not correctly reflect reality.
> +Most commonly, the counters will read as 0, indicating that they are not
> +enabled. If proper support is not provided in firmware it's best to disable
> +CONFIG_ARM64_AMU_EXTN.
> +

I haven't seen something that would try to catch this on the kernel side.
Can we try to detect that (e.g. at least one counter returns > 0) in
cpu_amu_enable() and thus not write to the CPU-local 'amu_feat'?

While we're on the topic of detecting broken stuff, what if some CPUs
implement some auxiliary counters that some others don't?

> +The fixed counters of AMUv1 are accessible though the following system
> +register definitions:
> + - SYS_AMEVCNTR0_CORE_EL0
> + - SYS_AMEVCNTR0_CONST_EL0
> + - SYS_AMEVCNTR0_INST_RET_EL0
> + - SYS_AMEVCNTR0_MEM_STALL_EL0
> +
> +Auxiliary platform specific counters can be accessed using
> +SYS_AMEVCNTR1_EL0(n), where n is a value between 0 and 15.
> +
> +Details can be found in: arch/arm64/include/asm/sysreg.h.
> +
> diff --git a/Documentation/arm64/booting.rst b/Documentation/arm64/booting.rst
> index 5d78a6f5b0ae..a3f1a47b6f1c 100644
> --- a/Documentation/arm64/booting.rst
> +++ b/Documentation/arm64/booting.rst
> @@ -248,6 +248,20 @@ Before jumping into the kernel, the following conditions must be met:
>      - HCR_EL2.APK (bit 40) must be initialised to 0b1
>      - HCR_EL2.API (bit 41) must be initialised to 0b1
>  
> +  For CPUs with Activity Monitors Unit v1 (AMUv1) extension present:
> +  - If EL3 is present:
> +    CPTR_EL3.TAM (bit 30) must be initialised to 0b0
> +    CPTR_EL2.TAM (bit 30) must be initialised to 0b0
> +    AMCNTENSET0_EL0 must be initialised to 0b1111

Nit: Or be a superset of the above, right? AIUI v1 only mandates the lower
4 bits to be set. Probably doesn't matter that much...
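
In code, the "superset" reading would amount to something like the check
below. This is only a sketch: the macro and function names are invented,
and it assumes the four AMUv1 fixed counters map to enable bits [3:0].

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* AMUv1 has four architected (group 0) counters: enable bits [3:0]. */
#define AMU_V1_FIXED_COUNTERS_MASK	UINT64_C(0xf)

/*
 * Hypothetical check: true if at least the four fixed counters are
 * enabled. Extra enable bits are tolerated, so any superset of 0b1111
 * passes.
 */
static bool amu_v1_fixed_counters_enabled(uint64_t amcntenset0)
{
	return (amcntenset0 & AMU_V1_FIXED_COUNTERS_MASK) ==
	       AMU_V1_FIXED_COUNTERS_MASK;
}
```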


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 1/6] arm64: add support for the AMU extension v1
  2020-01-24 12:00       ` Valentin Schneider
@ 2020-01-28 11:00         ` Ionela Voinescu
  0 siblings, 0 replies; 40+ messages in thread
From: Ionela Voinescu @ 2020-01-28 11:00 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: catalin.marinas, will, mark.rutland, maz, suzuki.poulose,
	sudeep.holla, dietmar.eggemann, peterz, mingo, ggherdovich,
	vincent.guittot, linux-arm-kernel, linux-doc, linux-kernel

Hi Valentin,

On Friday 24 Jan 2020 at 12:00:25 (+0000), Valentin Schneider wrote:
> On 23/01/2020 18:32, Ionela Voinescu wrote:
> [...]
> > and later we can use information in
> > AMCGCR_EL0 to get the number of architected counters (n) and
> > AMEVTYPER0<n>_EL0 to find out the type. The same logic would apply to
> > the auxiliary counters.
> > 
> 
> Good, I think that's all we'll really need. I've not gone through the whole
> series (yet!) so I might've missed AMCGCR being used.
>

No, it's not used later in the patches either, specifically because
this is version 1 and we should be able to rely on these first 4
architected counters for all future versions of the AMU implementation.

> >>> @@ -1150,6 +1152,59 @@ static bool has_hw_dbm(const struct arm64_cpu_capabilities *cap,
> >>>  
> >>>  #endif
> >>>  
> >>> +#ifdef CONFIG_ARM64_AMU_EXTN
> >>> +
> >>> +/*
> >>> + * This per cpu variable only signals that the CPU implementation supports
> >>> + * the Activity Monitors Unit (AMU) but does not provide information
> >>> + * regarding all the events that it supports.
> >>> + * When this amu_feat per CPU variable is true, the user of this feature
> >>> + * can only rely on the presence of the 4 fixed counters. But this does
> >>> + * not guarantee that the counters are enabled or access to these counters
> >>> + * is provided by code executed at higher exception levels.
> >>> + *
> >>> + * Also, to ensure the safe use of this per_cpu variable, the following
> >>> + * accessor is defined to allow a read of amu_feat for the current cpu only
> >>> + * from the current cpu.
> >>> + *  - cpu_has_amu_feat()
> >>> + */
> >>> +static DEFINE_PER_CPU_READ_MOSTLY(u8, amu_feat);
> >>> +
> >>
> >> Why not bool?
> >>
> > 
> > I've changed it from bool after a sparse warning about expression using
> > sizeof(bool) and found this is due to sizeof(bool) being compiler
> > dependent. It does not change anything but I thought it might be a good
> > idea to define it as 8-bit unsigned and rely on fixed size.
> > 
> 
> I believe conveying the intent (a truth value) is more important than the
> underlying storage size in this case. It mostly matters when dealing with
> aggregates, but here it's just a free-standing variable.
> 
> We already have a few per-CPU boolean variables in arm64/kernel/fpsimd.c
> and the commits aren't even a year old, so I'd go for ignoring sparse this
> time around.
>

Will do!

Thanks,
Ionela.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 3/6] arm64/kvm: disable access to AMU registers from kvm guests
  2020-01-27 15:33   ` Valentin Schneider
@ 2020-01-28 15:48     ` Ionela Voinescu
  2020-01-28 17:26     ` Suzuki Kuruppassery Poulose
  1 sibling, 0 replies; 40+ messages in thread
From: Ionela Voinescu @ 2020-01-28 15:48 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: catalin.marinas, will, mark.rutland, maz, suzuki.poulose,
	sudeep.holla, dietmar.eggemann, peterz, mingo, ggherdovich,
	vincent.guittot, linux-arm-kernel, linux-doc, linux-kernel,
	James Morse, Julien Thierry

On Monday 27 Jan 2020 at 15:33:26 (+0000), Valentin Schneider wrote:
> On 18/12/2019 18:26, Ionela Voinescu wrote:
> > diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> > index 6e5d839f42b5..dd20fb185d56 100644
> > --- a/arch/arm64/include/asm/kvm_arm.h
> > +++ b/arch/arm64/include/asm/kvm_arm.h
> > @@ -266,10 +266,11 @@
> >  #define CPTR_EL2_TFP_SHIFT 10
> >  
> >  /* Hyp Coprocessor Trap Register */
> > -#define CPTR_EL2_TCPAC	(1 << 31)
> > -#define CPTR_EL2_TTA	(1 << 20)
> > -#define CPTR_EL2_TFP	(1 << CPTR_EL2_TFP_SHIFT)
> >  #define CPTR_EL2_TZ	(1 << 8)
> > +#define CPTR_EL2_TFP	(1 << CPTR_EL2_TFP_SHIFT)
> > +#define CPTR_EL2_TTA	(1 << 20)
> > +#define CPTR_EL2_TAM	(1 << 30)
> > +#define CPTR_EL2_TCPAC	(1 << 31)
> 
> Nit: why the #define movement? Couldn't that just be added beneath
> CPTR_EL2_TCPAC?
>

It was a 'while here' thing done wrong. I was looking at the CPACR bits,
which led me to believe that the bits in the rest of the file were
ordered from least significant to most significant, so I thought I'd
reorder these as well. But looking again, I see that it was done
correctly the first time, according to most of the file. My bad!

> >  #define CPTR_EL2_RES1	0x000032ff /* known RES1 bits in CPTR_EL2 */
> >  #define CPTR_EL2_DEFAULT	CPTR_EL2_RES1
> >  
> > diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> > index 72fbbd86eb5e..0bca87a2621f 100644
> > --- a/arch/arm64/kvm/hyp/switch.c
> > +++ b/arch/arm64/kvm/hyp/switch.c
> > @@ -90,6 +90,17 @@ static void activate_traps_vhe(struct kvm_vcpu *vcpu)
> >  	val = read_sysreg(cpacr_el1);
> >  	val |= CPACR_EL1_TTA;
> >  	val &= ~CPACR_EL1_ZEN;
> > +
> > +	/*
> > +	 * With VHE enabled, we have HCR_EL2.{E2H,TGE} = {1,1}. Note that in
> > +	 * this case CPACR_EL1 has the same bit layout as CPTR_EL2, and
> > +	 * CPACR_EL1 accessing instructions are redefined to access CPTR_EL2.
> > +	 * Therefore use CPTR_EL2.TAM bit reference to activate AMU register
> > +	 * traps.
> > +	 */
> > +
> > +	val |= CPTR_EL2_TAM;
> > +
> 
> Hmm so this is a bit confusing for me, I've rewritten that part of the
> email too many times (didn't help that I'm far from being a virt guru).
> Rectifications are most welcome.
> 

Yes, this is definitely not straightforward. It took me a while to
retrace my steps with regard to this functionality as well.

> 
> First, AFAICT we *don't* have HCR_EL2.TGE set anymore at this point, it's
> cleared just a bit earlier in __activate_traps().
> 

When I wrote the above I believed that when this function is called
we'd have HCR_EL2.{E2H,TGE} = {1,1}, which reflects running on the host
with general exceptions trapped to EL2. So thank you for the
correction.

But I don't believe running with TGE cleared changes anything at this
point. For one, I think we can only run the code here at EL2.
Initially I thought we might run it at EL1 for nested virtualisation,
but for nested OSs we'll use NVHE, so that problem goes away.

So when we run this code at EL2, accesses to EL1 registers are
redirected to their EL2 equivalents due to HCR_EL2.E2H = 1, and
HCR_EL2.TGE = 0 has no impact on the setting of the TAM bit for
CPACR_EL1/CPTR_EL2. Therefore, this code will result in AMU accesses
being trapped to EL2 when coming from EL0 or EL1 on the guest side, once
we enter the guest.

> Then, your comment suggests that when we're running this code, CPACR_EL1
> accesses are rerouted to CPTR_EL2. Annoyingly this isn't mentioned in
> the doc of CPACR_EL1, but D5.6.3 does say
> 
> """
> When ARMv8.1-VHE is implemented, and HCR_EL2.E2H is set to 1, when executing
> at EL2, some EL1 System register access instructions are redefined to access
> the equivalent EL2 register.
> """
> 
> And CPACR_EL1 is part of these, so far so good. Now, the thing is
> the doc for CPACR_EL1 *doesn't* mention any TAM bit - but CPTR_EL2 does.
> I believe what we *do* want here is to set CPTR_EL2.TAM (which IIUC we end
> up doing via the rerouting).
> 

Right! The error in my comment is that I believed that E2H and TGE
together determine the re-mapping of CPACR_EL1 to CPTR_EL2. But
actually, E2H determines this redirection when running at EL2, while TGE
only determines the current trapping behaviour: if we run with TGE=0,
we're running on the guest and CPACR_EL1 takes effect, and when we run
on the host with TGE=1, CPTR_EL2 takes effect.

I believe the reason CPACR_EL1 does not have a TAM bit is that for
trapping at EL1 we have the AMU register AMUSERENR_EL0 to trap accesses
from EL0.

When we run on the host side with HCR_EL2.{E2H,TGE} = {1,1}, the
CPTR_EL2.TAM bit takes effect.

I will modify my comment.
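
As a way to sanity-check the reasoning, here is a toy user-space model of
the redirection rule quoted from D5.6.3. It is a sketch only: struct
toy_cpu and the toy_* helpers are invented for illustration, and the
model covers nothing beyond "E2H = 1 at EL2 redirects cpacr_el1 accesses
to CPTR_EL2".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPTR_EL2_TAM	(UINT64_C(1) << 30)

/* Toy CPU state, just enough to model the redirection. */
struct toy_cpu {
	int el;			/* current exception level (1 or 2) */
	bool e2h;		/* HCR_EL2.E2H */
	uint64_t cpacr_el1;
	uint64_t cptr_el2;
};

/*
 * Register actually touched by a "cpacr_el1" access instruction.
 * TGE is deliberately absent: it plays no part in the redirection.
 */
static uint64_t *cpacr_el1_target(struct toy_cpu *cpu)
{
	assert(cpu->el == 1 || cpu->el == 2);

	if (cpu->el == 2 && cpu->e2h)
		return &cpu->cptr_el2;

	return &cpu->cpacr_el1;
}

static void toy_write_cpacr_el1(struct toy_cpu *cpu, uint64_t val)
{
	*cpacr_el1_target(cpu) = val;
}

static uint64_t toy_read_cpacr_el1(struct toy_cpu *cpu)
{
	return *cpacr_el1_target(cpu);
}
```

In this model, a read-modify-write of "cpacr_el1" at EL2 with E2H set
ends up setting the TAM bit in cptr_el2 and leaves the real cpacr_el1
untouched, which matches what activate_traps_vhe() relies on.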

> So, providing I didn't get completely lost on the way, I have to ask:
> why do we use CPACR_EL1 here? Couldn't we use CPTR_EL2 directly?
>

No, all good so far :). I believe the reason is to keep the kernel as
generic as possible with the accesses to EL1 registers where a generic
kernel should be running. The fact that with VHE we know to be running
at EL2 and this code is only called at EL2 is more of an implementation
detail that should be hidden behind the VHE abstraction.


This being said I'm still not sure if I should be using here a
CPTR_EL2.TAM bit or a CPACR_EL1.TAM bit. Functionally it would be the
same but the use of one or another will 'break' some kind of
abstraction logic :).

Let me know if you have a preference.


>
> >  	if (update_fp_enabled(vcpu)) {
> >  		if (vcpu_has_sve(vcpu))
> >  			val |= CPACR_EL1_ZEN;
> > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> > index 9f2165937f7d..940ab9b4c98b 100644
> > --- a/arch/arm64/kvm/sys_regs.c
> > +++ b/arch/arm64/kvm/sys_regs.c
> > @@ -1003,6 +1003,20 @@ static bool access_pmuserenr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> >  	{ SYS_DESC(SYS_PMEVTYPERn_EL0(n)),					\
> >  	  access_pmu_evtyper, reset_unknown, (PMEVTYPER0_EL0 + n), }
> >  
> > +static bool access_amu(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> > +			     const struct sys_reg_desc *r)
> > +{
> > +	kvm_inject_undefined(vcpu);
> > +
> > +	return false;
> > +}
> > +
> > +/* Macro to expand the AMU counter and type registers*/
> > +#define AMU_AMEVCNTR0_EL0(n) { SYS_DESC(SYS_AMEVCNTR0_EL0(n)), access_amu }
> > +#define AMU_AMEVTYPE0_EL0(n) { SYS_DESC(SYS_AMEVTYPE0_EL0(n)), access_amu }
> > +#define AMU_AMEVCNTR1_EL0(n) { SYS_DESC(SYS_AMEVCNTR1_EL0(n)), access_amu }
> > +#define AMU_AMEVTYPE1_EL0(n) { SYS_DESC(SYS_AMEVTYPE1_EL0(n)), access_amu }
> > +
> 
> You could save a *whopping* two lines with something like:
> 
> #define AMU_AMEVCNTR_EL0(group, n) { SYS_DESC(SYS_AMEVCNTR##group##_EL0(n)), access_amu }
> #define AMU_AMEVTYPE_EL0(group, n) { SYS_DESC(SYS_AMEVTYPE##group##_EL0(n)), access_amu }
> 

Will do!

> Though it doesn't help shortening the big register list below.
> 
> >  static bool trap_ptrauth(struct kvm_vcpu *vcpu,
> >  			 struct sys_reg_params *p,
> >  			 const struct sys_reg_desc *rd)
> > @@ -1078,8 +1092,12 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu,
> >  			 (u32)r->CRn, (u32)r->CRm, (u32)r->Op2);
> >  	u64 val = raz ? 0 : read_sanitised_ftr_reg(id);
> >  
> > -	if (id == SYS_ID_AA64PFR0_EL1 && !vcpu_has_sve(vcpu)) {
> > -		val &= ~(0xfUL << ID_AA64PFR0_SVE_SHIFT);
> > +	if (id == SYS_ID_AA64PFR0_EL1) {
> > +		if (!vcpu_has_sve(vcpu))
> > +			val &= ~(0xfUL << ID_AA64PFR0_SVE_SHIFT);
> > +		val &= ~(0xfUL << ID_AA64PFR0_AMU_SHIFT);
> > +	} else if (id == SYS_ID_PFR0_EL1) {
> > +		val &= ~(0xfUL << ID_PFR0_AMU_SHIFT);
> >  	} else if (id == SYS_ID_AA64ISAR1_EL1 && !vcpu_has_ptrauth(vcpu)) {
> >  		val &= ~((0xfUL << ID_AA64ISAR1_APA_SHIFT) |
> >  			 (0xfUL << ID_AA64ISAR1_API_SHIFT) |
> 
> Could almost turn the thing into a switch case at this point.


Right! It would definitely read better.

Thanks,
Ionela.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 1/6] arm64: add support for the AMU extension v1
  2019-12-18 18:26 ` [PATCH v2 1/6] arm64: add support for the AMU extension v1 Ionela Voinescu
  2020-01-23 17:04   ` Valentin Schneider
@ 2020-01-28 16:34   ` Suzuki Kuruppassery Poulose
  2020-01-29 16:42     ` Ionela Voinescu
  1 sibling, 1 reply; 40+ messages in thread
From: Suzuki Kuruppassery Poulose @ 2020-01-28 16:34 UTC (permalink / raw)
  To: Ionela Voinescu, catalin.marinas, will, mark.rutland, maz,
	sudeep.holla, dietmar.eggemann
  Cc: peterz, mingo, ggherdovich, vincent.guittot, linux-arm-kernel,
	linux-doc, linux-kernel

On 18/12/2019 18:26, Ionela Voinescu wrote:
> The activity monitors extension is an optional extension introduced
> by the ARMv8.4 CPU architecture. This implements basic support for
> version 1 of the activity monitors architecture, AMUv1.
> 
> This support includes:
> - Extension detection on each CPU (boot, secondary, hotplugged)
> - Register interface for AMU aarch64 registers
> - (while here) create defines for ID_PFR0_EL1 fields when adding
>    the AMU field information.
> 
> Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> ---
>   arch/arm64/Kconfig                  | 27 ++++++++++
>   arch/arm64/include/asm/cpucaps.h    |  3 +-
>   arch/arm64/include/asm/cpufeature.h |  4 ++
>   arch/arm64/include/asm/sysreg.h     | 44 ++++++++++++++++
>   arch/arm64/kernel/cpufeature.c      | 81 +++++++++++++++++++++++++++--
>   5 files changed, 154 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index ac31ed6184d0..6ae7bfa5812e 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1485,6 +1485,33 @@ config ARM64_PTR_AUTH
>   
>   endmenu
>   
> +menu "ARMv8.4 architectural features"
> +
> +config ARM64_AMU_EXTN
> +	bool "Enable support for the Activity Monitors Unit CPU extension"
> +	default y
> +	help
> +          The activity monitors extension is an optional extension introduced
> +          by the ARMv8.4 CPU architecture. This enables support for version 1
> +          of the activity monitors architecture, AMUv1.
> +
> +          To enable the use of this extension on CPUs that implement it, say Y.
> +
> +          Note that for architectural reasons, firmware _must_ implement AMU
> +          support when running on CPUs that present the activity monitors
> +          extension. The required support is present in:
> +            * Version 1.5 and later of the ARM Trusted Firmware
> +
> +          For kernels that have this configuration enabled but boot with broken
> +          firmware, you may need to say N here until the firmware is fixed.
> +          Otherwise you may experience firmware panics or lockups when
> +          accessing the counter registers. Even if you are not observing these
> +          symptoms, the values returned by the register reads might not
> +          correctly reflect reality. Most commonly, the value read will be 0,
> +          indicating that the counter is not enabled.
> +
> +endmenu
> +
>   config ARM64_SVE
>   	bool "ARM Scalable Vector Extension support"
>   	default y
> diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
> index b92683871119..7dde890bde50 100644
> --- a/arch/arm64/include/asm/cpucaps.h
> +++ b/arch/arm64/include/asm/cpucaps.h
> @@ -56,7 +56,8 @@
>   #define ARM64_WORKAROUND_CAVIUM_TX2_219_PRFM	46
>   #define ARM64_WORKAROUND_1542419		47
>   #define ARM64_WORKAROUND_1319367		48
> +#define ARM64_HAS_AMU_EXTN			49
>   
> -#define ARM64_NCAPS				49
> +#define ARM64_NCAPS				50
>   
>   #endif /* __ASM_CPUCAPS_H */
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index 4261d55e8506..b89e799d6972 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -673,6 +673,10 @@ static inline bool cpu_has_hw_af(void)
>   						ID_AA64MMFR1_HADBS_SHIFT);
>   }
>   
> +#ifdef CONFIG_ARM64_AMU_EXTN
> +extern inline bool cpu_has_amu_feat(void);
> +#endif
> +
>   #endif /* __ASSEMBLY__ */
>   
>   #endif
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 6e919fafb43d..bfcc87953a68 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -382,6 +382,42 @@
>   #define SYS_TPIDR_EL0			sys_reg(3, 3, 13, 0, 2)
>   #define SYS_TPIDRRO_EL0			sys_reg(3, 3, 13, 0, 3)
>   
> +/* Definitions for system register interface to AMU for ARMv8.4 onwards */
> +#define SYS_AM_EL0(crm, op2)		sys_reg(3, 3, 13, crm, op2)
> +#define SYS_AMCR_EL0			SYS_AM_EL0(2, 0)
> +#define SYS_AMCFGR_EL0			SYS_AM_EL0(2, 1)
> +#define SYS_AMCGCR_EL0			SYS_AM_EL0(2, 2)
> +#define SYS_AMUSERENR_EL0		SYS_AM_EL0(2, 3)
> +#define SYS_AMCNTENCLR0_EL0		SYS_AM_EL0(2, 4)
> +#define SYS_AMCNTENSET0_EL0		SYS_AM_EL0(2, 5)
> +#define SYS_AMCNTENCLR1_EL0		SYS_AM_EL0(3, 0)
> +#define SYS_AMCNTENSET1_EL0		SYS_AM_EL0(3, 1)
> +
> +/*
> + * Group 0 of activity monitors (architected):
> + *                op0 CRn   op1   op2     CRm
> + * Counter:       11  1101  011   n<2:0>  010:n<3>
> + * Type:          11  1101  011   n<2:0>  011:n<3>
> + * n: 0-3
> + *
> + * Group 1 of activity monitors (auxiliary):
> + *                op0 CRn   op1   op2     CRm
> + * Counter:       11  1101  011   n<2:0>  110:n<3>
> + * Type:          11  1101  011   n<2:0>  111:n<3>
> + * n: 0-15
> + */
> +
> +#define SYS_AMEVCNTR0_EL0(n)            SYS_AM_EL0(4 + ((n) >> 3), (n) & 0x7)
> +#define SYS_AMEVTYPE0_EL0(n)            SYS_AM_EL0(6 + ((n) >> 3), (n) & 0x7)
> +#define SYS_AMEVCNTR1_EL0(n)            SYS_AM_EL0(12 + ((n) >> 3), (n) & 0x7)
> +#define SYS_AMEVTYPE1_EL0(n)            SYS_AM_EL0(14 + ((n) >> 3), (n) & 0x7)
> +
> +/* V1: Fixed (architecturally defined) activity monitors */
> +#define SYS_AMEVCNTR0_CORE_EL0          SYS_AMEVCNTR0_EL0(0)
> +#define SYS_AMEVCNTR0_CONST_EL0         SYS_AMEVCNTR0_EL0(1)
> +#define SYS_AMEVCNTR0_INST_RET_EL0      SYS_AMEVCNTR0_EL0(2)
> +#define SYS_AMEVCNTR0_MEM_STALL         SYS_AMEVCNTR0_EL0(3)
> +
>   #define SYS_CNTFRQ_EL0			sys_reg(3, 3, 14, 0, 0)
>   
>   #define SYS_CNTP_TVAL_EL0		sys_reg(3, 3, 14, 2, 0)
> @@ -577,6 +613,7 @@
>   #define ID_AA64PFR0_CSV3_SHIFT		60
>   #define ID_AA64PFR0_CSV2_SHIFT		56
>   #define ID_AA64PFR0_DIT_SHIFT		48
> +#define ID_AA64PFR0_AMU_SHIFT		44
>   #define ID_AA64PFR0_SVE_SHIFT		32
>   #define ID_AA64PFR0_RAS_SHIFT		28
>   #define ID_AA64PFR0_GIC_SHIFT		24
> @@ -587,6 +624,7 @@
>   #define ID_AA64PFR0_EL1_SHIFT		4
>   #define ID_AA64PFR0_EL0_SHIFT		0
>   
> +#define ID_AA64PFR0_AMU			0x1
>   #define ID_AA64PFR0_SVE			0x1
>   #define ID_AA64PFR0_RAS_V1		0x1
>   #define ID_AA64PFR0_FP_NI		0xf
> @@ -709,6 +747,12 @@
>   #define ID_AA64MMFR0_TGRAN16_NI		0x0
>   #define ID_AA64MMFR0_TGRAN16_SUPPORTED	0x1
>   
> +#define ID_PFR0_AMU_SHIFT		20
> +#define ID_PFR0_STATE3_SHIFT		12
> +#define ID_PFR0_STATE2_SHIFT		8
> +#define ID_PFR0_STATE1_SHIFT		4
> +#define ID_PFR0_STATE0_SHIFT		0
> +
>   #if defined(CONFIG_ARM64_4K_PAGES)
>   #define ID_AA64MMFR0_TGRAN_SHIFT	ID_AA64MMFR0_TGRAN4_SHIFT
>   #define ID_AA64MMFR0_TGRAN_SUPPORTED	ID_AA64MMFR0_TGRAN4_SUPPORTED
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 04cf64e9f0c9..c639b3e052d7 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -156,6 +156,7 @@ static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = {
>   	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_CSV3_SHIFT, 4, 0),
>   	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_CSV2_SHIFT, 4, 0),
>   	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_DIT_SHIFT, 4, 0),
> +	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_AMU_SHIFT, 4, 0),
>   	ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SVE),
>   				   FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_SVE_SHIFT, 4, 0),
>   	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_RAS_SHIFT, 4, 0),
> @@ -314,10 +315,11 @@ static const struct arm64_ftr_bits ftr_id_mmfr4[] = {
>   };
>   
>   static const struct arm64_ftr_bits ftr_id_pfr0[] = {
> -	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 12, 4, 0),		/* State3 */
> -	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 8, 4, 0),		/* State2 */
> -	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 4, 4, 0),		/* State1 */
> -	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 0, 4, 0),		/* State0 */
> +	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_PFR0_AMU_SHIFT, 4, 0),

Why is this STRICT while the aa64pfr0 field is NONSTRICT? On the other
hand, do we need this entry? Do we plan to support 32bit guests using
AMU counters? If we do, we may need to cap this field for the guests.

Also, fyi, please note that there may be conflicts with another series
from Anshuman which cleans up the tables and names the shifts [1].
That series purposefully hides the AMU field in ID_PFR0 for the reasons
above.
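
For anyone following along, a small model of what the two flags mean for
a 4-bit field, as I understand them: LOWER_SAFE makes the system-wide
value the lower of what the CPUs report, and STRICT means a mismatch
between CPUs is complained about. id_field() and merge_lower_safe() are
invented names, not the cpufeature.c helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ID_AA64PFR0_AMU_SHIFT	44

/* Extract an unsigned 4-bit ID register field. */
static unsigned int id_field(uint64_t reg, unsigned int shift)
{
	assert(shift <= 60);
	return (reg >> shift) & 0xf;
}

/*
 * Hypothetical model of merging one field with a LOWER_SAFE policy: the
 * system-wide value becomes the lower of the two. Returns true on a
 * mismatch, which is the case a STRICT field would flag.
 */
static bool merge_lower_safe(uint64_t *sys_val, uint64_t cpu_val,
			     unsigned int shift)
{
	unsigned int cur = id_field(*sys_val, shift);
	unsigned int new = id_field(cpu_val, shift);
	unsigned int safe = new < cur ? new : cur;

	*sys_val &= ~(UINT64_C(0xf) << shift);
	*sys_val |= (uint64_t)safe << shift;

	return cur != new;
}
```

With a mix of AMU and non-AMU CPUs the merged field drops to 0, and
whether that mismatch is reported depends on the STRICT/NONSTRICT choice.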

> +	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_PFR0_STATE3_SHIFT, 4, 0),
> +	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_PFR0_STATE2_SHIFT, 4, 0),
> +	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_PFR0_STATE1_SHIFT, 4, 0),
> +	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_PFR0_STATE0_SHIFT, 4, 0),
>   	ARM64_FTR_END,
>   };
>   
> @@ -1150,6 +1152,59 @@ static bool has_hw_dbm(const struct arm64_cpu_capabilities *cap,
>   
>   #endif
>   
> +#ifdef CONFIG_ARM64_AMU_EXTN
> +
> +/*
> + * This per cpu variable only signals that the CPU implementation supports
> + * the Activity Monitors Unit (AMU) but does not provide information
> + * regarding all the events that it supports.
> + * When this amu_feat per CPU variable is true, the user of this feature
> + * can only rely on the presence of the 4 fixed counters. But this does
> + * not guarantee that the counters are enabled or access to these counters
> + * is provided by code executed at higher exception levels.
> + *
> + * Also, to ensure the safe use of this per_cpu variable, the following
> + * accessor is defined to allow a read of amu_feat for the current cpu only
> + * from the current cpu.
> + *  - cpu_has_amu_feat()
> + */
> +static DEFINE_PER_CPU_READ_MOSTLY(u8, amu_feat);
> +
> +inline bool cpu_has_amu_feat(void)
> +{
> +	return !!this_cpu_read(amu_feat);
> +}
> +

minor nit: Alternatively, you could use a cpumask_t holding the set of
CPUs where AMU is available. But if you plan to extend this to track
future AMU versions, the mask may not be sufficient.
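
A user-space sketch of the mask idea, with a plain 64-bit word standing
in for cpumask_t. All names here are made up; in the kernel this would
use the cpumask helpers instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_CPUS	64

/* Stand-in for a cpumask_t: one bit per CPU. */
static uint64_t amu_cpus;

/* Record that 'cpu' implements the AMU. */
static void toy_set_amu_cpu(unsigned int cpu)
{
	assert(cpu < TOY_NR_CPUS);
	amu_cpus |= UINT64_C(1) << cpu;
}

/* Unlike a this_cpu_read(), this can be queried for any CPU. */
static bool toy_cpu_has_amu(unsigned int cpu)
{
	assert(cpu < TOY_NR_CPUS);
	return amu_cpus & (UINT64_C(1) << cpu);
}
```

One bit per CPU only answers "present or not", which is why a mask would
fall short if a per-CPU AMU version ever needs to be recorded.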

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2020-January/708287.html


The rest looks fine to me.

Suzuki

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 4/6] Documentation: arm64: document support for the AMU extension
  2020-01-27 16:47   ` Valentin Schneider
@ 2020-01-28 16:53     ` Ionela Voinescu
  2020-01-28 18:36       ` Valentin Schneider
  0 siblings, 1 reply; 40+ messages in thread
From: Ionela Voinescu @ 2020-01-28 16:53 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: catalin.marinas, will, mark.rutland, maz, suzuki.poulose,
	sudeep.holla, dietmar.eggemann, peterz, mingo, ggherdovich,
	vincent.guittot, linux-arm-kernel, linux-doc, linux-kernel,
	Jonathan Corbet

On Monday 27 Jan 2020 at 16:47:29 (+0000), Valentin Schneider wrote:
> On 18/12/2019 18:26, Ionela Voinescu wrote:
> > +Basic support
> > +-------------
> > +
> > +The kernel can safely run a mix of CPUs with and without support for the
> > +activity monitors extension. Therefore, when CONFIG_ARM64_AMU_EXTN is
> > +selected we unconditionally enable the capability to allow any late CPU
> > +(secondary or hotplugged) to detect and use the feature.
> > +
> > +When the feature is detected on a CPU, a per-CPU variable (amu_feat) is
> > +set, but this does not guarantee the correct functionality of the
> > +counters, only the presence of the extension.
> > +
> > +Firmware (code running at higher exception levels, e.g. arm-tf) support is
> > +needed to:
> > + - Enable access for lower exception levels (EL2 and EL1) to the AMU
> > +   registers.
> > + - Enable the counters. If not enabled these will read as 0.
> 
> Just to make sure I understand - if AMUs are physically present but not
> enabled by FW, we'll still
> - see them as implemented in ID_AA64PFR0_EL1.AMU

Yes, this feature register only shows the physical presence of the unit
in hardware.

> - see some counters as available with e.g. AMCGCR_EL0.CG0NC > 0
> 

Yes, the same as above, this only shows their physical presence. For
AMUv1, AMCGCR_EL0.CG0NC is set to 4. AMCGCR_EL0.CG1NC will show the
number of auxiliary counters implemented in hardware.

> But reading some AMEVCNTR<g><n> will return 0?

Or you won't be able to access them at all. Lacking firmware support,
accesses to AMU registers could be trapped in EL3. If access for EL1 and
EL2 is enabled from EL3, it's still possible that the counters
themselves are not enabled - that means they are not enabled to count
the events they are designed to be counting. That's why in this case the
event counter register could read 0.

But if we read 0, it does not necessarily mean that the counter is
disabled. It could also mean that the events it is meant to count did
not happen yet.

> 
> > + - Save/restore the counters before/after the CPU is being put/brought up
> > +   from the 'off' power state.
> > +
> > +When using kernels that have this configuration enabled but boot with
> > +broken firmware the user may experience panics or lockups when accessing
> > +the counter registers.
> 
> Yikes
> 
> > Even if these symptoms are not observed, the
> > +values returned by the register reads might not correctly reflect reality.
> > +Most commonly, the counters will read as 0, indicating that they are not
> > +enabled. If proper support is not provided in firmware it's best to disable
> > +CONFIG_ARM64_AMU_EXTN.
> > +
> 
> I haven't seen something that would try to catch this on the kernel side.
> Can we try to detect that (e.g. at least one counter returns > 0) in
> cpu_amu_enable() and thus not write to the CPU-local 'amu_feat'?
> 

I'm reluctant to do this especially given that platforms might choose to
keep some counters disabled while enabling some counters that might not
have counted any events by the time we reach cpu_enable. We would end up
mistakenly disabling the feature. I would rather leave the validation of
the counters to be done at the location and for the purpose of their
use: see patch 6/6 - the use of counters for frequency invariance.

> While we're on the topic of detecting broken stuff, what if some CPUs
> implement some auxiliary counters that some others don't?
> 

I think it should be up to the user of that counter to decide if the
usecase is at CPU level or system level. My intention of this base
support was to keep it simple and allow users of some counters to
decide on their own how to validate and make use of either architected
or auxiliary counters.

For example, in the case of frequency invariance, given a platform that
does not support cpufreq based invariance, I would validate all CPUs for
the use of AMU core and constant counters. If it happens that some CPUs
do not support those counters or they are not enabled, we'd have to
disable frequency invariance at system level.

For some other scenarios only partial support is needed - only a subset
of CPUs need to support the counters for their use to be feasible.

But I believe only the user of the counters can decide, whether this is
happening in architecture code, driver code or generic code.
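
To make the system-level case above concrete, here is a minimal userspace
sketch (not kernel code) of an "all CPUs or nothing" policy for frequency
invariance. The struct and function names are hypothetical and only
illustrate the idea; a "partial support" user would apply a different
policy over the same per-CPU check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU snapshot; not a kernel data structure. */
struct amu_cpu_sample {
	bool has_amu;       /* AMU extension detected on this CPU */
	uint64_t core_cnt;  /* sampled core cycle counter */
	uint64_t const_cnt; /* sampled constant cycle counter */
};

/* A CPU is usable for this purpose only if both cycle counters have counted. */
static inline bool cpu_counters_usable(const struct amu_cpu_sample *s)
{
	return s->has_amu && s->core_cnt != 0 && s->const_cnt != 0;
}

/* System-level policy: a single failing CPU disables the use for everyone. */
static inline bool amu_fie_usable_system_wide(const struct amu_cpu_sample *cpus,
					      size_t nr_cpus)
{
	for (size_t i = 0; i < nr_cpus; i++) {
		if (!cpu_counters_usable(&cpus[i]))
			return false;
	}
	return nr_cpus > 0;
}
```

Note that a zero reading is treated as "not usable" here only because this
particular user needs both counters to be advancing; it is not a general
statement about the counter being disabled.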

> > +The fixed counters of AMUv1 are accessible though the following system
> > +register definitions:
> > + - SYS_AMEVCNTR0_CORE_EL0
> > + - SYS_AMEVCNTR0_CONST_EL0
> > + - SYS_AMEVCNTR0_INST_RET_EL0
> > + - SYS_AMEVCNTR0_MEM_STALL_EL0
> > +
> > +Auxiliary platform specific counters can be accessed using
> > +SYS_AMEVCNTR1_EL0(n), where n is a value between 0 and 15.
> > +
> > +Details can be found in: arch/arm64/include/asm/sysreg.h.
> > +
> > diff --git a/Documentation/arm64/booting.rst b/Documentation/arm64/booting.rst
> > index 5d78a6f5b0ae..a3f1a47b6f1c 100644
> > --- a/Documentation/arm64/booting.rst
> > +++ b/Documentation/arm64/booting.rst
> > @@ -248,6 +248,20 @@ Before jumping into the kernel, the following conditions must be met:
> >      - HCR_EL2.APK (bit 40) must be initialised to 0b1
> >      - HCR_EL2.API (bit 41) must be initialised to 0b1
> >  
> > +  For CPUs with Activity Monitors Unit v1 (AMUv1) extension present:
> > +  - If EL3 is present:
> > +    CPTR_EL3.TAM (bit 30) must be initialised to 0b0
> > +    CPTR_EL2.TAM (bit 30) must be initialised to 0b0
> > +    AMCNTENSET0_EL0 must be initialised to 0b1111
> 
> Nit: Or be a superset of the above, right? AIUI v1 only mandates the lower
> 4 bits to be set. Probably doesn't matter that much...
> 

Right! This is more of a guideline: it can be a subset as well, if
platforms don't want some counters enabled. It can set the lower 4 bits
for enablement of all 4 architected counters for v1, or more for future
versions with more architected counters.
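
As a rough illustration of the subset/superset point, a user of the
counters could check only the enable bits it actually depends on. This is a
userspace sketch under the assumption that bit n of AMCNTENSET0_EL0 enables
architected counter n, with counters 0..3 being core cycles, constant
cycles, instructions retired and memory stall cycles; the macro and
function names are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed mapping: bit n of AMCNTENSET0_EL0 enables architected counter n. */
#define AMU_CNT_CORE      (UINT64_C(1) << 0) /* core cycles */
#define AMU_CNT_CONST     (UINT64_C(1) << 1) /* constant cycles */
#define AMU_CNT_INST_RET  (UINT64_C(1) << 2) /* instructions retired */
#define AMU_CNT_MEM_STALL (UINT64_C(1) << 3) /* memory stall cycles */

/*
 * Frequency invariance only needs the core and constant counters, so any
 * enable mask that contains both is acceptable, whether it is a subset or
 * a superset of 0b1111.
 */
static inline bool amu_fie_counters_enabled(uint64_t amcntenset0)
{
	const uint64_t needed = AMU_CNT_CORE | AMU_CNT_CONST;

	return (amcntenset0 & needed) == needed;
}
```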

Thanks,
Ionela.




* Re: [PATCH v2 3/6] arm64/kvm: disable access to AMU registers from kvm guests
  2020-01-27 15:33   ` Valentin Schneider
  2020-01-28 15:48     ` Ionela Voinescu
@ 2020-01-28 17:26     ` Suzuki Kuruppassery Poulose
  2020-01-28 17:37       ` Valentin Schneider
  1 sibling, 1 reply; 40+ messages in thread
From: Suzuki Kuruppassery Poulose @ 2020-01-28 17:26 UTC (permalink / raw)
  To: Valentin Schneider, Ionela Voinescu, catalin.marinas, will,
	mark.rutland, maz, sudeep.holla, dietmar.eggemann
  Cc: peterz, mingo, ggherdovich, vincent.guittot, linux-arm-kernel,
	linux-doc, linux-kernel, James Morse, Julien Thierry

On 27/01/2020 15:33, Valentin Schneider wrote:
> On 18/12/2019 18:26, Ionela Voinescu wrote:
>> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
>> index 6e5d839f42b5..dd20fb185d56 100644
>> --- a/arch/arm64/include/asm/kvm_arm.h
>> +++ b/arch/arm64/include/asm/kvm_arm.h
>> @@ -266,10 +266,11 @@
>>   #define CPTR_EL2_TFP_SHIFT 10
>>   
>>   /* Hyp Coprocessor Trap Register */
>> -#define CPTR_EL2_TCPAC	(1 << 31)
>> -#define CPTR_EL2_TTA	(1 << 20)
>> -#define CPTR_EL2_TFP	(1 << CPTR_EL2_TFP_SHIFT)
>>   #define CPTR_EL2_TZ	(1 << 8)
>> +#define CPTR_EL2_TFP	(1 << CPTR_EL2_TFP_SHIFT)
>> +#define CPTR_EL2_TTA	(1 << 20)
>> +#define CPTR_EL2_TAM	(1 << 30)
>> +#define CPTR_EL2_TCPAC	(1 << 31)
> 
> Nit: why the #define movement? Couldn't that just be added beneath
> CPTR_EL2_TCPAC?
> 
>>   #define CPTR_EL2_RES1	0x000032ff /* known RES1 bits in CPTR_EL2 */
>>   #define CPTR_EL2_DEFAULT	CPTR_EL2_RES1
>>   
>> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
>> index 72fbbd86eb5e..0bca87a2621f 100644
>> --- a/arch/arm64/kvm/hyp/switch.c
>> +++ b/arch/arm64/kvm/hyp/switch.c
>> @@ -90,6 +90,17 @@ static void activate_traps_vhe(struct kvm_vcpu *vcpu)
>>   	val = read_sysreg(cpacr_el1);
>>   	val |= CPACR_EL1_TTA;
>>   	val &= ~CPACR_EL1_ZEN;
>> +
>> +	/*
>> +	 * With VHE enabled, we have HCR_EL2.{E2H,TGE} = {1,1}. Note that in
>> +	 * this case CPACR_EL1 has the same bit layout as CPTR_EL2, and
>> +	 * CPACR_EL1 accessing instructions are redefined to access CPTR_EL2.
>> +	 * Therefore use CPTR_EL2.TAM bit reference to activate AMU register
>> +	 * traps.
>> +	 */
>> +
>> +	val |= CPTR_EL2_TAM;
>> +
> 
> Hmm so this is a bit confusing for me, I've rewritten that part of the
> email too many times (didn't help that I'm far from being a virt guru).
> Rectifications are most welcome.
> 
> 
> First, AFAICT we *don't* have HCR_EL2.TGE set anymore at this point, it's
> cleared just a bit earlier in __activate_traps().
> 
> 
> Then, your comment suggests that when we're running this code, CPACR_EL1
> accesses are rerouted to CPTR_EL2. Annoyingly this isn't mentioned in
> the doc of CPACR_EL1, but D5.6.3 does say
> 
> """
> When ARMv8.1-VHE is implemented, and HCR_EL2.E2H is set to 1, when executing
> at EL2, some EL1 System register access instructions are redefined to access
> the equivalent EL2 register.
> """
> 
> And CPACR_EL1 is part of these, so far so good. Now, the thing is
> the doc for CPACR_EL1 *doesn't* mention any TAM bit - but CPTR_EL2 does.
> I believe what *do* want here is to set CPTR_EL2.TAM (which IIUC we end
> up doing via the rerouting).
> 
> So, providing I didn't get completely lost on the way, I have to ask:
> why do we use CPACR_EL1 here? Couldn't we use CPTR_EL2 directly?

Part of the reason is that CPTR_EL2 has a different layout depending on
whether HCR_EL2.E2H == 1, e.g. CPTR_EL2.TTA moves from Bit[28] to Bit[20].

So, to keep it simple, CPTR_EL2 is used for non-VHE code with the shifts
as defined by the "CPTR_EL2 when E2H=0" layout.

If E2H == 1, CPTR_EL2 takes the layout of CPACR_EL1 and "overrides" some
of the RES0 bits in CPACR_EL1 with EL2 controls (e.g. TAM, TCPAC).
Thus we use CPACR_EL1 to keep the "shifts" non-conflicting (e.g. ZEN),
which is the right thing to do.
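
A small sketch of the two layouts, for illustration only: it assumes the
bit positions discussed in this thread (TTA at bit 20 in the E2H == 0
layout and at bit 28 in the CPACR_EL1-style layout, TAM at bit 30 in both).
The helper name is hypothetical and this is not the KVM code.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPTR_EL2_TTA  (UINT32_C(1) << 20) /* TTA in the E2H == 0 layout */
#define CPACR_EL1_TTA (UINT32_C(1) << 28) /* TTA in the E2H == 1 layout */
#define CPTR_EL2_TAM  (UINT32_C(1) << 30) /* TAM: same bit in both layouts */

/*
 * Build the trace and AMU trap bits for the guest. Only TTA depends on the
 * layout in use; TAM can be taken from the CPTR_EL2 definition either way.
 */
static inline uint32_t guest_trap_bits(bool vhe)
{
	uint32_t val = vhe ? CPACR_EL1_TTA : CPTR_EL2_TTA;

	return val | CPTR_EL2_TAM;
}
```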

It is a bit confusing, but we are doing the right thing. Maybe we could
improve the comment along these lines:

	/*
	 * With VHE (HCR.E2H == 1), CPTR_EL2 has the same layout as
	 * CPACR_EL1, except for some missing controls, such as TAM.
	 * And accesses to CPACR_EL1 are routed to CPTR_EL2.
	 * Also CPTR_EL2.TAM has the same position with or without
	 * HCR.E2H == 1. Therefore, use CPTR_EL2.TAM here for
	 * trapping the AMU accesses.
	 */

Suzuki


* Re: [PATCH v2 6/6] arm64: use activity monitors for frequency invariance
  2020-01-24 15:17           ` Lukasz Luba
@ 2020-01-28 17:36             ` Ionela Voinescu
  0 siblings, 0 replies; 40+ messages in thread
From: Ionela Voinescu @ 2020-01-28 17:36 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: catalin.marinas, will, mark.rutland, maz, suzuki.poulose,
	sudeep.holla, dietmar.eggemann, peterz, mingo, ggherdovich,
	vincent.guittot, linux-arm-kernel, linux-doc, linux-kernel

Hi Lukasz,

On Friday 24 Jan 2020 at 15:17:48 (+0000), Lukasz Luba wrote:
[..]
> > >   static void cpu_amu_enable(struct arm64_cpu_capabilities const *cap)
> > >   {
> > > +	u64 core_cnt, const_cnt;
> > > +
> > >   	if (has_cpuid_feature(cap, SCOPE_LOCAL_CPU)) {
> > >   		pr_info("detected CPU%d: Activity Monitors Unit (AMU)\n",
> > >   			smp_processor_id());
> > > -		this_cpu_write(amu_feat, 1);
> > > +		core_cnt = read_sysreg_s(SYS_AMEVCNTR0_CORE_EL0);
> > > +		const_cnt = read_sysreg_s(SYS_AMEVCNTR0_CONST_EL0);
> > > +
> > > +		this_cpu_write(arch_core_cycles_prev, core_cnt);
> > > +		this_cpu_write(arch_const_cycles_prev, const_cnt);
> > > +
> > > +		this_cpu_write(amu_scale_freq, 1);
> > > +	} else {
> > > +		this_cpu_write(amu_scale_freq, 2);
> > >   	}
> > >   }
> > 
> > 
> > Yes, functionally this can be done here (it would need some extra checks
> > on the initial values of core_cnt and const_cnt), but what I was saying
> > in my previous comment is that I don't want to mix generic feature
> > detection, which should happen here, with counter validation for
> > frequency invariance. As you see, this would already bring here per-cpu
> > variables for counters and amu_scale_freq flag, and I only see this
> > getting more messy with the future use of more counters. I don't believe
> > this code belongs here.
> > 
> > Looking a bit more over the code and checking against the new frequency
> > invariance code for x86, there is a case of either doing this CPU
> > validation in smp_prepare_cpus (separately for arm64 and x86) or calling
> > an arch_init_freq_invariance() maybe in sched_init_smp to be defined with
> > the proper frequency invariance counter initialisation code separately
> > for x86 and arm64. I'll have to look more over the details to make sure
> > this is feasible.
> 
> I have found that we could simply draw on from Mark's solution to
> similar problem. In commit:
> 
> commit df857416a13734ed9356f6e4f0152d55e4fb748a
> Author: Mark Rutland <mark.rutland@arm.com>
> Date:   Wed Jul 16 16:32:44 2014 +0100
> 
>     arm64: cpuinfo: record cpu system register values
> 
>     Several kernel subsystems need to know details about CPU system register
>     values, sometimes for CPUs other than that they are executing on. Rather
>     than hard-coding system register accesses and cross-calls for these
>     cases, this patch adds logic to record various system register values at
>     boot-time. This may be used for feature reporting, firmware bug
>     detection, etc.
> 
>     Separate hooks are added for the boot and hotplug paths to enable
>     one-time intialisation and cold/warm boot value mismatch detection in
>     later patches.
> 
>     Signed-off-by: Mark Rutland <mark.rutland@arm.com>
>     Reviewed-by: Will Deacon <will.deacon@arm.com>
>     Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
>     Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> 
> 
> He added cpuinfo_store_cpu() call in secondary_start_kernel()
> [in arm64 smp.c]. Please check the file:
> arch/arm64/kernel/cpuinfo.c
> 
> We can probably add our read-amu-regs-and-setup-invariance call
> just below his cpuinfo_store_cpu.
> 
> Then the arm64 cpufeature.c would be clean, we will be called for
> each cpu, late_initcall() will finish setup with edge case policy
> check like in the init_amu_feature() code below.
> 

Yes, this should work: calling an AMU per-CPU validation function in
setup_processor for the boot CPU and in secondary_start_kernel for
secondary and hotplugged CPUs.

I would still like to bring this closer to the scheduler
(sched_init_smp) as frequency invariance is a functionality needed by
the scheduler and its initialisation should be part of scheduler init
code. But this together with needed interfaces for other architectures
can be done in a separate patchset that is not so AMU/arm64 specific.

[..]
> > 
> > Yes, with the design I mentioned above, this CPU policy validation could
> > move to a late_initcall and I could drop the workqueues and the extra
> > data structure. Thanks for this!
> > 
> > Let me know what you think!
> > 
> 
> One thing is still open, the file drivers/base/arch_topology.c and
> #ifdef in function arch_set_freq_scale().
> 
> Generally, if there is such need, it's better to put such stuff into the
> header and make dual implementation not polluting generic code with:
> #if defined(CONFIG_ARM64_XZY)
> #endif
> #if defined(CONFIG_POWERPC_ABC)
> #endif
> #if defined(CONFIG_x86_QAZ)
> #endif
> ...
> 
> 
> In our case we would need i.e. linux/topology.h because it includes
> asm/topology.h, which might provide a needed symbol. At the end of
> linux/topology.h we can have:
> 
> #ifndef arch_cpu_auto_scaling
> static __always_inline
> bool arch_cpu_auto_scaling(void) { return false; }
> #endif
> 
> Then, when the symbol was missing and we got the default one,
> it should be easily optimized by the compiler.
> 
> We could have a much cleaner function arch_set_freq_scale()
> in drivers/base/ and all architecture will deal with specific
> #ifdef CONFIG in their <asm/topology.h> implementations or
> use default.
> 
> Example:
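> void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq,
> 			 unsigned long max_freq)
> {
> 	unsigned long scale;
> 	int i;
> 
> 	/* Nothing to do if the architecture scales frequency on its own. */
> 	if (arch_cpu_auto_scaling())
> 		return;
> 
> 	scale = (cur_freq << SCHED_CAPACITY_SHIFT) / max_freq;
> 	for_each_cpu(i, cpus)
> 		per_cpu(freq_scale, i) = scale;
> }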
> 
> Regards,
> Lukasz
>

Okay, it does look nice and clean. Let me give this a try in v3.
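
For reference, the override-with-default pattern can be tried out in
userspace with something like the sketch below. It is only an
illustration: freq_scale_from_cpufreq() and its prev_scale parameter are
hypothetical stand-ins for the generic code, and an architecture would
override the default by providing its own function plus a same-named
#define in its <asm/topology.h>.

```c
#include <stdbool.h>

#define SCHED_CAPACITY_SHIFT 10

/*
 * Generic default: used only when no architecture header has defined a
 * macro named arch_cpu_auto_scaling before this point.
 */
#ifndef arch_cpu_auto_scaling
static inline bool arch_cpu_auto_scaling(void) { return false; }
#endif

/*
 * Hypothetical stand-in for the generic scaling code: keep the previous
 * scale if the architecture handles scaling itself, otherwise derive it
 * from the current and maximum frequency.
 */
static inline unsigned long freq_scale_from_cpufreq(unsigned long cur_freq,
						    unsigned long max_freq,
						    unsigned long prev_scale)
{
	if (arch_cpu_auto_scaling())
		return prev_scale;

	return (cur_freq << SCHED_CAPACITY_SHIFT) / max_freq;
}
```

With the default in place the check is a compile-time constant, so the
compiler can drop the early return entirely.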

Thank you very much,
Ionela.


* Re: [PATCH v2 3/6] arm64/kvm: disable access to AMU registers from kvm guests
  2020-01-28 17:26     ` Suzuki Kuruppassery Poulose
@ 2020-01-28 17:37       ` Valentin Schneider
  2020-01-28 17:52         ` Ionela Voinescu
  0 siblings, 1 reply; 40+ messages in thread
From: Valentin Schneider @ 2020-01-28 17:37 UTC (permalink / raw)
  To: Suzuki Kuruppassery Poulose, Ionela Voinescu, catalin.marinas,
	will, mark.rutland, maz, sudeep.holla, dietmar.eggemann
  Cc: peterz, mingo, ggherdovich, vincent.guittot, linux-arm-kernel,
	linux-doc, linux-kernel, James Morse, Julien Thierry

Hi Suzuki,

On 28/01/2020 17:26, Suzuki Kuruppassery Poulose wrote:
>> So, providing I didn't get completely lost on the way, I have to ask:
>> why do we use CPACR_EL1 here? Couldn't we use CPTR_EL2 directly?
> 
> Part of the reason is, CPTR_EL2 has different layout depending on
> whether HCR_EL2.E2H == 1. e.g, CPTR_EL2.TTA move from Bit[28] to Bit[20].
> 
> So, to keep it simple, CPTR_EL2 is used for non-VHE code with the shifts
> as defined by the "CPTR_EL2 when E2H=0"
> 
> if E2H == 1, CPTR_EL2 takes the layout of CPACR_EL1 and "overrides" some
> of the RES0 bits in CPACR_EL1 with EL2 controls (e.g: TAM, TCPAC).
> Thus we use CPACR_EL1 to keep the "shifts" non-conflicting (e.g, ZEN)
> and is the right thing to do.
> 
> It is a bit confusing, but we are doing the right thing. May be we could improve the comment like :
> 
>     /*
>      * With VHE (HCR.E2H == 1), CPTR_EL2 has the same layout as
>      * CPACR_EL1, except for some missing controls, such as TAM.
>      * And accesses to CPACR_EL1 are routed to CPTR_EL2.
>      * Also CPTR_EL2.TAM has the same position with or without
>      * HCR.E2H == 1. Therefore, use CPTR_EL2.TAM here for
>      * trapping the AMU accesses.
>      */
> 

Thanks for clearing this up! I also bothered MarcZ in the meantime who
also cleared up some of my confusion (including which layout takes effect).

So yeah, I think what we want here is to keep using CPTR_EL2_TAM but have a
comment that explains why (which you just provided!).

> Suzuki


* Re: [PATCH v2 3/6] arm64/kvm: disable access to AMU registers from kvm guests
  2020-01-28 17:37       ` Valentin Schneider
@ 2020-01-28 17:52         ` Ionela Voinescu
  0 siblings, 0 replies; 40+ messages in thread
From: Ionela Voinescu @ 2020-01-28 17:52 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Suzuki Kuruppassery Poulose, catalin.marinas, will, mark.rutland,
	maz, sudeep.holla, dietmar.eggemann, peterz, mingo, ggherdovich,
	vincent.guittot, linux-arm-kernel, linux-doc, linux-kernel,
	James Morse, Julien Thierry

On Tuesday 28 Jan 2020 at 17:37:04 (+0000), Valentin Schneider wrote:
> Hi Suzuki,
> 
> On 28/01/2020 17:26, Suzuki Kuruppassery Poulose wrote:
> >> So, providing I didn't get completely lost on the way, I have to ask:
> >> why do we use CPACR_EL1 here? Couldn't we use CPTR_EL2 directly?
> > 
> > Part of the reason is, CPTR_EL2 has different layout depending on
> > whether HCR_EL2.E2H == 1. e.g, CPTR_EL2.TTA move from Bit[28] to Bit[20].
> > 
> > So, to keep it simple, CPTR_EL2 is used for non-VHE code with the shifts
> > as defined by the "CPTR_EL2 when E2H=0"
> > 
> > if E2H == 1, CPTR_EL2 takes the layout of CPACR_EL1 and "overrides" some
> > of the RES0 bits in CPACR_EL1 with EL2 controls (e.g: TAM, TCPAC).
> > Thus we use CPACR_EL1 to keep the "shifts" non-conflicting (e.g, ZEN)
> > and is the right thing to do.
> > 
> > It is a bit confusing, but we are doing the right thing. May be we could improve the comment like :
> > 
> >     /*
> >      * With VHE (HCR.E2H == 1), CPTR_EL2 has the same layout as
> >      * CPACR_EL1, except for some missing controls, such as TAM.
> >      * And accesses to CPACR_EL1 are routed to CPTR_EL2.
> >      * Also CPTR_EL2.TAM has the same position with or without
> >      * HCR.E2H == 1. Therefore, use CPTR_EL2.TAM here for
> >      * trapping the AMU accesses.
> >      */
> >

Thanks Suzuki, this makes sense!

Ionela.

> 
> Thanks for clearing this up! I also bothered MarcZ in the meantime who
> also cleared up some of my confusion (including which layout takes effect).
> 
> So yeah, I think what we want here is to keep using CPTR_EL2_TAM but have a
> comment that explains why (which you just provided!).
> 
> > Suzuki


* Re: [PATCH v2 4/6] Documentation: arm64: document support for the AMU extension
  2020-01-28 16:53     ` Ionela Voinescu
@ 2020-01-28 18:36       ` Valentin Schneider
  0 siblings, 0 replies; 40+ messages in thread
From: Valentin Schneider @ 2020-01-28 18:36 UTC (permalink / raw)
  To: Ionela Voinescu
  Cc: catalin.marinas, will, mark.rutland, maz, suzuki.poulose,
	sudeep.holla, dietmar.eggemann, peterz, mingo, ggherdovich,
	vincent.guittot, linux-arm-kernel, linux-doc, linux-kernel,
	Jonathan Corbet

On 28/01/2020 16:53, Ionela Voinescu wrote:
> Or you won't be able to access them at all. Lacking firmware support
> accesses to AMU registers could be trapped in EL3. If access for EL1 and
> EL2 is enabled from EL3, it's still possible that the counters
> themselves are not enabled - that means they are not enabled to count
> the events they are designed to be counting. That's why in this case the
> event counter register could read 0.
> 
> But if we read 0, it does not necessarily mean that the counter is
> disabled. It could also mean that the events it is meant to count did
> not happen yet.
> 

Right, which (as we discussed offline) is quite likely to happen if/when
we get stuff like SVE counters and we try to read them at boot time. Might
be worth adding a small note about that (0 != disabled).

>> I haven't seen something that would try to catch this on the kernel side.
>> Can we try to detect that (e.g. at least one counter returns > 0) in
>> cpu_amu_enable() and thus not write to the CPU-local 'amu_feat'?
>>
> 
> I'm reluctant to do this especially given that platforms might choose to
> keep some counters disabled while enabling some counters that might not
> have counted any events by the time we reach cpu_enable. We would end up
> mistakenly disabling the feature. I would rather leave the validation of
> the counters to be done at the location and for the purpose of their
> use: see patch 6/6 - the use of counters for frequency invariance.
> 

Hmph, I'm a bit torn on that one. It would be really nice to provide *some*
amount of sanity checking at core level - e.g. by checking that at least
one of the four architected counters reads non-zero. But as you say these
could be disabled, while some other arch/aux counter is enabled, and we
could then mistakenly disable the feature. So we can't really do much
unless we handle *each* individual counter. Oh well :/

>> While we're on the topic of detecting broken stuff, what if some CPUs
>> implement some auxiliary counters that some others don't?
>>
> 
> I think it should be up to the user of that counter to decide if the
> usecase is at CPU level or system level. My intention of this base
> support was to keep it simple and allow users of some counters to
> decide on their own how to validate and make use of either architected
> or auxiliary counters.
> 
> For example, in the case of frequency invariance, given a platform that
> does not support cpufreq based invariance, I would validate all CPUs for
> the use of AMU core and constant counters. If it happens that some CPUs
> do not support those counters or they are not enabled, we'd have to
> disable frequency invariance at system level.
> 
> For some other scenarios only partial support is needed - only a subset
> of CPUs need to support the counters for their use to be feasible.
> 
> But I believe only the user of the counters can decide, whether this is
> happening in architecture code, driver code, generic code.
> 

Right, the FIE support is actually a good example of that, I think.


* Re: [PATCH v2 1/6] arm64: add support for the AMU extension v1
  2020-01-28 16:34   ` Suzuki Kuruppassery Poulose
@ 2020-01-29 16:42     ` Ionela Voinescu
  0 siblings, 0 replies; 40+ messages in thread
From: Ionela Voinescu @ 2020-01-29 16:42 UTC (permalink / raw)
  To: Suzuki Kuruppassery Poulose
  Cc: catalin.marinas, will, mark.rutland, maz, sudeep.holla,
	dietmar.eggemann, peterz, mingo, ggherdovich, vincent.guittot,
	linux-arm-kernel, linux-doc, linux-kernel

Hi Suzuki,

On Tuesday 28 Jan 2020 at 16:34:24 (+0000), Suzuki Kuruppassery Poulose wrote:
> > --- a/arch/arm64/kernel/cpufeature.c
> > +++ b/arch/arm64/kernel/cpufeature.c
> > @@ -156,6 +156,7 @@ static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = {
> >   	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_CSV3_SHIFT, 4, 0),
> >   	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_CSV2_SHIFT, 4, 0),
> >   	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_DIT_SHIFT, 4, 0),
> > +	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_AMU_SHIFT, 4, 0),
> >   	ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SVE),
> >   				   FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_SVE_SHIFT, 4, 0),
> >   	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_RAS_SHIFT, 4, 0),
> > @@ -314,10 +315,11 @@ static const struct arm64_ftr_bits ftr_id_mmfr4[] = {
> >   };
> >   static const struct arm64_ftr_bits ftr_id_pfr0[] = {
> > -	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 12, 4, 0),		/* State3 */
> > -	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 8, 4, 0),		/* State2 */
> > -	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 4, 4, 0),		/* State1 */
> > -	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 0, 4, 0),		/* State0 */
> > +	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_PFR0_AMU_SHIFT, 4, 0),
> 
> Why is this STRICT while the aa64pfr0 field is NON_STRICT ? On the other
> hand, do we need this entry ? Do we plan to support 32bit guests using
> AMU counters ? If we do, we may need to cap this field for the guests.
>

No, we do not need this entry at all. This is an artifact left from
testing which I'll remove. The ID register is already modified to hide
the presence of AMU for both 32bit and 64bit guests (patch 3/6), and
this was supposed to be here just to validate that the capping of this
field for the guest does its job.

> Also, fyi, please note that there may be conflicts with another series from
> Anshuman which cleans up the tables and "naming" the shifts. [1].
> [1] purposefully hides the AMU from ID_PFR0 due to the above reasoning.
> 

Thanks, that's fine.

> > +	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_PFR0_STATE3_SHIFT, 4, 0),
> > +	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_PFR0_STATE2_SHIFT, 4, 0),
> > +	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_PFR0_STATE1_SHIFT, 4, 0),
> > +	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_PFR0_STATE0_SHIFT, 4, 0),
> >   	ARM64_FTR_END,
> >   };
> > @@ -1150,6 +1152,59 @@ static bool has_hw_dbm(const struct arm64_cpu_capabilities *cap,
> >   #endif
> > +#ifdef CONFIG_ARM64_AMU_EXTN
> > +
> > +/*
> > + * This per cpu variable only signals that the CPU implementation supports
> > + * the Activity Monitors Unit (AMU) but does not provide information
> > + * regarding all the events that it supports.
> > + * When this amu_feat per CPU variable is true, the user of this feature
> > + * can only rely on the presence of the 4 fixed counters. But this does
> > + * not guarantee that the counters are enabled or access to these counters
> > + * is provided by code executed at higher exception levels.
> > + *
> > + * Also, to ensure the safe use of this per_cpu variable, the following
> > + * accessor is defined to allow a read of amu_feat for the current cpu only
> > + * from the current cpu.
> > + *  - cpu_has_amu_feat()
> > + */
> > +static DEFINE_PER_CPU_READ_MOSTLY(u8, amu_feat);
> > +
> > +inline bool cpu_has_amu_feat(void)
> > +{
> > +	return !!this_cpu_read(amu_feat);
> > +}
> > +
> 
> minor nit: Or you may use a cpumask_t set of CPUs where AMU is
> available. But if you plan to extend this for the future AMU version
> tracking the mask may not be sufficient.
> 

To be honest, I would like not to have to use information about AMU
version for future support, but yes, it would be good to have the
possibility, just in case.


> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2020-January/708287.html
> 
> 
> The rest looks fine to me.
> 
> Suzuki

Thank you very much for the review,
Ionela.



* Re: [PATCH v2 6/6] arm64: use activity monitors for frequency invariance
  2019-12-18 18:26 ` [PATCH v2 6/6] arm64: use activity monitors for " Ionela Voinescu
  2020-01-23 11:49   ` Lukasz Luba
@ 2020-01-29 17:13   ` Valentin Schneider
  2020-01-29 17:52     ` Ionela Voinescu
  2020-01-29 23:39     ` Valentin Schneider
  1 sibling, 2 replies; 40+ messages in thread
From: Valentin Schneider @ 2020-01-29 17:13 UTC (permalink / raw)
  To: Ionela Voinescu, catalin.marinas, will, mark.rutland, maz,
	suzuki.poulose, sudeep.holla, dietmar.eggemann
  Cc: peterz, mingo, ggherdovich, vincent.guittot, linux-arm-kernel,
	linux-doc, linux-kernel

Only commenting on the bits that should be there regardless of using the
workqueues or not;

On 18/12/2019 18:26, Ionela Voinescu wrote:
> +static void cpu_amu_fie_init_workfn(struct work_struct *work)
> +{
> +	u64 core_cnt, const_cnt, ratio;
> +	struct cpu_amu_work *amu_work;
> +	int cpu = smp_processor_id();
> +
> +	if (!cpu_has_amu_feat()) {
> +		pr_debug("CPU%d: counters are not supported.\n", cpu);
> +		return;
> +	}
> +
> +	core_cnt = read_sysreg_s(SYS_AMEVCNTR0_CORE_EL0);
> +	const_cnt = read_sysreg_s(SYS_AMEVCNTR0_CONST_EL0);
> +
> +	if (unlikely(!core_cnt || !const_cnt)) {
> +		pr_err("CPU%d: cycle counters are not enabled.\n", cpu);
> +		return;
> +	}
> +
> +	amu_work = container_of(work, struct cpu_amu_work, cpu_work);
> +	if (unlikely(!(amu_work->cpuinfo_max_freq))) {
> +		pr_err("CPU%d: invalid maximum frequency.\n", cpu);
> +		return;
> +	}
> +
> +	/*
> +	 * Pre-compute the fixed ratio between the frequency of the
> +	 * constant counter and the maximum frequency of the CPU (hz).

I can't resist: s/hz/Hz/

> +	 */
> +	ratio = (u64)arch_timer_get_rate() << (2 * SCHED_CAPACITY_SHIFT);
> +	ratio = div64_u64(ratio, amu_work->cpuinfo_max_freq * 1000);

Nit: we're missing a comment somewhere that the unit of this is in kHz
(which explains the * 1000).

> +	this_cpu_write(arch_max_freq_scale, (unsigned long)ratio);
> +

Okay so what we get in the tick is:

  /\ core
  --------
  /\ const

And we want that to be SCHED_CAPACITY_SCALE when running at max freq. IOW we
want to turn

  max_freq
  ----------
  const_freq

into SCHED_CAPACITY_SCALE, so we can just multiply that by:

  const_freq
  ---------- * SCHED_CAPACITY_SCALE
  max_freq

But the ratio you are storing here is

                          const_freq
  arch_max_freq_scale =   ---------- * SCHED_CAPACITY_SCALE²
                           max_freq

(because x << 2 * SCHED_CAPACITY_SHIFT == x << 20)


In topology_freq_scale_tick() you end up doing

  /\ core   arch_max_freq_scale
  ------- * --------------------
  /\ const  SCHED_CAPACITY_SCALE

which gives us what we want (SCHED_CAPACITY_SCALE at max freq).


Now, the reason why we multiply our ratio by the square of
SCHED_CAPACITY_SCALE was not obvious to me, but you pointed out to me that
the frequency of the arch timer can be *really* low compared to the max CPU
freq.

For instance on h960:

  [    0.000000] arch_timer: cp15 timer(s) running at 1.92MHz (phys)

  $ root@valsch-h960:~# cat /sys/devices/system/cpu/cpufreq/policy4/cpuinfo_max_freq 
  2362000

So our ratio would be

  1'920'000 * 1024
  ----------------
    2'362'000'000

Which is ~0.83, so that becomes simply 0...


I had a brief look at the Arm ARM, for the arch timer it says it is
"typically in the range 1-50MHz", but then also gives an example with 20kHz
in a low-power mode.

If we take say 5GHz max CPU frequency, our lower bound for the arch timer
(with that SCHED_CAPACITY_SCALE² trick) is about 4.768kHz. It's not *too*
far from that 20kHz, but I'm not sure we would actually be executing stuff
in that low-power mode.

Long story short, we're probably fine, but it would be nice to shove some of
the above into comments (especially the SCHED_CAPACITY_SCALE² trick).

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 6/6] arm64: use activity monitors for frequency invariance
  2020-01-29 17:13   ` Valentin Schneider
@ 2020-01-29 17:52     ` Ionela Voinescu
  2020-01-29 23:39     ` Valentin Schneider
  1 sibling, 0 replies; 40+ messages in thread
From: Ionela Voinescu @ 2020-01-29 17:52 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: catalin.marinas, will, mark.rutland, maz, suzuki.poulose,
	sudeep.holla, dietmar.eggemann, peterz, mingo, ggherdovich,
	vincent.guittot, linux-arm-kernel, linux-doc, linux-kernel

Hi Valentin,

On Wednesday 29 Jan 2020 at 17:13:53 (+0000), Valentin Schneider wrote:
> Only commenting on the bits that should be there regardless of using the
> workqueues or not;
> 
> On 18/12/2019 18:26, Ionela Voinescu wrote:
> > +static void cpu_amu_fie_init_workfn(struct work_struct *work)
> > +{
> > +	u64 core_cnt, const_cnt, ratio;
> > +	struct cpu_amu_work *amu_work;
> > +	int cpu = smp_processor_id();
> > +
> > +	if (!cpu_has_amu_feat()) {
> > +		pr_debug("CPU%d: counters are not supported.\n", cpu);
> > +		return;
> > +	}
> > +
> > +	core_cnt = read_sysreg_s(SYS_AMEVCNTR0_CORE_EL0);
> > +	const_cnt = read_sysreg_s(SYS_AMEVCNTR0_CONST_EL0);
> > +
> > +	if (unlikely(!core_cnt || !const_cnt)) {
> > +		pr_err("CPU%d: cycle counters are not enabled.\n", cpu);
> > +		return;
> > +	}
> > +
> > +	amu_work = container_of(work, struct cpu_amu_work, cpu_work);
> > +	if (unlikely(!(amu_work->cpuinfo_max_freq))) {
> > +		pr_err("CPU%d: invalid maximum frequency.\n", cpu);
> > +		return;
> > +	}
> > +
> > +	/*
> > +	 * Pre-compute the fixed ratio between the frequency of the
> > +	 * constant counter and the maximum frequency of the CPU (hz).
> 
> I can't resist: s/hz/Hz/
> 
> > +	 */
> > +	ratio = (u64)arch_timer_get_rate() << (2 * SCHED_CAPACITY_SHIFT);
> > +	ratio = div64_u64(ratio, amu_work->cpuinfo_max_freq * 1000);
> 
> Nit: we're missing a comment somewhere that the unit of this is in kHz
> (which explains the * 1000).
> 

Will do! The previous comment that explained this was ".. while
ensuring max_freq is converted to HZ.", but I believed it was too
clear and replaced it with the obscure "(hz)". I'll revert :).

> > +	this_cpu_write(arch_max_freq_scale, (unsigned long)ratio);
> > +
> 
> Okay so what we get in the tick is:
> 
>   /\ core
>   --------
>   /\ const
> 
> And we want that to be SCHED_CAPACITY_SCALE when running at max freq. IOW we
> want to turn
> 
>   max_freq
>   ----------
>   const_freq
> 
> into SCHED_CAPACITY_SCALE, so we can just multiply that by:
> 
>   const_freq
>   ---------- * SCHED_CAPACITY_SCALE
>   max_freq
> 
> But what the ratio you are storing here is 
> 
>                           const_freq
>   arch_max_freq_scale =   ---------- * SCHED_CAPACITY_SCALE²
>                            max_freq
> 
> (because x << 2 * SCHED_CAPACITY_SHIFT == x << 20)
> 
> 
> In topology_freq_scale_tick() you end up doing
> 
>   /\ core   arch_max_freq_scale
>   ------- * --------------------
>   /\ const  SCHED_CAPACITY_SCALE
> 
> which gives us what we want (SCHED_CAPACITY_SCALE at max freq).
> 
> 
> Now, the reason why we multiply our ratio by the square of
> SCHED_CAPACITY_SCALE was not obvious to me, but you pointed me out that the
> frequency of the arch timer can be *really* low compared to the max CPU freq.
> 
> For instance on h960:
> 
>   [    0.000000] arch_timer: cp15 timer(s) running at 1.92MHz (phys)
> 
>   $ root@valsch-h960:~# cat /sys/devices/system/cpu/cpufreq/policy4/cpuinfo_max_freq 
>   2362000
> 
> So our ratio would be
> 
>   1'920'000 * 1024
>   ----------------
>     2'362'000'000
> 
> Which is ~0.83, so that becomes simply 0...
> 
> 
> I had a brief look at the Arm ARM, for the arch timer it says it is
> "typically in the range 1-50MHz", but then also gives an example with 20KHz
> in a low-power mode.
> 
> If we take say 5GHz max CPU frequency, our lower bound for the arch timer
> (with that SCHED_CAPACITY_SCALE² trick) is about ~4.768KHz. It's not *too*
> far from that 20KHz, but I'm not sure we would actually be executing stuff
> in that low-power mode.
> 
> Long story short, we're probably fine, but it would nice to shove some of
> the above into comments (especially the SCHED_CAPACITY_SCALE² trick)

Okay, I'll add some of this documentation as comments in the patches. I
thought about doing it but I was not sure it justified the line count.
But if it saves people at least the hassle to unpack this computation to
understand the logic, it will be worth it.

Thank you for the thorough review,
Ionela.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 5/6] TEMP: sched: add interface for counter-based frequency invariance
  2019-12-18 18:26 ` [PATCH v2 5/6] TEMP: sched: add interface for counter-based frequency invariance Ionela Voinescu
@ 2020-01-29 19:37   ` Peter Zijlstra
  2020-01-30 15:33     ` Ionela Voinescu
  0 siblings, 1 reply; 40+ messages in thread
From: Peter Zijlstra @ 2020-01-29 19:37 UTC (permalink / raw)
  To: Ionela Voinescu
  Cc: catalin.marinas, will, mark.rutland, maz, suzuki.poulose,
	sudeep.holla, dietmar.eggemann, mingo, ggherdovich,
	vincent.guittot, linux-arm-kernel, linux-doc, linux-kernel,
	Juri Lelli

On Wed, Dec 18, 2019 at 06:26:06PM +0000, Ionela Voinescu wrote:
> To be noted that this patch is a temporary one. It introduces the
> interface added by the patches at [1] to allow update of the frequency
> invariance scale factor based on counters. If [1] is merged there is
> not need for this patch.
> 
> For platforms that support counters (x86 - APERF/MPERF, arm64 - AMU
> counters) the frequency invariance correction factor can be obtained
> using a core counter and a fixed counter to get information on the
> performance (frequency based only) obtained in a period of time. This
> will more accurately reflect the actual current frequency of the CPU,
> compared with the alternative implementation that reflects the request
> of a performance level from the OS through the cpufreq framework
> (arch_set_freq_scale).
> 
> Therefore, introduce an interface - arch_scale_freq_tick, to be
> implemented by each architecture and called for each CPU on the tick
> to update the scale factor based on the delta in the counter values,
> if counter support is present on the CPU.
> 
> Either because reading counters is expensive or because reading
> counters from remote CPUs is not possible or is expensive, only
> update the counter based frequency scale factor on the tick for
> now. A tick based update will definitely be necessary either due to
> it being the only point of update for certain architectures or in
> order to cache the counter values for a particular CPU, if a
> further update from that CPU is not possible.
> 
> [1]
> https://lore.kernel.org/lkml/20191113124654.18122-1-ggherdovich@suse.cz/

FWIW, those patches just landed in tip/sched/core

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 6/6] arm64: use activity monitors for frequency invariance
  2020-01-29 17:13   ` Valentin Schneider
  2020-01-29 17:52     ` Ionela Voinescu
@ 2020-01-29 23:39     ` Valentin Schneider
  2020-01-30 15:49       ` Ionela Voinescu
  1 sibling, 1 reply; 40+ messages in thread
From: Valentin Schneider @ 2020-01-29 23:39 UTC (permalink / raw)
  To: Ionela Voinescu, catalin.marinas, will, mark.rutland, maz,
	suzuki.poulose, sudeep.holla, dietmar.eggemann
  Cc: peterz, mingo, ggherdovich, vincent.guittot, linux-arm-kernel,
	linux-doc, linux-kernel

On 29/01/2020 17:13, Valentin Schneider wrote:
> I had a brief look at the Arm ARM, for the arch timer it says it is
> "typically in the range 1-50MHz", but then also gives an example with 20KHz
> in a low-power mode.
> 
> If we take say 5GHz max CPU frequency, our lower bound for the arch timer
> (with that SCHED_CAPACITY_SCALE² trick) is about ~4.768KHz. It's not *too*
> far from that 20KHz, but I'm not sure we would actually be executing stuff
> in that low-power mode.
> 

I mixed up a few things in there; that low-power mode is supposed to do
higher increments, so it would emulate a frequency similar to that of the
non-low-power mode. Thus the actual frequency matters less than what is
reported in CNTFRQ (though we hope to get the behaviour we're told we should
see), so we should be quite safe from that ~5kHz value. Still, to make it
obvious, I don't think something like this would hurt:

---
diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index 9a5464c625b45..a72832093575a 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -885,6 +885,17 @@ static int arch_timer_starting_cpu(unsigned int cpu)
 	return 0;
 }
 
+static int validate_timer_rate(void)
+{
+	if (!arch_timer_rate)
+		return 1;
+
+	/* Arch timer frequency < 1MHz is shady */
+	WARN_ON(arch_timer_rate < 1000000);
+
+	return 0;
+}
+
 /*
  * For historical reasons, when probing with DT we use whichever (non-zero)
  * rate was probed first, and don't verify that others match. If the first node
@@ -900,7 +911,7 @@ static void arch_timer_of_configure_rate(u32 rate, struct device_node *np)
 		arch_timer_rate = rate;
 
 	/* Check the timer frequency. */
-	if (arch_timer_rate == 0)
+	if (validate_timer_rate())
 		pr_warn("frequency not available\n");
 }
 
@@ -1594,7 +1605,7 @@ static int __init arch_timer_acpi_init(struct acpi_table_header *table)
 	 * CNTFRQ value. This *must* be correct.
 	 */
 	arch_timer_rate = arch_timer_get_cntfrq();
-	if (!arch_timer_rate) {
+	if (validate_timer_rate()) {
 		pr_err(FW_BUG "frequency not available.\n");
 		return -EINVAL;
 	}
---

> Long story short, we're probably fine, but it would nice to shove some of
> the above into comments (especially the SCHED_CAPACITY_SCALE² trick)
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 4/6] Documentation: arm64: document support for the AMU extension
  2019-12-18 18:26 ` [PATCH v2 4/6] Documentation: arm64: document support for the AMU extension Ionela Voinescu
  2020-01-27 16:47   ` Valentin Schneider
@ 2020-01-30 15:04   ` Suzuki Kuruppassery Poulose
  2020-01-30 16:45     ` Ionela Voinescu
  1 sibling, 1 reply; 40+ messages in thread
From: Suzuki Kuruppassery Poulose @ 2020-01-30 15:04 UTC (permalink / raw)
  To: Ionela Voinescu, catalin.marinas, will, mark.rutland, maz,
	sudeep.holla, dietmar.eggemann
  Cc: peterz, mingo, ggherdovich, vincent.guittot, linux-arm-kernel,
	linux-doc, linux-kernel, Jonathan Corbet

Hi Ionela,

On 18/12/2019 18:26, Ionela Voinescu wrote:
> The activity monitors extension is an optional extension introduced
> by the ARMv8.4 CPU architecture.
> 
> Add initial documentation for the AMUv1 extension:
>   - arm64/amu.txt: AMUv1 documentation
>   - arm64/booting.txt: system registers initialisation
>   - arm64/cpu-feature-registers.txt: visibility to userspace

We have stopped adding "invisible" fields to the list. So, you
can drop the changes to cpu-feature-registers.txt.

> 
> Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Jonathan Corbet <corbet@lwn.net>
> ---
>   Documentation/arm64/amu.rst                   | 107 ++++++++++++++++++
>   Documentation/arm64/booting.rst               |  14 +++
>   Documentation/arm64/cpu-feature-registers.rst |   2 +
>   Documentation/arm64/index.rst                 |   1 +
>   4 files changed, 124 insertions(+)
>   create mode 100644 Documentation/arm64/amu.rst
> 
> diff --git a/Documentation/arm64/amu.rst b/Documentation/arm64/amu.rst
> new file mode 100644
> index 000000000000..62a6635522e1
> --- /dev/null
> +++ b/Documentation/arm64/amu.rst
> @@ -0,0 +1,107 @@
> +=======================================================
> +Activity Monitors Unit (AMU) extension in AArch64 Linux
> +=======================================================
> +
> +Author: Ionela Voinescu <ionela.voinescu@arm.com>
> +
> +Date: 2019-09-10
> +
> +This document briefly describes the provision of Activity Monitors Unit
> +support in AArch64 Linux.
> +
> +
> +Architecture overview
> +---------------------
> +
> +The activity monitors extension is an optional extension introduced by the
> +ARMv8.4 CPU architecture.
> +
> +The activity monitors unit, implemented in each CPU, provides performance
> +counters intended for system management use. The AMU extension provides a
> +system register interface to the counter registers and also supports an
> +optional external memory-mapped interface.
> +
> +Version 1 of the Activity Monitors architecture implements a counter group
> +of four fixed and architecturally defined 64-bit event counters.
> +  - CPU cycle counter: increments at the frequency of the CPU.
> +  - Constant counter: increments at the fixed frequency of the system
> +    clock.
> +  - Instructions retired: increments with every architecturally executed
> +    instruction.
> +  - Memory stall cycles: counts instruction dispatch stall cycles caused by
> +    misses in the last level cache within the clock domain.
> +
> +When in WFI or WFE these counters do not increment.
> +
> +The Activity Monitors architecture provides space for up to 16 architected
> +event counters. Future versions of the architecture may use this space to
> +implement additional architected event counters.
> +
> +Additionally, version 1 implements a counter group of up to 16 auxiliary
> +64-bit event counters.
> +
> +On cold reset all counters reset to 0.
> +
> +
> +Basic support
> +-------------
> +
> +The kernel can safely run a mix of CPUs with and without support for the
> +activity monitors extension. Therefore, when CONFIG_ARM64_AMU_EXTN is
> +selected we unconditionally enable the capability to allow any late CPU
> +(secondary or hotplugged) to detect and use the feature.
> +
> +When the feature is detected on a CPU, a per-CPU variable (amu_feat) is
> +set, but this does not guarantee the correct functionality of the
> +counters, only the presence of the extension.

nit: I would rather omit the implementation details (esp. variable names)
in the documentation. It may become a pain to keep this in sync with the
code changes. You could simply mention that "we keep track of the
availability of the feature" per CPU. If someone wants to figure out
how, they can always read the code.

> +
> +Firmware (code running at higher exception levels, e.g. arm-tf) support is
> +needed to:
> + - Enable access for lower exception levels (EL2 and EL1) to the AMU
> +   registers.
> + - Enable the counters. If not enabled these will read as 0.
> + - Save/restore the counters before/after the CPU is being put/brought up
> +   from the 'off' power state.
> +
> +When using kernels that have this configuration enabled but boot with
> +broken firmware the user may experience panics or lockups when accessing
> +the counter registers. Even if these symptoms are not observed, the
> +values returned by the register reads might not correctly reflect reality.
> +Most commonly, the counters will read as 0, indicating that they are not
> +enabled. If proper support is not provided in firmware it's best to disable
> +CONFIG_ARM64_AMU_EXTN.

For the sake of "one kernel runs everywhere", do we need some other
mechanism to disable the AMU, e.g. a kernel parameter to disable the AMU
at runtime?

> diff --git a/Documentation/arm64/booting.rst b/Documentation/arm64/booting.rst
> index 5d78a6f5b0ae..a3f1a47b6f1c 100644
> --- a/Documentation/arm64/booting.rst
> +++ b/Documentation/arm64/booting.rst
> @@ -248,6 +248,20 @@ Before jumping into the kernel, the following conditions must be met:
>       - HCR_EL2.APK (bit 40) must be initialised to 0b1
>       - HCR_EL2.API (bit 41) must be initialised to 0b1
>   
> +  For CPUs with Activity Monitors Unit v1 (AMUv1) extension present:
> +  - If EL3 is present:
> +    CPTR_EL3.TAM (bit 30) must be initialised to 0b0
> +    CPTR_EL2.TAM (bit 30) must be initialised to 0b0
> +    AMCNTENSET0_EL0 must be initialised to 0b1111
> +    AMCNTENSET1_EL0 must be initialised to a platform specific value
> +    having 0b1 set for the corresponding bit for each of the auxiliary
> +    counters present.
> +  - If the kernel is entered at EL1:
> +    AMCNTENSET0_EL0 must be initialised to 0b1111
> +    AMCNTENSET1_EL0 must be initialised to a platform specific value
> +    having 0b1 set for the corresponding bit for each of the auxiliary
> +    counters present.
> +
>   The requirements described above for CPU mode, caches, MMUs, architected
>   timers, coherency and system registers apply to all CPUs.  All CPUs must
>   enter the kernel in the same exception level.
> diff --git a/Documentation/arm64/cpu-feature-registers.rst b/Documentation/arm64/cpu-feature-registers.rst
> index b6e44884e3ad..4770ae54032b 100644
> --- a/Documentation/arm64/cpu-feature-registers.rst
> +++ b/Documentation/arm64/cpu-feature-registers.rst
> @@ -150,6 +150,8 @@ infrastructure:
>        +------------------------------+---------+---------+
>        | DIT                          | [51-48] |    y    |
>        +------------------------------+---------+---------+
> +     | AMU                          | [47-44] |    n    |
> +     +------------------------------+---------+---------+

As mentioned above, please drop it.


Suzuki

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 5/6] TEMP: sched: add interface for counter-based frequency invariance
  2020-01-29 19:37   ` Peter Zijlstra
@ 2020-01-30 15:33     ` Ionela Voinescu
  0 siblings, 0 replies; 40+ messages in thread
From: Ionela Voinescu @ 2020-01-30 15:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: catalin.marinas, will, mark.rutland, maz, suzuki.poulose,
	sudeep.holla, dietmar.eggemann, mingo, ggherdovich,
	vincent.guittot, linux-arm-kernel, linux-doc, linux-kernel,
	Juri Lelli

On Wednesday 29 Jan 2020 at 20:37:41 (+0100), Peter Zijlstra wrote:
> On Wed, Dec 18, 2019 at 06:26:06PM +0000, Ionela Voinescu wrote:
> > To be noted that this patch is a temporary one. It introduces the
> > interface added by the patches at [1] to allow update of the frequency
> > invariance scale factor based on counters. If [1] is merged there is
> > not need for this patch.
> > 
> > For platforms that support counters (x86 - APERF/MPERF, arm64 - AMU
> > counters) the frequency invariance correction factor can be obtained
> > using a core counter and a fixed counter to get information on the
> > performance (frequency based only) obtained in a period of time. This
> > will more accurately reflect the actual current frequency of the CPU,
> > compared with the alternative implementation that reflects the request
> > of a performance level from the OS through the cpufreq framework
> > (arch_set_freq_scale).
> > 
> > Therefore, introduce an interface - arch_scale_freq_tick, to be
> > implemented by each architecture and called for each CPU on the tick
> > to update the scale factor based on the delta in the counter values,
> > if counter support is present on the CPU.
> > 
> > Either because reading counters is expensive or because reading
> > counters from remote CPUs is not possible or is expensive, only
> > update the counter based frequency scale factor on the tick for
> > now. A tick based update will definitely be necessary either due to
> > it being the only point of update for certain architectures or in
> > order to cache the counter values for a particular CPU, if a
> > further update from that CPU is not possible.
> > 
> > [1]
> > https://lore.kernel.org/lkml/20191113124654.18122-1-ggherdovich@suse.cz/
> 
> FWIW, those patches just landed in tip/sched/core

Thanks, Peter, I'll drop this one next time around.

Ionela.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 6/6] arm64: use activity monitors for frequency invariance
  2020-01-29 23:39     ` Valentin Schneider
@ 2020-01-30 15:49       ` Ionela Voinescu
  2020-01-30 16:11         ` Valentin Schneider
  0 siblings, 1 reply; 40+ messages in thread
From: Ionela Voinescu @ 2020-01-30 15:49 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: catalin.marinas, will, mark.rutland, maz, suzuki.poulose,
	sudeep.holla, dietmar.eggemann, peterz, mingo, ggherdovich,
	vincent.guittot, linux-arm-kernel, linux-doc, linux-kernel

Hi Valentin,

On Wednesday 29 Jan 2020 at 23:39:11 (+0000), Valentin Schneider wrote:
> On 29/01/2020 17:13, Valentin Schneider wrote:
> > I had a brief look at the Arm ARM, for the arch timer it says it is
> > "typically in the range 1-50MHz", but then also gives an example with 20KHz
> > in a low-power mode.
> > 
> > If we take say 5GHz max CPU frequency, our lower bound for the arch timer
> > (with that SCHED_CAPACITY_SCALE² trick) is about ~4.768KHz. It's not *too*
> > far from that 20KHz, but I'm not sure we would actually be executing stuff
> > in that low-power mode.
> > 
> 
> I mixed up a few things in there; that low-power mode is supposed to do
> higher increments, so it would emulate a similar frequency as the non-low-power
> mode. Thus the actual frequency matters less than what is reported in CNTFRQ
> (though we hope to get the behaviour we're told we should see), so we should
> be quite safe from that ~5KHz value. Still, to make it obvious, I don't think
> something like this would hurt:
> 
> ---
> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
> index 9a5464c625b45..a72832093575a 100644
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -885,6 +885,17 @@ static int arch_timer_starting_cpu(unsigned int cpu)
>  	return 0;
>  }
>  
> +static int validate_timer_rate(void)
> +{
> +	if (!arch_timer_rate)
> +		return 1;
> +
> +	/* Arch timer frequency < 1MHz is shady */
> +	WARN_ON(arch_timer_rate < 1000000);
> +
> +	return 0;
> +}
> +
>  /*
>   * For historical reasons, when probing with DT we use whichever (non-zero)
>   * rate was probed first, and don't verify that others match. If the first node
> @@ -900,7 +911,7 @@ static void arch_timer_of_configure_rate(u32 rate, struct device_node *np)
>  		arch_timer_rate = rate;
>  
>  	/* Check the timer frequency. */
> -	if (arch_timer_rate == 0)
> +	if (validate_timer_rate())
>  		pr_warn("frequency not available\n");
>  }
>  
> @@ -1594,7 +1605,7 @@ static int __init arch_timer_acpi_init(struct acpi_table_header *table)
>  	 * CNTFRQ value. This *must* be correct.
>  	 */
>  	arch_timer_rate = arch_timer_get_cntfrq();
> -	if (!arch_timer_rate) {
> +	if (validate_timer_rate()) {
>  		pr_err(FW_BUG "frequency not available.\n");
>  		return -EINVAL;
>  	}
> ---
> 

Okay, I'll add this as a separate patch to the series and put you as
author. That is if you want me to tie this check to this usecase that
proves its usefulness. Otherwise it can stand on its own as well if
you want to submit it separately.

With regard to the ratio computation for frequency invariance where this
plays a role, I'll do a check and bail out if the ratio is 0, which I'm
ashamed to not have added before :).

Thanks,
Ionela.


> > Long story short, we're probably fine, but it would nice to shove some of
> > the above into comments (especially the SCHED_CAPACITY_SCALE² trick)
> > 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 6/6] arm64: use activity monitors for frequency invariance
  2020-01-30 15:49       ` Ionela Voinescu
@ 2020-01-30 16:11         ` Valentin Schneider
  0 siblings, 0 replies; 40+ messages in thread
From: Valentin Schneider @ 2020-01-30 16:11 UTC (permalink / raw)
  To: Ionela Voinescu
  Cc: catalin.marinas, will, mark.rutland, maz, suzuki.poulose,
	sudeep.holla, dietmar.eggemann, peterz, mingo, ggherdovich,
	vincent.guittot, linux-arm-kernel, linux-doc, linux-kernel

On 30/01/2020 15:49, Ionela Voinescu wrote:
> Okay, I'll add this as a separate patch to the series and put you as
> author. That is if you want me to tie this check to this usecase that
> proves its usefulness. Otherwise it can stand on its own as well if
> you want to submit it separately.
> 

It's pretty much standalone, but it does make sense to bundle it with this
series, I think. Feel free to grab ownership (I didn't test it) ;)

> In regards to the ratio computation for frequency invariance where this
> plays a role, I'll do a check and bail out if the ratio is 0, which I'm
> ashamed to not have added before :).

That does sound like something we very much want to have.

> 
> Thanks,
> Ionela.
> 
> 
>>> Long story short, we're probably fine, but it would nice to shove some of
>>> the above into comments (especially the SCHED_CAPACITY_SCALE² trick)
>>>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 4/6] Documentation: arm64: document support for the AMU extension
  2020-01-30 15:04   ` Suzuki Kuruppassery Poulose
@ 2020-01-30 16:45     ` Ionela Voinescu
  2020-01-30 18:26       ` Suzuki K Poulose
  0 siblings, 1 reply; 40+ messages in thread
From: Ionela Voinescu @ 2020-01-30 16:45 UTC (permalink / raw)
  To: Suzuki Kuruppassery Poulose
  Cc: catalin.marinas, will, mark.rutland, maz, sudeep.holla,
	dietmar.eggemann, peterz, mingo, ggherdovich, vincent.guittot,
	linux-arm-kernel, linux-doc, linux-kernel, Jonathan Corbet

Hi Suzuki,

On Thursday 30 Jan 2020 at 15:04:27 (+0000), Suzuki Kuruppassery Poulose wrote:
> Hi Ionela,
> 
> On 18/12/2019 18:26, Ionela Voinescu wrote:
> > The activity monitors extension is an optional extension introduced
> > by the ARMv8.4 CPU architecture.
> > 
> > Add initial documentation for the AMUv1 extension:
> >   - arm64/amu.txt: AMUv1 documentation
> >   - arm64/booting.txt: system registers initialisation
> >   - arm64/cpu-feature-registers.txt: visibility to userspace
> 
> We have stopped adding "invisible" fields to the list. So, you
> can drop the changes to cpu-feature-registers.txt.
> 
> > 
> > Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Will Deacon <will@kernel.org>
> > Cc: Jonathan Corbet <corbet@lwn.net>
> > ---
> >   Documentation/arm64/amu.rst                   | 107 ++++++++++++++++++
> >   Documentation/arm64/booting.rst               |  14 +++
> >   Documentation/arm64/cpu-feature-registers.rst |   2 +
> >   Documentation/arm64/index.rst                 |   1 +
> >   4 files changed, 124 insertions(+)
> >   create mode 100644 Documentation/arm64/amu.rst
> > 
> > diff --git a/Documentation/arm64/amu.rst b/Documentation/arm64/amu.rst
> > new file mode 100644
> > index 000000000000..62a6635522e1
> > --- /dev/null
> > +++ b/Documentation/arm64/amu.rst
> > @@ -0,0 +1,107 @@
> > +=======================================================
> > +Activity Monitors Unit (AMU) extension in AArch64 Linux
> > +=======================================================
> > +
> > +Author: Ionela Voinescu <ionela.voinescu@arm.com>
> > +
> > +Date: 2019-09-10
> > +
> > +This document briefly describes the provision of Activity Monitors Unit
> > +support in AArch64 Linux.
> > +
> > +
> > +Architecture overview
> > +---------------------
> > +
> > +The activity monitors extension is an optional extension introduced by the
> > +ARMv8.4 CPU architecture.
> > +
> > +The activity monitors unit, implemented in each CPU, provides performance
> > +counters intended for system management use. The AMU extension provides a
> > +system register interface to the counter registers and also supports an
> > +optional external memory-mapped interface.
> > +
> > +Version 1 of the Activity Monitors architecture implements a counter group
> > +of four fixed and architecturally defined 64-bit event counters.
> > +  - CPU cycle counter: increments at the frequency of the CPU.
> > +  - Constant counter: increments at the fixed frequency of the system
> > +    clock.
> > +  - Instructions retired: increments with every architecturally executed
> > +    instruction.
> > +  - Memory stall cycles: counts instruction dispatch stall cycles caused by
> > +    misses in the last level cache within the clock domain.
> > +
> > +When in WFI or WFE these counters do not increment.
> > +
> > +The Activity Monitors architecture provides space for up to 16 architected
> > +event counters. Future versions of the architecture may use this space to
> > +implement additional architected event counters.
> > +
> > +Additionally, version 1 implements a counter group of up to 16 auxiliary
> > +64-bit event counters.
> > +
> > +On cold reset all counters reset to 0.
> > +
> > +
> > +Basic support
> > +-------------
> > +
> > +The kernel can safely run a mix of CPUs with and without support for the
> > +activity monitors extension.
> 
> 
>  Therefore, when CONFIG_ARM64_AMU_EXTN is
> > +selected we unconditionally enable the capability to allow any late CPU
> > +(secondary or hotplugged) to detect and use the feature.
> > +
> > +When the feature is detected on a CPU, a per-CPU variable (amu_feat) is
> > +set, but this does not guarantee the correct functionality of the
> > +counters, only the presence of the extension.
> 
> nit: I would rather omit the implementation details (esp variable names)
> in the documentation. It may become a pain to keep this in sync with the
> code changes. You could simply mention, "we keep track of the availability
> of the feature" per CPU. If someone wants to figure out
> how, they can always read the code.
> 
> > +
> > +Firmware (code running at higher exception levels, e.g. arm-tf) support is
> > +needed to:
> > + - Enable access for lower exception levels (EL2 and EL1) to the AMU
> > +   registers.
> > + - Enable the counters. If not enabled these will read as 0.
> > + - Save/restore the counters before/after the CPU is being put/brought up
> > +   from the 'off' power state.
> > +
> > +When using kernels that have this configuration enabled but boot with
> > +broken firmware the user may experience panics or lockups when accessing
> > +the counter registers. Even if these symptoms are not observed, the
> > +values returned by the register reads might not correctly reflect reality.
> > +Most commonly, the counters will read as 0, indicating that they are not
> > +enabled. If proper support is not provided in firmware it's best to disable
> > +CONFIG_ARM64_AMU_EXTN.
> 
> For the sake of one kernel runs everywhere, do we need some other
> mechanism to disable the AMU. e.g kernel parameter to disable amu
> at runtime ?
>

The reason I've not added this is twofold:
 - Even if we add this, it should be in order to disable the use of the
   counters for a certain purpose, in this case frequency invariance.
   On its own AMU provides the counters but it does not mandate their
   use.
 - I could add something to disable the use of the core and constant
   cycle counters for frequency invariance at runtime, but I doubt that
   anyone would use it. Logically it makes sense to use the counters
   in order to have a more accurate view of the performance that the
   CPUs are actually providing. Therefore, until anyone asks for this,
   I thought it's better to keep it simple and not add extra switches
   until there is a use for them.

Does it make sense?

P.S. I'll make all the other changes you've suggested in v3. 

Thank you,
Ionela.



> > diff --git a/Documentation/arm64/booting.rst b/Documentation/arm64/booting.rst
> > index 5d78a6f5b0ae..a3f1a47b6f1c 100644
> > --- a/Documentation/arm64/booting.rst
> > +++ b/Documentation/arm64/booting.rst
> > @@ -248,6 +248,20 @@ Before jumping into the kernel, the following conditions must be met:
> >       - HCR_EL2.APK (bit 40) must be initialised to 0b1
> >       - HCR_EL2.API (bit 41) must be initialised to 0b1
> > +  For CPUs with Activity Monitors Unit v1 (AMUv1) extension present:
> > +  - If EL3 is present:
> > +    CPTR_EL3.TAM (bit 30) must be initialised to 0b0
> > +    CPTR_EL2.TAM (bit 30) must be initialised to 0b0
> > +    AMCNTENSET0_EL0 must be initialised to 0b1111
> > +    AMCNTENSET1_EL0 must be initialised to a platform specific value
> > +    having 0b1 set for the corresponding bit for each of the auxiliary
> > +    counters present.
> > +  - If the kernel is entered at EL1:
> > +    AMCNTENSET0_EL0 must be initialised to 0b1111
> > +    AMCNTENSET1_EL0 must be initialised to a platform specific value
> > +    having 0b1 set for the corresponding bit for each of the auxiliary
> > +    counters present.
> > +
> >   The requirements described above for CPU mode, caches, MMUs, architected
> >   timers, coherency and system registers apply to all CPUs.  All CPUs must
> >   enter the kernel in the same exception level.
> > diff --git a/Documentation/arm64/cpu-feature-registers.rst b/Documentation/arm64/cpu-feature-registers.rst
> > index b6e44884e3ad..4770ae54032b 100644
> > --- a/Documentation/arm64/cpu-feature-registers.rst
> > +++ b/Documentation/arm64/cpu-feature-registers.rst
> > @@ -150,6 +150,8 @@ infrastructure:
> >        +------------------------------+---------+---------+
> >        | DIT                          | [51-48] |    y    |
> >        +------------------------------+---------+---------+
> > +     | AMU                          | [47-44] |    n    |
> > +     +------------------------------+---------+---------+
> 
> As mentioned above, please drop it.
> 
> 
> Suzuki


* Re: [PATCH v2 4/6] Documentation: arm64: document support for the AMU extension
  2020-01-30 16:45     ` Ionela Voinescu
@ 2020-01-30 18:26       ` Suzuki K Poulose
  2020-01-31  9:54         ` Ionela Voinescu
  0 siblings, 1 reply; 40+ messages in thread
From: Suzuki K Poulose @ 2020-01-30 18:26 UTC (permalink / raw)
  To: Ionela Voinescu
  Cc: catalin.marinas, will, mark.rutland, maz, sudeep.holla,
	dietmar.eggemann, peterz, mingo, ggherdovich, vincent.guittot,
	linux-arm-kernel, linux-doc, linux-kernel, Jonathan Corbet

Hi Ionela,

On Thu, Jan 30, 2020 at 04:45:42PM +0000, Ionela Voinescu wrote:
> Hi Suzuki,
> 
> On Thursday 30 Jan 2020 at 15:04:27 (+0000), Suzuki Kuruppassery Poulose wrote:
> > Hi Ionela,
> > 
> > On 18/12/2019 18:26, Ionela Voinescu wrote:
> > > The activity monitors extension is an optional extension introduced
> > > by the ARMv8.4 CPU architecture.
> > > 
> > > Add initial documentation for the AMUv1 extension:
> > >   - arm64/amu.txt: AMUv1 documentation
> > >   - arm64/booting.txt: system registers initialisation
> > >   - arm64/cpu-feature-registers.txt: visibility to userspace
> > 
> > We have stopped adding "invisible" fields to the list. So, you
> > can drop the changes to cpu-feature-registers.txt.
> > 
> > > 
> > > Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
> > > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > > Cc: Will Deacon <will@kernel.org>
> > > Cc: Jonathan Corbet <corbet@lwn.net>
> > > ---
> > >   Documentation/arm64/amu.rst                   | 107 ++++++++++++++++++
> > >   Documentation/arm64/booting.rst               |  14 +++
> > >   Documentation/arm64/cpu-feature-registers.rst |   2 +
> > >   Documentation/arm64/index.rst                 |   1 +
> > >   4 files changed, 124 insertions(+)
> > >   create mode 100644 Documentation/arm64/amu.rst
> > > 
> > > diff --git a/Documentation/arm64/amu.rst b/Documentation/arm64/amu.rst
> > > new file mode 100644
> > > index 000000000000..62a6635522e1
> > > --- /dev/null
> > > +++ b/Documentation/arm64/amu.rst
> > > @@ -0,0 +1,107 @@
> > > +-------------
> > > +
> > > +The kernel can safely run a mix of CPUs with and without support for the
> > > +activity monitors extension.
> > 
> > 
> >  Therefore, when CONFIG_ARM64_AMU_EXTN is
> > > +selected we unconditionally enable the capability to allow any late CPU
> > > +(secondary or hotplugged) to detect and use the feature.
> > > +
> > > +When the feature is detected on a CPU, a per-CPU variable (amu_feat) is
> > > +set, but this does not guarantee the correct functionality of the
> > > +counters, only the presence of the extension.
> > 
> > nit: I would rather omit the implementation details (esp variable names)
> > in the documentation. It may become a pain to keep this in sync with the
> > code changes. You could simply mention, "we keep track of the availability
> > of the feature" per CPU. If someone wants to figure out
> > how, they can always read the code.
> > 
> > > +
> > > +Firmware (code running at higher exception levels, e.g. arm-tf) support is
> > > +needed to:
> > > + - Enable access for lower exception levels (EL2 and EL1) to the AMU
> > > +   registers.
> > > + - Enable the counters. If not enabled these will read as 0.
> > > + - Save/restore the counters before/after the CPU is being put/brought up
> > > +   from the 'off' power state.
> > > +
> > > +When using kernels that have this configuration enabled but boot with
> > > +broken firmware the user may experience panics or lockups when accessing
> > > +the counter registers. Even if these symptoms are not observed, the
> > > +values returned by the register reads might not correctly reflect reality.
> > > +Most commonly, the counters will read as 0, indicating that they are not
> > > +enabled. If proper support is not provided in firmware it's best to disable
> > > +CONFIG_ARM64_AMU_EXTN.
> > 
> > For the sake of one kernel runs everywhere, do we need some other
> > mechanism to disable the AMU. e.g kernel parameter to disable amu
> > at runtime ?
> >
> 
> The reason I've not added this is twofold:
>  - Even if we add this, it should be in order to disable the use of the
>    counters for a certain purpose, in this case  frequency invariance.
>    On its own AMU provides the counters but it does not mandate their
>    use.
>  - I could add something to disable the use of the core and cycle
>    counters for frequency invariance at runtime, but I doubt that
>    anyone would use it. Logically it makes sense to use the counters
>    in order to have a more accurate view of the performance that the CPUs
>    are actually providing. Therefore, until anyone asks for this, I
>    thought it's better to keep it simple and not add extra switches,
>    until there is a use for them.
> 
> Does it make sense?

The comment is about someone who must run an AMU-enabled kernel
("one kernel") on a system with potentially broken firmware. In that
case the system may not be usable at all since, as you mention above,
the firmware could panic. How common is broken firmware? Right now
there is no way to check that the firmware is sane, and if someone
detects that the firmware is broken, there is no way to disable the
AMU when running a standard distro kernel. A kernel parameter could
prevent the AMU capability from being detected on a broken system and
thus make the system usable (without the AMU, of course). Now, if
broken firmware is extremely rare, we could simply ignore this case
and drop the suggestion.

Suzuki



> 
> P.S. I'll make all the other changes you've suggested in v3. 
> 
> Thank you,
> Ionela.
> 
> 
> 
> > > diff --git a/Documentation/arm64/booting.rst b/Documentation/arm64/booting.rst
> > > index 5d78a6f5b0ae..a3f1a47b6f1c 100644
> > > --- a/Documentation/arm64/booting.rst
> > > +++ b/Documentation/arm64/booting.rst
> > > @@ -248,6 +248,20 @@ Before jumping into the kernel, the following conditions must be met:
> > >       - HCR_EL2.APK (bit 40) must be initialised to 0b1
> > >       - HCR_EL2.API (bit 41) must be initialised to 0b1
> > > +  For CPUs with Activity Monitors Unit v1 (AMUv1) extension present:
> > > +  - If EL3 is present:
> > > +    CPTR_EL3.TAM (bit 30) must be initialised to 0b0
> > > +    CPTR_EL2.TAM (bit 30) must be initialised to 0b0
> > > +    AMCNTENSET0_EL0 must be initialised to 0b1111
> > > +    AMCNTENSET1_EL0 must be initialised to a platform specific value
> > > +    having 0b1 set for the corresponding bit for each of the auxiliary
> > > +    counters present.
> > > +  - If the kernel is entered at EL1:
> > > +    AMCNTENSET0_EL0 must be initialised to 0b1111
> > > +    AMCNTENSET1_EL0 must be initialised to a platform specific value
> > > +    having 0b1 set for the corresponding bit for each of the auxiliary
> > > +    counters present.
> > > +
> > >   The requirements described above for CPU mode, caches, MMUs, architected
> > >   timers, coherency and system registers apply to all CPUs.  All CPUs must
> > >   enter the kernel in the same exception level.
> > > diff --git a/Documentation/arm64/cpu-feature-registers.rst b/Documentation/arm64/cpu-feature-registers.rst
> > > index b6e44884e3ad..4770ae54032b 100644
> > > --- a/Documentation/arm64/cpu-feature-registers.rst
> > > +++ b/Documentation/arm64/cpu-feature-registers.rst
> > > @@ -150,6 +150,8 @@ infrastructure:
> > >        +------------------------------+---------+---------+
> > >        | DIT                          | [51-48] |    y    |
> > >        +------------------------------+---------+---------+
> > > +     | AMU                          | [47-44] |    n    |
> > > +     +------------------------------+---------+---------+
> > 
> > As mentioned above, please drop it.
> > 
> > 
> > Suzuki


* Re: [PATCH v2 4/6] Documentation: arm64: document support for the AMU extension
  2020-01-30 18:26       ` Suzuki K Poulose
@ 2020-01-31  9:54         ` Ionela Voinescu
  0 siblings, 0 replies; 40+ messages in thread
From: Ionela Voinescu @ 2020-01-31  9:54 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: catalin.marinas, will, mark.rutland, maz, sudeep.holla,
	dietmar.eggemann, peterz, mingo, ggherdovich, vincent.guittot,
	linux-arm-kernel, linux-doc, linux-kernel, Jonathan Corbet

On Thursday 30 Jan 2020 at 18:26:53 (+0000), Suzuki K Poulose wrote:
[..]
> > > > +Firmware (code running at higher exception levels, e.g. arm-tf) support is
> > > > +needed to:
> > > > + - Enable access for lower exception levels (EL2 and EL1) to the AMU
> > > > +   registers.
> > > > + - Enable the counters. If not enabled these will read as 0.
> > > > + - Save/restore the counters before/after the CPU is being put/brought up
> > > > +   from the 'off' power state.
> > > > +
> > > > +When using kernels that have this configuration enabled but boot with
> > > > +broken firmware the user may experience panics or lockups when accessing
> > > > +the counter registers. Even if these symptoms are not observed, the
> > > > +values returned by the register reads might not correctly reflect reality.
> > > > +Most commonly, the counters will read as 0, indicating that they are not
> > > > +enabled. If proper support is not provided in firmware it's best to disable
> > > > +CONFIG_ARM64_AMU_EXTN.
> > > 
> > > For the sake of one kernel runs everywhere, do we need some other
> > > mechanism to disable the AMU. e.g kernel parameter to disable amu
> > > at runtime ?
> > >
> > 
> > The reason I've not added this is twofold:
> >  - Even if we add this, it should be in order to disable the use of the
> >    counters for a certain purpose, in this case  frequency invariance.
> >    On its own AMU provides the counters but it does not mandate their
> >    use.
> >  - I could add something to disable the use of the core and cycle
> >    counters for frequency invariance at runtime, but I doubt that
> >    anyone would use it. Logically it makes sense to use the counters
> >    order to have a more accurate view of the performance that the CPUs
> >    are actually providing. Therefore, until anyone asks for this, I
> >    thought it's better to keep it simple and not add extra switches,
> >    until there is a use for them.
> > 
> > Does it make sense?
> 
> The comment is about addressing someone who must run an "AMU" enabled
> kernel ("one kernel") on a system with potentially "broken firmware",
> where there is no option to use the system as you mention above,
> the firmware could panic. How common is the "broken firmware" ?
> Right now there is no way to ensure "firmware" is sane and if
> someone detects that firmware is broken, there is no way to
> disable the AMU if they are running a standard distro kernel.
> A kernel parameter could prevent the AMU capability from
> being detected on a broken system and thus make it usable
> (without the AMU of course). Now, if the "broken firmware"
> is extremely rare, we could simply ignore this case and
> ignore the suggestion.
> 
> Suzuki
> 
>

Sorry Suzuki, I initially interpreted the question independently of
the context and only thought about cases where the counters are working
correctly but users might want to disable their use.

In this case, I don't see any harm in adding a command line parameter
to disable the use of the unit, even if it's only to cope with firmware
that does not support AMU at all, rather than with a broken
implementation.

I'm not really sure how common bad firmware would be. I suppose that
firmware so bad as to cause firmware panics and lockups would be quite
rare, but scenarios where firmware does not properly support AMU and
this results in kernel lockups could be more common, and the parameter
would handle both.
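
A minimal userspace sketch of the idea, for illustration only: the
parameter name "arm64.noamu" and both function names are hypothetical,
since the thread does not settle on any of them, and a real kernel
implementation would use the kernel's own parameter handling rather than
scanning the string by hand. The point is only that the opt-out is
checked before the capability is reported as usable, so the counter
registers are never touched on a system with bad firmware.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical parameter name, chosen only for this example. */
#define AMU_DISABLE_PARAM "arm64.noamu"

/*
 * Return true if the space-separated command line contains
 * AMU_DISABLE_PARAM as a whole token.
 */
static bool amu_disabled_on_cmdline(const char *cmdline)
{
	size_t len;
	const char *p;

	assert(cmdline != NULL);

	len = strlen(AMU_DISABLE_PARAM);
	for (p = strstr(cmdline, AMU_DISABLE_PARAM); p != NULL;
	     p = strstr(p + 1, AMU_DISABLE_PARAM)) {
		bool starts_token = (p == cmdline) || (p[-1] == ' ');
		bool ends_token = (p[len] == '\0') || (p[len] == ' ');

		if (starts_token && ends_token)
			return true;
	}
	return false;
}

/*
 * Model of the detection decision: the feature is treated as usable only
 * if the CPU advertises it and the user has not opted out.
 */
static bool amu_usable(bool cpu_has_amu, const char *cmdline)
{
	return cpu_has_amu && !amu_disabled_on_cmdline(cmdline);
}
```

With this shape, a standard distro kernel could keep the AMU support
built in, and a user hitting broken firmware would only need to add the
parameter to the boot command line.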

Thank you,
Ionela.


end of thread, other threads:[~2020-01-31  9:54 UTC | newest]

Thread overview: 40+ messages
2019-12-18 18:26 [PATCH v2 0/6] arm64: ARMv8.4 Activity Monitors support Ionela Voinescu
2019-12-18 18:26 ` [PATCH v2 1/6] arm64: add support for the AMU extension v1 Ionela Voinescu
2020-01-23 17:04   ` Valentin Schneider
2020-01-23 18:32     ` Ionela Voinescu
2020-01-24 12:00       ` Valentin Schneider
2020-01-28 11:00         ` Ionela Voinescu
2020-01-28 16:34   ` Suzuki Kuruppassery Poulose
2020-01-29 16:42     ` Ionela Voinescu
2019-12-18 18:26 ` [PATCH v2 2/6] arm64: trap to EL1 accesses to AMU counters from EL0 Ionela Voinescu
2020-01-23 17:04   ` Valentin Schneider
2020-01-23 17:34     ` Ionela Voinescu
2019-12-18 18:26 ` [PATCH v2 3/6] arm64/kvm: disable access to AMU registers from kvm guests Ionela Voinescu
2020-01-27 15:33   ` Valentin Schneider
2020-01-28 15:48     ` Ionela Voinescu
2020-01-28 17:26     ` Suzuki Kuruppassery Poulose
2020-01-28 17:37       ` Valentin Schneider
2020-01-28 17:52         ` Ionela Voinescu
2019-12-18 18:26 ` [PATCH v2 4/6] Documentation: arm64: document support for the AMU extension Ionela Voinescu
2020-01-27 16:47   ` Valentin Schneider
2020-01-28 16:53     ` Ionela Voinescu
2020-01-28 18:36       ` Valentin Schneider
2020-01-30 15:04   ` Suzuki Kuruppassery Poulose
2020-01-30 16:45     ` Ionela Voinescu
2020-01-30 18:26       ` Suzuki K Poulose
2020-01-31  9:54         ` Ionela Voinescu
2019-12-18 18:26 ` [PATCH v2 5/6] TEMP: sched: add interface for counter-based frequency invariance Ionela Voinescu
2020-01-29 19:37   ` Peter Zijlstra
2020-01-30 15:33     ` Ionela Voinescu
2019-12-18 18:26 ` [PATCH v2 6/6] arm64: use activity monitors for " Ionela Voinescu
2020-01-23 11:49   ` Lukasz Luba
2020-01-23 17:07     ` Ionela Voinescu
2020-01-24  1:19       ` Lukasz Luba
2020-01-24 13:12         ` Ionela Voinescu
2020-01-24 15:17           ` Lukasz Luba
2020-01-28 17:36             ` Ionela Voinescu
2020-01-29 17:13   ` Valentin Schneider
2020-01-29 17:52     ` Ionela Voinescu
2020-01-29 23:39     ` Valentin Schneider
2020-01-30 15:49       ` Ionela Voinescu
2020-01-30 16:11         ` Valentin Schneider
