linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/9] arm64: Work around for mismatched cache line size
@ 2016-08-26  9:23 Suzuki K Poulose
  2016-08-26  9:23 ` [PATCH v2 1/9] arm64: Set the safe value for L1 icache policy Suzuki K Poulose
                   ` (8 more replies)
  0 siblings, 9 replies; 17+ messages in thread
From: Suzuki K Poulose @ 2016-08-26  9:23 UTC (permalink / raw)
  To: will.deacon, catalin.marinas
  Cc: linux-arm-kernel, linux-kernel, marc.zyngier, mark.rutland,
	ard.biesheuvel, Suzuki K Poulose

This series adds a work around for systems with mismatched {I,D}-cache
line sizes. When a thread of execution migrates to a different CPU, the
cache line size it previously read (and cached) could be larger than
that of the new CPU, so cache maintenance based on the stale value can
miss lines and lead to data corruption. We work around this by

 - Dynamically patching the kernel to use the smallest line size on the
   system (from the CPU feature infrastructure)
 - Trapping the userspace access to CTR_EL0 (by clearing SCTLR_EL1.UCT) and
   emulating it with the system wide safe value of CTR.

The series also adds support for alternative code patching of adrp
instructions by adjusting the PC-relative address offset to reflect
the new PC.
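
As an illustration of the userspace side of the problem (a hypothetical
sketch, not code from this series): a JIT-style cache clean routine
typically derives its loop stride from CTR_EL0, so a value read on a CPU
with 128-byte lines misses every other line once the thread migrates to a
CPU with 64-byte lines. With the trap added by this series, the mrs below
returns the system wide safe value instead.

  #include <stddef.h>

  /* Hypothetical userspace cache-clean loop, illustration only */
  static void clean_dcache_range(void *start, size_t len)
  {
  	unsigned long ctr, line;
  	unsigned long addr = (unsigned long)start;
  	unsigned long end = addr + len;

  	asm volatile("mrs %0, ctr_el0" : "=r" (ctr));
  	line = 4UL << ((ctr >> 16) & 0xf);	/* DminLine: log2(words per line) */

  	for (addr &= ~(line - 1); addr < end; addr += line)
  		asm volatile("dc cvau, %0" : : "r" (addr));
  	asm volatile("dsb ish" : : : "memory");
  }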

The series has been tested on Juno with a hack to force-enable the
capability.

Applies on v4.8-rc3. The tree is available at :
  git://linux-arm.org/linux-skp.git ctr-v2

Changes since V1:

 - Replace adr_adrp insn helper with separate helpers for adr and adrp.
 - Add/use align_down() macro for adjusting the page address for adrp offsets.
 - Add comments for existing ISS field definitions.
 - Added a patch to disallow silent patching of unhandled PC-relative
   instructions in alternative code patching.

Suzuki K Poulose (9):
  arm64: Set the safe value for L1 icache policy
  arm64: Use consistent naming for errata handling
  arm64: Rearrange CPU errata workaround checks
  arm64: alternative: Disallow patching instructions using literals
  arm64: insn: Add helpers for adrp offsets
  arm64: alternative: Add support for patching adrp instructions
  arm64: Introduce raw_{d,i}cache_line_size
  arm64: Refactor sysinstr exception handling
  arm64: Work around systems with mismatched cache line sizes

 arch/arm64/include/asm/assembler.h  | 45 +++++++++++++++++--
 arch/arm64/include/asm/cpufeature.h | 14 +++---
 arch/arm64/include/asm/esr.h        | 84 +++++++++++++++++++++++++++++++----
 arch/arm64/include/asm/insn.h       | 11 ++++-
 arch/arm64/include/asm/sysreg.h     |  1 +
 arch/arm64/kernel/alternative.c     | 21 +++++++++
 arch/arm64/kernel/asm-offsets.c     |  2 +
 arch/arm64/kernel/cpu_errata.c      | 26 ++++++++++-
 arch/arm64/kernel/cpufeature.c      | 44 ++++++++++++++-----
 arch/arm64/kernel/cpuinfo.c         |  2 -
 arch/arm64/kernel/hibernate-asm.S   |  2 +-
 arch/arm64/kernel/insn.c            | 13 ++++++
 arch/arm64/kernel/relocate_kernel.S |  2 +-
 arch/arm64/kernel/smp.c             |  8 +++-
 arch/arm64/kernel/traps.c           | 87 ++++++++++++++++++++++++++-----------
 15 files changed, 297 insertions(+), 65 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2 1/9] arm64: Set the safe value for L1 icache policy
  2016-08-26  9:23 [PATCH v2 0/9] arm64: Work around for mismatched cache line size Suzuki K Poulose
@ 2016-08-26  9:23 ` Suzuki K Poulose
  2016-08-26  9:23 ` [PATCH v2 2/9] arm64: Use consistent naming for errata handling Suzuki K Poulose
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Suzuki K Poulose @ 2016-08-26  9:23 UTC (permalink / raw)
  To: will.deacon, catalin.marinas
  Cc: linux-arm-kernel, linux-kernel, marc.zyngier, mark.rutland,
	ard.biesheuvel, Suzuki K Poulose

Right now we use 0 as the safe value for CTR_EL0:L1Ip, which is not a
defined I-cache policy. The safe value for L1Ip should be the weakest
of the policies, which happens to be AIVIVT. While at it, fix the
comment about safe_val.
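
For context, when CPUs disagree on an FTR_EXACT field, the system wide
value of that field falls back to safe_val; roughly (a simplified sketch
of the existing arm64_ftr_safe_value() behaviour, which this patch does
not modify):

  /*
   * Simplified sketch, not from this patch: for FTR_EXACT fields a
   * mismatch between CPUs resolves to ftrp->safe_val. With this change,
   * a mismatch on L1Ip therefore resolves to ICACHE_POLICY_AIVIVT
   * rather than 0.
   */
  static s64 ftr_exact_safe_value(const struct arm64_ftr_bits *ftrp, s64 new, s64 cur)
  {
  	return (new == cur) ? new : ftrp->safe_val;
  }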

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/include/asm/cpufeature.h | 2 +-
 arch/arm64/kernel/cpufeature.c      | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 7099f26..f6f5e49 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -63,7 +63,7 @@ struct arm64_ftr_bits {
 	enum ftr_type	type;
 	u8		shift;
 	u8		width;
-	s64		safe_val; /* safe value for discrete features */
+	s64		safe_val; /* safe value for FTR_EXACT features */
 };
 
 /*
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 62272ea..43f73e0 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -147,9 +147,10 @@ static struct arm64_ftr_bits ftr_ctr[] = {
 	ARM64_FTR_BITS(FTR_STRICT, FTR_LOWER_SAFE, 16, 4, 1),	/* DminLine */
 	/*
 	 * Linux can handle differing I-cache policies. Userspace JITs will
-	 * make use of *minLine
+	 * make use of *minLine.
+	 * If we have differing I-cache policies, report it as the weakest - AIVIVT.
 	 */
-	ARM64_FTR_BITS(FTR_NONSTRICT, FTR_EXACT, 14, 2, 0),	/* L1Ip */
+	ARM64_FTR_BITS(FTR_NONSTRICT, FTR_EXACT, 14, 2, ICACHE_POLICY_AIVIVT),	/* L1Ip */
 	ARM64_FTR_BITS(FTR_STRICT, FTR_EXACT, 4, 10, 0),	/* RAZ */
 	ARM64_FTR_BITS(FTR_STRICT, FTR_LOWER_SAFE, 0, 4, 0),	/* IminLine */
 	ARM64_FTR_END,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 2/9] arm64: Use consistent naming for errata handling
  2016-08-26  9:23 [PATCH v2 0/9] arm64: Work around for mismatched cache line size Suzuki K Poulose
  2016-08-26  9:23 ` [PATCH v2 1/9] arm64: Set the safe value for L1 icache policy Suzuki K Poulose
@ 2016-08-26  9:23 ` Suzuki K Poulose
  2016-08-26  9:23 ` [PATCH v2 3/9] arm64: Rearrange CPU errata workaround checks Suzuki K Poulose
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Suzuki K Poulose @ 2016-08-26  9:23 UTC (permalink / raw)
  To: will.deacon, catalin.marinas
  Cc: linux-arm-kernel, linux-kernel, marc.zyngier, mark.rutland,
	ard.biesheuvel, Suzuki K Poulose

This is a cosmetic change that renames the functions dealing with
the errata work arounds so that their names reflect what they do.

1) check_local_cpu_errata() => update_cpu_errata_workarounds()
check_local_cpu_errata() actually updates the system's errata work
arounds, so rename it to reflect that.

2) verify_local_cpu_errata() => verify_local_cpu_errata_workarounds()
Use errata_workarounds instead of _errata.

Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/include/asm/cpufeature.h | 4 ++--
 arch/arm64/kernel/cpu_errata.c      | 4 ++--
 arch/arm64/kernel/cpufeature.c      | 2 +-
 arch/arm64/kernel/cpuinfo.c         | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index f6f5e49..aadd946 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -193,10 +193,10 @@ void __init setup_cpu_features(void);
 void update_cpu_capabilities(const struct arm64_cpu_capabilities *caps,
 			    const char *info);
 void enable_cpu_capabilities(const struct arm64_cpu_capabilities *caps);
-void check_local_cpu_errata(void);
+void update_cpu_errata_workarounds(void);
 void __init enable_errata_workarounds(void);
 
-void verify_local_cpu_errata(void);
+void verify_local_cpu_errata_workarounds(void);
 void verify_local_cpu_capabilities(void);
 
 u64 read_system_reg(u32 id);
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 82b0fc2..5836b3d 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -116,7 +116,7 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
  * and the related information is freed soon after. If the new CPU requires
  * an errata not detected at boot, fail this CPU.
  */
-void verify_local_cpu_errata(void)
+void verify_local_cpu_errata_workarounds(void)
 {
 	const struct arm64_cpu_capabilities *caps = arm64_errata;
 
@@ -131,7 +131,7 @@ void verify_local_cpu_errata(void)
 		}
 }
 
-void check_local_cpu_errata(void)
+void update_cpu_errata_workarounds(void)
 {
 	update_cpu_capabilities(arm64_errata, "enabling workaround for");
 }
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 43f73e0..a591c35 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1017,7 +1017,7 @@ void verify_local_cpu_capabilities(void)
 	if (!sys_caps_initialised)
 		return;
 
-	verify_local_cpu_errata();
+	verify_local_cpu_errata_workarounds();
 	verify_local_cpu_features(arm64_features);
 	verify_local_elf_hwcaps(arm64_elf_hwcaps);
 	if (system_supports_32bit_el0())
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index ed1b84f..4fa7b73 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -364,7 +364,7 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
 
 	cpuinfo_detect_icache_policy(info);
 
-	check_local_cpu_errata();
+	update_cpu_errata_workarounds();
 }
 
 void cpuinfo_store_cpu(void)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 3/9] arm64: Rearrange CPU errata workaround checks
  2016-08-26  9:23 [PATCH v2 0/9] arm64: Work around for mismatched cache line size Suzuki K Poulose
  2016-08-26  9:23 ` [PATCH v2 1/9] arm64: Set the safe value for L1 icache policy Suzuki K Poulose
  2016-08-26  9:23 ` [PATCH v2 2/9] arm64: Use consistent naming for errata handling Suzuki K Poulose
@ 2016-08-26  9:23 ` Suzuki K Poulose
  2016-08-26  9:23 ` [PATCH v2 4/9] arm64: alternative: Disallow patching instructions using literals Suzuki K Poulose
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Suzuki K Poulose @ 2016-08-26  9:23 UTC (permalink / raw)
  To: will.deacon, catalin.marinas
  Cc: linux-arm-kernel, linux-kernel, marc.zyngier, mark.rutland,
	ard.biesheuvel, Suzuki K Poulose, Andre Przywara

Right now we run through the work around checks on a CPU
from __cpuinfo_store_cpu. There are some problems with that:

1) We initialise the system wide CPU feature registers only after the
boot CPU updates its cpuinfo. So, if a work around depends on a CPU ID
feature varying across CPUs (e.g., the cache line size mismatch check),
we have no way of performing it cleanly for the boot CPU.

2) It is invoked from __cpuinfo_store_cpu() in cpuinfo.c, which is not
an obvious place for it.

This patch rearranges the CPU specific capability (aka work around) checks.

1) At the moment we use verify_local_cpu_capabilities() to check if a new
CPU has all the system advertised features. Use this for the secondary CPUs
to perform the work around check. For that we rename
  verify_local_cpu_capabilities() => check_local_cpu_capabilities()
which:

   If the system wide capabilities haven't been initialised (i.e., the CPU
   is brought up at boot), update the system wide detected work arounds.

   Otherwise (i.e., a CPU hotplugged in later), verify that this CPU conforms
   to the system wide capabilities.

2) Boot CPU updates the work arounds from smp_prepare_boot_cpu() after we have
initialised the system wide CPU feature values.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/include/asm/cpufeature.h |  4 ++--
 arch/arm64/kernel/cpufeature.c      | 30 ++++++++++++++++++++----------
 arch/arm64/kernel/cpuinfo.c         |  2 --
 arch/arm64/kernel/smp.c             |  8 +++++++-
 4 files changed, 29 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index aadd946..692b8d3 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -193,11 +193,11 @@ void __init setup_cpu_features(void);
 void update_cpu_capabilities(const struct arm64_cpu_capabilities *caps,
 			    const char *info);
 void enable_cpu_capabilities(const struct arm64_cpu_capabilities *caps);
+void check_local_cpu_capabilities(void);
+
 void update_cpu_errata_workarounds(void);
 void __init enable_errata_workarounds(void);
-
 void verify_local_cpu_errata_workarounds(void);
-void verify_local_cpu_capabilities(void);
 
 u64 read_system_reg(u32 id);
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index a591c35..fcf87ca 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1005,23 +1005,33 @@ verify_local_cpu_features(const struct arm64_cpu_capabilities *caps)
  * cannot do anything to fix it up and could cause unexpected failures. So
  * we park the CPU.
  */
-void verify_local_cpu_capabilities(void)
+static void verify_local_cpu_capabilities(void)
 {
+	verify_local_cpu_errata_workarounds();
+	verify_local_cpu_features(arm64_features);
+	verify_local_elf_hwcaps(arm64_elf_hwcaps);
+	if (system_supports_32bit_el0())
+		verify_local_elf_hwcaps(compat_elf_hwcaps);
+}
 
+void check_local_cpu_capabilities(void)
+{
+	/*
+	 * All secondary CPUs should conform to the early CPU features
+	 * in use by the kernel based on boot CPU.
+	 */
 	check_early_cpu_features();
 
 	/*
-	 * If we haven't computed the system capabilities, there is nothing
-	 * to verify.
+	 * If we haven't finalised the system capabilities, this CPU gets
+	 * a chance to update the errata work arounds.
+	 * Otherwise, this CPU should verify that it has all the system
+	 * advertised capabilities.
 	 */
 	if (!sys_caps_initialised)
-		return;
-
-	verify_local_cpu_errata_workarounds();
-	verify_local_cpu_features(arm64_features);
-	verify_local_elf_hwcaps(arm64_elf_hwcaps);
-	if (system_supports_32bit_el0())
-		verify_local_elf_hwcaps(compat_elf_hwcaps);
+		update_cpu_errata_workarounds();
+	else
+		verify_local_cpu_capabilities();
 }
 
 static void __init setup_feature_capabilities(void)
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index 4fa7b73..b3d5b3e 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -363,8 +363,6 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
 	}
 
 	cpuinfo_detect_icache_policy(info);
-
-	update_cpu_errata_workarounds();
 }
 
 void cpuinfo_store_cpu(void)
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index d93d433..99d8cc3 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -239,7 +239,7 @@ asmlinkage void secondary_start_kernel(void)
 	 * this CPU ticks all of those. If it doesn't, the CPU will
 	 * fail to come online.
 	 */
-	verify_local_cpu_capabilities();
+	check_local_cpu_capabilities();
 
 	if (cpu_ops[cpu]->cpu_postboot)
 		cpu_ops[cpu]->cpu_postboot();
@@ -439,6 +439,12 @@ void __init smp_prepare_boot_cpu(void)
 	set_my_cpu_offset(per_cpu_offset(smp_processor_id()));
 	cpuinfo_store_boot_cpu();
 	save_boot_cpu_run_el();
+	/*
+	 * Run the errata work around checks on the boot CPU, once we have
+	 * initialised the cpu feature infrastructure from
+	 * cpuinfo_store_boot_cpu() above.
+	 */
+	update_cpu_errata_workarounds();
 }
 
 static u64 __init of_get_cpu_mpidr(struct device_node *dn)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 4/9] arm64: alternative: Disallow patching instructions using literals
  2016-08-26  9:23 [PATCH v2 0/9] arm64: Work around for mismatched cache line size Suzuki K Poulose
                   ` (2 preceding siblings ...)
  2016-08-26  9:23 ` [PATCH v2 3/9] arm64: Rearrange CPU errata workaround checks Suzuki K Poulose
@ 2016-08-26  9:23 ` Suzuki K Poulose
  2016-08-26  9:23 ` [PATCH v2 5/9] arm64: insn: Add helpers for adrp offsets Suzuki K Poulose
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Suzuki K Poulose @ 2016-08-26  9:23 UTC (permalink / raw)
  To: will.deacon, catalin.marinas
  Cc: linux-arm-kernel, linux-kernel, marc.zyngier, mark.rutland,
	ard.biesheuvel, Suzuki K Poulose, Andre Przywara

The alternative code patching doesn't check if the replaced instruction
uses a PC-relative literal. This could cause silent corruption in the
instruction stream as the instruction will be executed from a different
address than what it was compiled for. Catch all such cases.

Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Suggested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/kernel/alternative.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index d2ee1b2..6b269a9 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -80,6 +80,12 @@ static u32 get_alt_insn(struct alt_instr *alt, u32 *insnptr, u32 *altinsnptr)
 			offset = target - (unsigned long)insnptr;
 			insn = aarch64_set_branch_offset(insn, offset);
 		}
+	} else if (aarch64_insn_uses_literal(insn)) {
+		/*
+		 * Disallow patching unhandled instructions using PC relative
+		 * literal addresses
+		 */
+		BUG();
 	}
 
 	return insn;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 5/9] arm64: insn: Add helpers for adrp offsets
  2016-08-26  9:23 [PATCH v2 0/9] arm64: Work around for mismatched cache line size Suzuki K Poulose
                   ` (3 preceding siblings ...)
  2016-08-26  9:23 ` [PATCH v2 4/9] arm64: alternative: Disallow patching instructions using literals Suzuki K Poulose
@ 2016-08-26  9:23 ` Suzuki K Poulose
  2016-08-26  9:23 ` [PATCH v2 6/9] arm64: alternative: Add support for patching adrp instructions Suzuki K Poulose
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Suzuki K Poulose @ 2016-08-26  9:23 UTC (permalink / raw)
  To: will.deacon, catalin.marinas
  Cc: linux-arm-kernel, linux-kernel, marc.zyngier, mark.rutland,
	ard.biesheuvel, Suzuki K Poulose

Add helpers for decoding/encoding the PC-relative address offset of
adrp instructions. These will be used for handling dynamic patching of
'adrp' instructions in alternative code patching.
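
As a sketch of the intended use (the wrapper below is illustrative and
not part of the patch; the next patch does essentially this in
get_alt_insn()):

  /*
   * Re-encode an adrp so that it still reaches the same 4K page when it
   * executes from 'new_pc' instead of 'orig_pc'. The two helpers used
   * here are the ones added by this patch.
   */
  static u32 adrp_rebase(u32 insn, unsigned long orig_pc, unsigned long new_pc)
  {
  	unsigned long page = (orig_pc & ~(SZ_4K - 1UL)) +
  			     aarch64_insn_adrp_get_offset(insn);

  	return aarch64_insn_adrp_set_offset(insn, page - (new_pc & ~(SZ_4K - 1UL)));
  }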

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
Changes since V1:
 - Replace adr_adrp with separate handlers for adr and adrp (Marc Zyngier)
---
 arch/arm64/include/asm/insn.h | 11 ++++++++++-
 arch/arm64/kernel/insn.c      | 13 +++++++++++++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index 1dbaa90..bc85366 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -246,7 +246,8 @@ static __always_inline bool aarch64_insn_is_##abbr(u32 code) \
 static __always_inline u32 aarch64_insn_get_##abbr##_value(void) \
 { return (val); }
 
-__AARCH64_INSN_FUNCS(adr_adrp,	0x1F000000, 0x10000000)
+__AARCH64_INSN_FUNCS(adr,	0x9F000000, 0x10000000)
+__AARCH64_INSN_FUNCS(adrp,	0x9F000000, 0x90000000)
 __AARCH64_INSN_FUNCS(prfm_lit,	0xFF000000, 0xD8000000)
 __AARCH64_INSN_FUNCS(str_reg,	0x3FE0EC00, 0x38206800)
 __AARCH64_INSN_FUNCS(ldr_reg,	0x3FE0EC00, 0x38606800)
@@ -318,6 +319,11 @@ __AARCH64_INSN_FUNCS(msr_reg,	0xFFF00000, 0xD5100000)
 bool aarch64_insn_is_nop(u32 insn);
 bool aarch64_insn_is_branch_imm(u32 insn);
 
+static inline bool aarch64_insn_is_adr_adrp(u32 insn)
+{
+	return aarch64_insn_is_adr(insn) || aarch64_insn_is_adrp(insn);
+}
+
 int aarch64_insn_read(void *addr, u32 *insnp);
 int aarch64_insn_write(void *addr, u32 insn);
 enum aarch64_insn_encoding_class aarch64_get_insn_class(u32 insn);
@@ -398,6 +404,9 @@ int aarch64_insn_patch_text_nosync(void *addr, u32 insn);
 int aarch64_insn_patch_text_sync(void *addrs[], u32 insns[], int cnt);
 int aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt);
 
+s32 aarch64_insn_adrp_get_offset(u32 insn);
+u32 aarch64_insn_adrp_set_offset(u32 insn, s32 offset);
+
 bool aarch32_insn_is_wide(u32 insn);
 
 #define A32_RN_OFFSET	16
diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
index 63f9432..f022af4 100644
--- a/arch/arm64/kernel/insn.c
+++ b/arch/arm64/kernel/insn.c
@@ -1202,6 +1202,19 @@ u32 aarch64_set_branch_offset(u32 insn, s32 offset)
 	BUG();
 }
 
+s32 aarch64_insn_adrp_get_offset(u32 insn)
+{
+	BUG_ON(!aarch64_insn_is_adrp(insn));
+	return aarch64_insn_decode_immediate(AARCH64_INSN_IMM_ADR, insn) << 12;
+}
+
+u32 aarch64_insn_adrp_set_offset(u32 insn, s32 offset)
+{
+	BUG_ON(!aarch64_insn_is_adrp(insn));
+	return aarch64_insn_encode_immediate(AARCH64_INSN_IMM_ADR, insn,
+						offset >> 12);
+}
+
 /*
  * Extract the Op/CR data from a msr/mrs instruction.
  */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 6/9] arm64: alternative: Add support for patching adrp instructions
  2016-08-26  9:23 [PATCH v2 0/9] arm64: Work around for mismatched cache line size Suzuki K Poulose
                   ` (4 preceding siblings ...)
  2016-08-26  9:23 ` [PATCH v2 5/9] arm64: insn: Add helpers for adrp offsets Suzuki K Poulose
@ 2016-08-26  9:23 ` Suzuki K Poulose
  2016-08-26  9:23 ` [PATCH v2 7/9] arm64: Introduce raw_{d,i}cache_line_size Suzuki K Poulose
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Suzuki K Poulose @ 2016-08-26  9:23 UTC (permalink / raw)
  To: will.deacon, catalin.marinas
  Cc: linux-arm-kernel, linux-kernel, marc.zyngier, mark.rutland,
	ard.biesheuvel, Suzuki K Poulose, Andre Przywara

adrp encodes a PC-relative address offset to the 4K page of a symbol.
If it appears in alternative code that gets patched in, we should
adjust the offset to reflect the address it will run from. This patch
adds support for fixing up the offset of adrp instructions.
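
A worked example of the fix-up, using made-up addresses (standalone
sketch; the kernel code in this patch does the same via the new
align_down() macro):

  #include <assert.h>
  #include <stdint.h>

  #define ALIGN_DOWN_4K(x)	((uint64_t)(x) & ~0xfffULL)

  int main(void)
  {
  	/* illustrative addresses, not from a real kernel image */
  	uint64_t altinsnptr  = 0xffff000008e01028ULL;	/* adrp as linked, in the alt section */
  	uint64_t insnptr     = 0xffff000008b23014ULL;	/* location it gets patched in to */
  	int64_t  orig_offset = 0x5000;			/* from aarch64_insn_adrp_get_offset() */

  	uint64_t target     = ALIGN_DOWN_4K(altinsnptr) + orig_offset;
  	int64_t  new_offset = target - ALIGN_DOWN_4K(insnptr);

  	assert(target == 0xffff000008e06000ULL);
  	assert(new_offset == 0x2e3000);	/* re-encoded via aarch64_insn_adrp_set_offset() */
  	return 0;
  }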

Cc: Will Deacon <will.deacon@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

---
Changes since V1:
 - Add an align_down() macro. Couldn't find a better place to add it;
   didn't want to put it in the uapi/ headers where the kernel's generic
   ALIGN helpers are defined, so it lives here for the time being.
---
 arch/arm64/kernel/alternative.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index 6b269a9..d681498 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -59,6 +59,8 @@ static bool branch_insn_requires_update(struct alt_instr *alt, unsigned long pc)
 	BUG();
 }
 
+#define align_down(x, a)	((unsigned long)(x) & ~(((unsigned long)(a)) - 1))
+
 static u32 get_alt_insn(struct alt_instr *alt, u32 *insnptr, u32 *altinsnptr)
 {
 	u32 insn;
@@ -80,6 +82,19 @@ static u32 get_alt_insn(struct alt_instr *alt, u32 *insnptr, u32 *altinsnptr)
 			offset = target - (unsigned long)insnptr;
 			insn = aarch64_set_branch_offset(insn, offset);
 		}
+	} else if (aarch64_insn_is_adrp(insn)) {
+		s32 orig_offset, new_offset;
+		unsigned long target;
+
+		/*
+		 * If we're replacing an adrp instruction, which uses PC-relative
+		 * immediate addressing, adjust the offset to reflect the new
+		 * PC. adrp operates on 4K aligned addresses.
+		 */
+		orig_offset  = aarch64_insn_adrp_get_offset(insn);
+		target = align_down(altinsnptr, SZ_4K) + orig_offset;
+		new_offset = target - align_down(insnptr, SZ_4K);
+		insn = aarch64_insn_adrp_set_offset(insn, new_offset);
 	} else if (aarch64_insn_uses_literal(insn)) {
 		/*
 		 * Disallow patching unhandled instructions using PC relative
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 7/9] arm64: Introduce raw_{d,i}cache_line_size
  2016-08-26  9:23 [PATCH v2 0/9] arm64: Work around for mismatched cache line size Suzuki K Poulose
                   ` (5 preceding siblings ...)
  2016-08-26  9:23 ` [PATCH v2 6/9] arm64: alternative: Add support for patching adrp instructions Suzuki K Poulose
@ 2016-08-26  9:23 ` Suzuki K Poulose
  2016-08-26  9:23 ` [PATCH v2 8/9] arm64: Refactor sysinstr exception handling Suzuki K Poulose
  2016-08-26  9:23 ` [PATCH v2 9/9] arm64: Work around systems with mismatched cache line sizes Suzuki K Poulose
  8 siblings, 0 replies; 17+ messages in thread
From: Suzuki K Poulose @ 2016-08-26  9:23 UTC (permalink / raw)
  To: will.deacon, catalin.marinas
  Cc: linux-arm-kernel, linux-kernel, marc.zyngier, mark.rutland,
	ard.biesheuvel, Suzuki K Poulose

On systems with mismatched i/d cache min line sizes, we need to use
the smallest size possible across all CPUs. This will be done by fetching
the system wide safe value from the CPU feature infrastructure.
However, some special users (e.g. kexec, hibernate) need the line size
of the current CPU (rather than the system wide value), either because
the system wide value may not be accessible or because the caller is
guaranteed to execute without being migrated.
Provide another helper which fetches the cache line size of the current CPU.

Acked-by: James Morse <james.morse@arm.com>
Reviewed-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/include/asm/assembler.h  | 24 ++++++++++++++++++++----
 arch/arm64/kernel/hibernate-asm.S   |  2 +-
 arch/arm64/kernel/relocate_kernel.S |  2 +-
 3 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index d5025c6..a4bb3f5 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -218,9 +218,10 @@ lr	.req	x30		// link register
 	.endm
 
 /*
- * dcache_line_size - get the minimum D-cache line size from the CTR register.
+ * raw_dcache_line_size - get the minimum D-cache line size on this CPU
+ * from the CTR register.
  */
-	.macro	dcache_line_size, reg, tmp
+	.macro	raw_dcache_line_size, reg, tmp
 	mrs	\tmp, ctr_el0			// read CTR
 	ubfm	\tmp, \tmp, #16, #19		// cache line size encoding
 	mov	\reg, #4			// bytes per word
@@ -228,9 +229,17 @@ lr	.req	x30		// link register
 	.endm
 
 /*
- * icache_line_size - get the minimum I-cache line size from the CTR register.
+ * dcache_line_size - get the safe D-cache line size across all CPUs
  */
-	.macro	icache_line_size, reg, tmp
+	.macro	dcache_line_size, reg, tmp
+	raw_dcache_line_size	\reg, \tmp
+	.endm
+
+/*
+ * raw_icache_line_size - get the minimum I-cache line size on this CPU
+ * from the CTR register.
+ */
+	.macro	raw_icache_line_size, reg, tmp
 	mrs	\tmp, ctr_el0			// read CTR
 	and	\tmp, \tmp, #0xf		// cache line size encoding
 	mov	\reg, #4			// bytes per word
@@ -238,6 +247,13 @@ lr	.req	x30		// link register
 	.endm
 
 /*
+ * icache_line_size - get the safe I-cache line size across all CPUs
+ */
+	.macro	icache_line_size, reg, tmp
+	raw_icache_line_size	\reg, \tmp
+	.endm
+
+/*
  * tcr_set_idmap_t0sz - update TCR.T0SZ so that we can load the ID map
  */
 	.macro	tcr_set_idmap_t0sz, valreg, tmpreg
diff --git a/arch/arm64/kernel/hibernate-asm.S b/arch/arm64/kernel/hibernate-asm.S
index 46f29b6..4ebc6a1 100644
--- a/arch/arm64/kernel/hibernate-asm.S
+++ b/arch/arm64/kernel/hibernate-asm.S
@@ -96,7 +96,7 @@ ENTRY(swsusp_arch_suspend_exit)
 
 	add	x1, x10, #PAGE_SIZE
 	/* Clean the copied page to PoU - based on flush_icache_range() */
-	dcache_line_size x2, x3
+	raw_dcache_line_size x2, x3
 	sub	x3, x2, #1
 	bic	x4, x10, x3
 2:	dc	cvau, x4	/* clean D line / unified line */
diff --git a/arch/arm64/kernel/relocate_kernel.S b/arch/arm64/kernel/relocate_kernel.S
index 51b73cd..ce704a4 100644
--- a/arch/arm64/kernel/relocate_kernel.S
+++ b/arch/arm64/kernel/relocate_kernel.S
@@ -34,7 +34,7 @@ ENTRY(arm64_relocate_new_kernel)
 	/* Setup the list loop variables. */
 	mov	x17, x1				/* x17 = kimage_start */
 	mov	x16, x0				/* x16 = kimage_head */
-	dcache_line_size x15, x0		/* x15 = dcache line size */
+	raw_dcache_line_size x15, x0		/* x15 = dcache line size */
 	mov	x14, xzr			/* x14 = entry ptr */
 	mov	x13, xzr			/* x13 = copy dest */
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 8/9] arm64: Refactor sysinstr exception handling
  2016-08-26  9:23 [PATCH v2 0/9] arm64: Work around for mismatched cache line size Suzuki K Poulose
                   ` (6 preceding siblings ...)
  2016-08-26  9:23 ` [PATCH v2 7/9] arm64: Introduce raw_{d,i}cache_line_size Suzuki K Poulose
@ 2016-08-26  9:23 ` Suzuki K Poulose
  2016-08-26  9:23 ` [PATCH v2 9/9] arm64: Work around systems with mismatched cache line sizes Suzuki K Poulose
  8 siblings, 0 replies; 17+ messages in thread
From: Suzuki K Poulose @ 2016-08-26  9:23 UTC (permalink / raw)
  To: will.deacon, catalin.marinas
  Cc: linux-arm-kernel, linux-kernel, marc.zyngier, mark.rutland,
	ard.biesheuvel, Suzuki K Poulose, Andre Przywara

Right now we trap some of the user space data cache operations
based on a few errata (ARM 819472, 826319, 827319 and 824069).
We also need to trap userspace access to CTR_EL0 if we detect
mismatched cache line sizes. Since both traps share the same
exception class (EC), refactor the handler a little to make it
more reader friendly.
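
With the hook table in place, wiring up a further trap is just a handler
plus one more table entry. As a rough sketch (the next patch adds the
real entry for CTR_EL0 reads; the encoding below, op0=3 op1=3 CRn=0
CRm=0 op2=1, is CTR_EL0, and the handler is purely illustrative):

  /* Sketch only: emulate a trapped sysreg read and step over the mrs */
  static void example_sysreg_read_handler(unsigned int esr, struct pt_regs *regs)
  {
  	int rt = (esr & ESR_ELx_SYS64_ISS_RT_MASK) >> ESR_ELx_SYS64_ISS_RT_SHIFT;

  	regs->regs[rt] = 0;	/* whatever value should be presented */
  	regs->pc += 4;
  }

  /* ... and one more element in sys64_hooks[], before the terminator: */
  	{
  		.esr_mask = ESR_ELx_SYS64_ISS_SYS_MASK | ESR_ELx_SYS64_ISS_DIR_MASK,
  		.esr_val  = ESR_ELx_SYS64_ISS_SYS_VAL(3, 3, 1, 0, 0) |
  			    ESR_ELx_SYS64_ISS_DIR_READ,
  		.handler  = example_sysreg_read_handler,
  	},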

Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

---
Changes since V1:
 - Add comments for ISS field definitions for other exceptions.
---
 arch/arm64/include/asm/esr.h | 76 ++++++++++++++++++++++++++++++++++++++------
 arch/arm64/kernel/traps.c    | 73 +++++++++++++++++++++++++++---------------
 2 files changed, 114 insertions(+), 35 deletions(-)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index f772e15..9875b32 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -78,6 +78,23 @@
 
 #define ESR_ELx_IL		(UL(1) << 25)
 #define ESR_ELx_ISS_MASK	(ESR_ELx_IL - 1)
+
+/* ISS field definitions shared by different classes */
+#define ESR_ELx_WNR		(UL(1) << 6)
+
+/* Shared ISS field definitions for Data/Instruction aborts */
+#define ESR_ELx_EA		(UL(1) << 9)
+#define ESR_ELx_S1PTW		(UL(1) << 7)
+
+/* Shared ISS fault status code(IFSC/DFSC) for Data/Instruction aborts */
+#define ESR_ELx_FSC		(0x3F)
+#define ESR_ELx_FSC_TYPE	(0x3C)
+#define ESR_ELx_FSC_EXTABT	(0x10)
+#define ESR_ELx_FSC_ACCESS	(0x08)
+#define ESR_ELx_FSC_FAULT	(0x04)
+#define ESR_ELx_FSC_PERM	(0x0C)
+
+/* ISS field definitions for Data Aborts */
 #define ESR_ELx_ISV		(UL(1) << 24)
 #define ESR_ELx_SAS_SHIFT	(22)
 #define ESR_ELx_SAS		(UL(3) << ESR_ELx_SAS_SHIFT)
@@ -86,16 +103,9 @@
 #define ESR_ELx_SRT_MASK	(UL(0x1F) << ESR_ELx_SRT_SHIFT)
 #define ESR_ELx_SF 		(UL(1) << 15)
 #define ESR_ELx_AR 		(UL(1) << 14)
-#define ESR_ELx_EA 		(UL(1) << 9)
 #define ESR_ELx_CM 		(UL(1) << 8)
-#define ESR_ELx_S1PTW 		(UL(1) << 7)
-#define ESR_ELx_WNR		(UL(1) << 6)
-#define ESR_ELx_FSC		(0x3F)
-#define ESR_ELx_FSC_TYPE	(0x3C)
-#define ESR_ELx_FSC_EXTABT	(0x10)
-#define ESR_ELx_FSC_ACCESS	(0x08)
-#define ESR_ELx_FSC_FAULT	(0x04)
-#define ESR_ELx_FSC_PERM	(0x0C)
+
+/* ISS field definitions for exceptions taken in to Hyp */
 #define ESR_ELx_CV		(UL(1) << 24)
 #define ESR_ELx_COND_SHIFT	(20)
 #define ESR_ELx_COND_MASK	(UL(0xF) << ESR_ELx_COND_SHIFT)
@@ -109,6 +119,54 @@
 	((ESR_ELx_EC_BRK64 << ESR_ELx_EC_SHIFT) | ESR_ELx_IL |	\
 	 ((imm) & 0xffff))
 
+/* ISS field definitions for System instruction traps */
+#define ESR_ELx_SYS64_ISS_RES0_SHIFT	22
+#define ESR_ELx_SYS64_ISS_RES0_MASK	(UL(0x7) << ESR_ELx_SYS64_ISS_RES0_SHIFT)
+#define ESR_ELx_SYS64_ISS_DIR_MASK	0x1
+#define ESR_ELx_SYS64_ISS_DIR_READ	0x1
+#define ESR_ELx_SYS64_ISS_DIR_WRITE	0x0
+
+#define ESR_ELx_SYS64_ISS_RT_SHIFT	5
+#define ESR_ELx_SYS64_ISS_RT_MASK	(UL(0x1f) << ESR_ELx_SYS64_ISS_RT_SHIFT)
+#define ESR_ELx_SYS64_ISS_CRM_SHIFT	1
+#define ESR_ELx_SYS64_ISS_CRM_MASK	(UL(0xf) << ESR_ELx_SYS64_ISS_CRM_SHIFT)
+#define ESR_ELx_SYS64_ISS_CRN_SHIFT	10
+#define ESR_ELx_SYS64_ISS_CRN_MASK	(UL(0xf) << ESR_ELx_SYS64_ISS_CRN_SHIFT)
+#define ESR_ELx_SYS64_ISS_OP1_SHIFT	14
+#define ESR_ELx_SYS64_ISS_OP1_MASK	(UL(0x7) << ESR_ELx_SYS64_ISS_OP1_SHIFT)
+#define ESR_ELx_SYS64_ISS_OP2_SHIFT	17
+#define ESR_ELx_SYS64_ISS_OP2_MASK	(UL(0x7) << ESR_ELx_SYS64_ISS_OP2_SHIFT)
+#define ESR_ELx_SYS64_ISS_OP0_SHIFT	20
+#define ESR_ELx_SYS64_ISS_OP0_MASK	(UL(0x3) << ESR_ELx_SYS64_ISS_OP0_SHIFT)
+#define ESR_ELx_SYS64_ISS_SYS_MASK	(ESR_ELx_SYS64_ISS_OP0_MASK | \
+					 ESR_ELx_SYS64_ISS_OP1_MASK | \
+					 ESR_ELx_SYS64_ISS_OP2_MASK | \
+					 ESR_ELx_SYS64_ISS_CRN_MASK | \
+					 ESR_ELx_SYS64_ISS_CRM_MASK)
+#define ESR_ELx_SYS64_ISS_SYS_VAL(op0, op1, op2, crn, crm) \
+					(((op0) << ESR_ELx_SYS64_ISS_OP0_SHIFT) | \
+					 ((op1) << ESR_ELx_SYS64_ISS_OP1_SHIFT) | \
+					 ((op2) << ESR_ELx_SYS64_ISS_OP2_SHIFT) | \
+					 ((crn) << ESR_ELx_SYS64_ISS_CRN_SHIFT) | \
+					 ((crm) << ESR_ELx_SYS64_ISS_CRM_SHIFT))
+/*
+ * User space cache operations have the following sysreg encoding
+ * in System instructions.
+ * op0=1, op1=3, op2=1, crn=7, crm={ 5, 10, 11, 14 }, WRITE (L=0)
+ */
+#define ESR_ELx_SYS64_ISS_CRM_DC_CIVAC	14
+#define ESR_ELx_SYS64_ISS_CRM_DC_CVAU	11
+#define ESR_ELx_SYS64_ISS_CRM_DC_CVAC	10
+#define ESR_ELx_SYS64_ISS_CRM_IC_IVAU	5
+
+#define ESR_ELx_SYS64_ISS_EL0_CACHE_OP_MASK	(ESR_ELx_SYS64_ISS_OP0_MASK | \
+						 ESR_ELx_SYS64_ISS_OP1_MASK | \
+						 ESR_ELx_SYS64_ISS_OP2_MASK | \
+						 ESR_ELx_SYS64_ISS_CRN_MASK | \
+						 ESR_ELx_SYS64_ISS_DIR_MASK)
+#define ESR_ELx_SYS64_ISS_EL0_CACHE_OP_VAL \
+				(ESR_ELx_SYS64_ISS_SYS_VAL(1, 3, 1, 7, 0) | \
+				 ESR_ELx_SYS64_ISS_DIR_WRITE)
 #ifndef __ASSEMBLY__
 #include <asm/types.h>
 
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index e04f838..224f64e 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -447,36 +447,29 @@ void cpu_enable_cache_maint_trap(void *__unused)
 		: "=r" (res)					\
 		: "r" (address), "i" (-EFAULT) )
 
-asmlinkage void __exception do_sysinstr(unsigned int esr, struct pt_regs *regs)
+static void user_cache_maint_handler(unsigned int esr, struct pt_regs *regs)
 {
 	unsigned long address;
-	int ret;
+	int rt = (esr & ESR_ELx_SYS64_ISS_RT_MASK) >> ESR_ELx_SYS64_ISS_RT_SHIFT;
+	int crm = (esr & ESR_ELx_SYS64_ISS_CRM_MASK) >> ESR_ELx_SYS64_ISS_CRM_SHIFT;
+	int ret = 0;
 
-	/* if this is a write with: Op0=1, Op2=1, Op1=3, CRn=7 */
-	if ((esr & 0x01fffc01) == 0x0012dc00) {
-		int rt = (esr >> 5) & 0x1f;
-		int crm = (esr >> 1) & 0x0f;
+	address = (rt == 31) ? 0 : regs->regs[rt];
 
-		address = (rt == 31) ? 0 : regs->regs[rt];
-
-		switch (crm) {
-		case 11:		/* DC CVAU, gets promoted */
-			__user_cache_maint("dc civac", address, ret);
-			break;
-		case 10:		/* DC CVAC, gets promoted */
-			__user_cache_maint("dc civac", address, ret);
-			break;
-		case 14:		/* DC CIVAC */
-			__user_cache_maint("dc civac", address, ret);
-			break;
-		case 5:			/* IC IVAU */
-			__user_cache_maint("ic ivau", address, ret);
-			break;
-		default:
-			force_signal_inject(SIGILL, ILL_ILLOPC, regs, 0);
-			return;
-		}
-	} else {
+	switch (crm) {
+	case ESR_ELx_SYS64_ISS_CRM_DC_CVAU:	/* DC CVAU, gets promoted */
+		__user_cache_maint("dc civac", address, ret);
+		break;
+	case ESR_ELx_SYS64_ISS_CRM_DC_CVAC:	/* DC CVAC, gets promoted */
+		__user_cache_maint("dc civac", address, ret);
+		break;
+	case ESR_ELx_SYS64_ISS_CRM_DC_CIVAC:	/* DC CIVAC */
+		__user_cache_maint("dc civac", address, ret);
+		break;
+	case ESR_ELx_SYS64_ISS_CRM_IC_IVAU:	/* IC IVAU */
+		__user_cache_maint("ic ivau", address, ret);
+		break;
+	default:
 		force_signal_inject(SIGILL, ILL_ILLOPC, regs, 0);
 		return;
 	}
@@ -487,6 +480,34 @@ asmlinkage void __exception do_sysinstr(unsigned int esr, struct pt_regs *regs)
 		regs->pc += 4;
 }
 
+struct sys64_hook {
+	unsigned int esr_mask;
+	unsigned int esr_val;
+	void (*handler)(unsigned int esr, struct pt_regs *regs);
+};
+
+static struct sys64_hook sys64_hooks[] = {
+	{
+		.esr_mask = ESR_ELx_SYS64_ISS_EL0_CACHE_OP_MASK,
+		.esr_val = ESR_ELx_SYS64_ISS_EL0_CACHE_OP_VAL,
+		.handler = user_cache_maint_handler,
+	},
+	{},
+};
+
+asmlinkage void __exception do_sysinstr(unsigned int esr, struct pt_regs *regs)
+{
+	struct sys64_hook *hook;
+
+	for (hook = sys64_hooks; hook->handler; hook++)
+		if ((hook->esr_mask & esr) == hook->esr_val) {
+			hook->handler(esr, regs);
+			return;
+		}
+
+	force_signal_inject(SIGILL, ILL_ILLOPC, regs, 0);
+}
+
 long compat_arm_syscall(struct pt_regs *regs);
 
 asmlinkage long do_ni_syscall(struct pt_regs *regs)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 9/9] arm64: Work around systems with mismatched cache line sizes
  2016-08-26  9:23 [PATCH v2 0/9] arm64: Work around for mismatched cache line size Suzuki K Poulose
                   ` (7 preceding siblings ...)
  2016-08-26  9:23 ` [PATCH v2 8/9] arm64: Refactor sysinstr exception handling Suzuki K Poulose
@ 2016-08-26  9:23 ` Suzuki K Poulose
  2016-08-26 11:03   ` Ard Biesheuvel
  8 siblings, 1 reply; 17+ messages in thread
From: Suzuki K Poulose @ 2016-08-26  9:23 UTC (permalink / raw)
  To: will.deacon, catalin.marinas
  Cc: linux-arm-kernel, linux-kernel, marc.zyngier, mark.rutland,
	ard.biesheuvel, Suzuki K Poulose, Andre Przywara

Systems whose CPUs have differing i-cache/d-cache line sizes can cause
problems for software cache management when execution migrates from one
CPU to another. Usually, the application reads the cache line size on a
CPU and then uses that length to perform cache operations. However, if
it gets migrated to another CPU with a smaller cache line size, things
can go completely wrong. To prevent such cases, always use the smallest
cache line size among the CPUs. The kernel CPU feature infrastructure
already keeps track of the safe value for all CPUID registers including
CTR. This patch works around the problem as follows:

For the kernel, dynamically patch it to read the cache size from the
system wide copy of CTR_EL0.

For applications, trap read accesses to CTR_EL0 (by clearing SCTLR.UCT)
and emulate the mrs instruction to return the system wide safe value
of CTR_EL0.

For faster access (i.e. to avoid looking up the system wide value of
CTR_EL0 via read_system_reg()), we keep track of a pointer to the table
entry for CTR_EL0 in the CPU feature infrastructure.
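
From a userspace point of view the change is transparent: an mrs from
CTR_EL0 traps (once SCTLR_EL1.UCT is cleared) and is emulated with the
safe value. A standalone sketch, not part of the patch, of what an
application would observe:

  #include <stdint.h>
  #include <stdio.h>

  /*
   * Illustration only: on an affected system this now reports the
   * system wide safe line sizes instead of the values of whichever
   * CPU the thread happens to run on.
   */
  int main(void)
  {
  	uint64_t ctr;

  	asm volatile("mrs %0, ctr_el0" : "=r" (ctr));
  	printf("IminLine: %lu bytes, DminLine: %lu bytes\n",
  	       4UL << (ctr & 0xf), 4UL << ((ctr >> 16) & 0xf));
  	return 0;
  }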

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/include/asm/assembler.h  | 25 +++++++++++++++++++++++--
 arch/arm64/include/asm/cpufeature.h |  4 +++-
 arch/arm64/include/asm/esr.h        |  8 ++++++++
 arch/arm64/include/asm/sysreg.h     |  1 +
 arch/arm64/kernel/asm-offsets.c     |  2 ++
 arch/arm64/kernel/cpu_errata.c      | 22 ++++++++++++++++++++++
 arch/arm64/kernel/cpufeature.c      |  9 +++++++++
 arch/arm64/kernel/traps.c           | 14 ++++++++++++++
 8 files changed, 82 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index a4bb3f5..66bd268 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -216,6 +216,21 @@ lr	.req	x30		// link register
 	.macro	mmid, rd, rn
 	ldr	\rd, [\rn, #MM_CONTEXT_ID]
 	.endm
+/*
+ * read_ctr - read CTR_EL0. If the system has mismatched
+ * cache line sizes, provide the system wide safe value.
+ */
+	.macro	read_ctr, reg
+alternative_if_not ARM64_MISMATCHED_CACHE_LINE_SIZE
+	mrs	\reg, ctr_el0			// read CTR
+	nop
+	nop
+alternative_else
+	ldr_l	\reg, sys_ctr_ftr		// Read system wide safe CTR value
+	ldr	\reg, [\reg, #ARM64_FTR_SYSVAL] // from sys_ctr_ftr->sys_val
+alternative_endif
+	.endm
+
 
 /*
  * raw_dcache_line_size - get the minimum D-cache line size on this CPU
@@ -232,7 +247,10 @@ lr	.req	x30		// link register
  * dcache_line_size - get the safe D-cache line size across all CPUs
  */
 	.macro	dcache_line_size, reg, tmp
-	raw_dcache_line_size	\reg, \tmp
+	read_ctr	\tmp
+	ubfm		\tmp, \tmp, #16, #19	// cache line size encoding
+	mov		\reg, #4		// bytes per word
+	lsl		\reg, \reg, \tmp	// actual cache line size
 	.endm
 
 /*
@@ -250,7 +268,10 @@ lr	.req	x30		// link register
  * icache_line_size - get the safe I-cache line size across all CPUs
  */
 	.macro	icache_line_size, reg, tmp
-	raw_icache_line_size	\reg, \tmp
+	read_ctr	\tmp
+	and		\tmp, \tmp, #0xf	// cache line size encoding
+	mov		\reg, #4		// bytes per word
+	lsl		\reg, \reg, \tmp	// actual cache line size
 	.endm
 
 /*
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 692b8d3..e99f2af 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -37,8 +37,9 @@
 #define ARM64_WORKAROUND_CAVIUM_27456		12
 #define ARM64_HAS_32BIT_EL0			13
 #define ARM64_HYP_OFFSET_LOW			14
+#define ARM64_MISMATCHED_CACHE_LINE_SIZE	15
 
-#define ARM64_NCAPS				15
+#define ARM64_NCAPS				16
 
 #ifndef __ASSEMBLY__
 
@@ -109,6 +110,7 @@ struct arm64_cpu_capabilities {
 };
 
 extern DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);
+extern struct arm64_ftr_reg *sys_ctr_ftr;
 
 bool this_cpu_has_cap(unsigned int cap);
 
diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 9875b32..d14c478 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -149,6 +149,9 @@
 					 ((op2) << ESR_ELx_SYS64_ISS_OP2_SHIFT) | \
 					 ((crn) << ESR_ELx_SYS64_ISS_CRN_SHIFT) | \
 					 ((crm) << ESR_ELx_SYS64_ISS_CRM_SHIFT))
+
+#define ESR_ELx_SYS64_ISS_SYS_OP_MASK	(ESR_ELx_SYS64_ISS_SYS_MASK | \
+					 ESR_ELx_SYS64_ISS_DIR_MASK)
 /*
  * User space cache operations have the following sysreg encoding
  * in System instructions.
@@ -167,6 +170,11 @@
 #define ESR_ELx_SYS64_ISS_EL0_CACHE_OP_VAL \
 				(ESR_ELx_SYS64_ISS_SYS_VAL(1, 3, 1, 7, 0) | \
 				 ESR_ELx_SYS64_ISS_DIR_WRITE)
+
+#define ESR_ELx_SYS64_ISS_SYS_CTR	ESR_ELx_SYS64_ISS_SYS_VAL(3, 3, 1, 0, 0)
+#define ESR_ELx_SYS64_ISS_SYS_CTR_READ	(ESR_ELx_SYS64_ISS_SYS_CTR | \
+					 ESR_ELx_SYS64_ISS_DIR_READ)
+
 #ifndef __ASSEMBLY__
 #include <asm/types.h>
 
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index cc06794..4f26ad5 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -100,6 +100,7 @@
 /* SCTLR_EL1 specific flags. */
 #define SCTLR_EL1_UCI		(1 << 26)
 #define SCTLR_EL1_SPAN		(1 << 23)
+#define SCTLR_EL1_UCT		(1 << 15)
 #define SCTLR_EL1_SED		(1 << 8)
 #define SCTLR_EL1_CP15BEN	(1 << 5)
 
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 05070b7..4a2f0f0 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -23,6 +23,7 @@
 #include <linux/dma-mapping.h>
 #include <linux/kvm_host.h>
 #include <linux/suspend.h>
+#include <asm/cpufeature.h>
 #include <asm/thread_info.h>
 #include <asm/memory.h>
 #include <asm/smp_plat.h>
@@ -145,5 +146,6 @@ int main(void)
   DEFINE(HIBERN_PBE_ORIG,	offsetof(struct pbe, orig_address));
   DEFINE(HIBERN_PBE_ADDR,	offsetof(struct pbe, address));
   DEFINE(HIBERN_PBE_NEXT,	offsetof(struct pbe, next));
+  DEFINE(ARM64_FTR_SYSVAL,	offsetof(struct arm64_ftr_reg, sys_val));
   return 0;
 }
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 5836b3d..621fa7f 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -30,6 +30,21 @@ is_affected_midr_range(const struct arm64_cpu_capabilities *entry, int scope)
 				       entry->midr_range_max);
 }
 
+static bool
+has_mismatched_cache_line_size(const struct arm64_cpu_capabilities *entry,
+				int scope)
+{
+	WARN_ON(scope != SCOPE_LOCAL_CPU || preemptible());
+	return (read_cpuid_cachetype() & sys_ctr_ftr->strict_mask) !=
+		 (sys_ctr_ftr->sys_val & sys_ctr_ftr->strict_mask);
+}
+
+static void cpu_enable_trap_ctr_access(void *__unused)
+{
+	/* Clear SCTLR_EL1.UCT */
+	config_sctlr_el1(SCTLR_EL1_UCT, 0);
+}
+
 #define MIDR_RANGE(model, min, max) \
 	.def_scope = SCOPE_LOCAL_CPU, \
 	.matches = is_affected_midr_range, \
@@ -108,6 +123,13 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
 	},
 #endif
 	{
+		.desc = "Mismatched cache line size",
+		.capability = ARM64_MISMATCHED_CACHE_LINE_SIZE,
+		.matches = has_mismatched_cache_line_size,
+		.def_scope = SCOPE_LOCAL_CPU,
+		.enable = cpu_enable_trap_ctr_access,
+	},
+	{
 	}
 };
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index fcf87ca..c1db313 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -46,6 +46,13 @@ unsigned int compat_elf_hwcap2 __read_mostly;
 
 DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);
 
+/*
+ * If we have mismatched cache line sizes, we need to emulate access to
+ * CTR_EL0. To avoid a lookup every time (via read_system_reg()), cache
+ * the table entry for CTR_EL0.
+ */
+struct arm64_ftr_reg *sys_ctr_ftr;
+
 #define __ARM64_FTR_BITS(SIGNED, STRICT, TYPE, SHIFT, WIDTH, SAFE_VAL) \
 	{						\
 		.sign = SIGNED,				\
@@ -461,6 +468,8 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
 		init_cpu_ftr_reg(SYS_MVFR2_EL1, info->reg_mvfr2);
 	}
 
+	/* Keep track of the CTR feature register */
+	sys_ctr_ftr = get_arm64_ftr_reg(SYS_CTR_EL0);
 }
 
 static void update_cpu_ftr_reg(struct arm64_ftr_reg *reg, u64 new)
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 224f64e..7c28898 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -480,6 +480,14 @@ static void user_cache_maint_handler(unsigned int esr, struct pt_regs *regs)
 		regs->pc += 4;
 }
 
+static void ctr_read_handler(unsigned int esr, struct pt_regs *regs)
+{
+	int rt = (esr & ESR_ELx_SYS64_ISS_RT_MASK) >> ESR_ELx_SYS64_ISS_RT_SHIFT;
+
+	regs->regs[rt] = sys_ctr_ftr->sys_val;
+	regs->pc += 4;
+}
+
 struct sys64_hook {
 	unsigned int esr_mask;
 	unsigned int esr_val;
@@ -492,6 +500,12 @@ static struct sys64_hook sys64_hooks[] = {
 		.esr_val = ESR_ELx_SYS64_ISS_EL0_CACHE_OP_VAL,
 		.handler = user_cache_maint_handler,
 	},
+	{
+		/* Trap read access to CTR_EL0 */
+		.esr_mask = ESR_ELx_SYS64_ISS_SYS_OP_MASK,
+		.esr_val = ESR_ELx_SYS64_ISS_SYS_CTR_READ,
+		.handler = ctr_read_handler,
+	},
 	{},
 };
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 9/9] arm64: Work around systems with mismatched cache line sizes
  2016-08-26  9:23 ` [PATCH v2 9/9] arm64: Work around systems with mismatched cache line sizes Suzuki K Poulose
@ 2016-08-26 11:03   ` Ard Biesheuvel
  2016-08-26 13:04     ` Suzuki K Poulose
  0 siblings, 1 reply; 17+ messages in thread
From: Ard Biesheuvel @ 2016-08-26 11:03 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: Will Deacon, Catalin Marinas, linux-arm-kernel, linux-kernel,
	Marc Zyngier, Mark Rutland, Andre Przywara

Hello Suzuki,

On 26 August 2016 at 11:23, Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
> Systems with differing CPU i-cache/d-cache line sizes can cause
> problems with the cache management by software when the execution
> is migrated from one to another. Usually, the application reads
> the cache size on a CPU and then uses that length to perform cache
> operations. However, if it gets migrated to another CPU with a smaller
> cache line size, things could go completely wrong. To prevent such
> cases, always use the smallest cache line size among the CPUs. The
> kernel CPU feature infrastructure already keeps track of the safe
> value for all CPUID registers including CTR. This patch works around
> the problem by :
>
> For kernel, dynamically patch the kernel to read the cache size
> from the system wide copy of CTR_EL0.
>
> For applications, trap read accesses to CTR_EL0 (by clearing the SCTLR.UCT)
> and emulate the mrs instruction to return the system wide safe value
> of CTR_EL0.
>
> For faster access (i.e, avoiding to lookup the system wide value of CTR_EL0
> via read_system_reg), we keep track of the pointer to table entry for
> CTR_EL0 in the CPU feature infrastructure.
>

IIUC it is the runtime sorting of the arm64_ftr_reg array that
requires you to stash a pointer to CTR_EL0's entry somewhere, so that
you can dereference it without doing the bsearch.

IMO, this is a pattern that we should avoid: you are introducing one
instance now, which will make it hard to say no to the next one in the
future. Isn't there a better way to organize the arm64_ftr_reg array
that allows us to reference entries directly? Ideally, a way that gets
rid of the runtime sorting, since I don't think that is a good
replacement for developer discipline anyway (although I should have
spoken up when that was first introduced). Or am I missing something
here?

-- 
Ard.

> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Andre Przywara <andre.przywara@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  arch/arm64/include/asm/assembler.h  | 25 +++++++++++++++++++++++--
>  arch/arm64/include/asm/cpufeature.h |  4 +++-
>  arch/arm64/include/asm/esr.h        |  8 ++++++++
>  arch/arm64/include/asm/sysreg.h     |  1 +
>  arch/arm64/kernel/asm-offsets.c     |  2 ++
>  arch/arm64/kernel/cpu_errata.c      | 22 ++++++++++++++++++++++
>  arch/arm64/kernel/cpufeature.c      |  9 +++++++++
>  arch/arm64/kernel/traps.c           | 14 ++++++++++++++
>  8 files changed, 82 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
> index a4bb3f5..66bd268 100644
> --- a/arch/arm64/include/asm/assembler.h
> +++ b/arch/arm64/include/asm/assembler.h
> @@ -216,6 +216,21 @@ lr .req    x30             // link register
>         .macro  mmid, rd, rn
>         ldr     \rd, [\rn, #MM_CONTEXT_ID]
>         .endm
> +/*
> + * read_ctr - read CTR_EL0. If the system has mismatched
> + * cache line sizes, provide the system wide safe value.
> + */
> +       .macro  read_ctr, reg
> +alternative_if_not ARM64_MISMATCHED_CACHE_LINE_SIZE
> +       mrs     \reg, ctr_el0                   // read CTR
> +       nop
> +       nop
> +alternative_else
> +       ldr_l   \reg, sys_ctr_ftr               // Read system wide safe CTR value
> +       ldr     \reg, [\reg, #ARM64_FTR_SYSVAL] // from sys_ctr_ftr->sys_val
> +alternative_endif
> +       .endm
> +
>
>  /*
>   * raw_dcache_line_size - get the minimum D-cache line size on this CPU
> @@ -232,7 +247,10 @@ lr .req    x30             // link register
>   * dcache_line_size - get the safe D-cache line size across all CPUs
>   */
>         .macro  dcache_line_size, reg, tmp
> -       raw_dcache_line_size    \reg, \tmp
> +       read_ctr        \tmp
> +       ubfm            \tmp, \tmp, #16, #19    // cache line size encoding
> +       mov             \reg, #4                // bytes per word
> +       lsl             \reg, \reg, \tmp        // actual cache line size
>         .endm
>
>  /*
> @@ -250,7 +268,10 @@ lr .req    x30             // link register
>   * icache_line_size - get the safe I-cache line size across all CPUs
>   */
>         .macro  icache_line_size, reg, tmp
> -       raw_icache_line_size    \reg, \tmp
> +       read_ctr        \tmp
> +       and             \tmp, \tmp, #0xf        // cache line size encoding
> +       mov             \reg, #4                // bytes per word
> +       lsl             \reg, \reg, \tmp        // actual cache line size
>         .endm
>
>  /*
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index 692b8d3..e99f2af 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -37,8 +37,9 @@
>  #define ARM64_WORKAROUND_CAVIUM_27456          12
>  #define ARM64_HAS_32BIT_EL0                    13
>  #define ARM64_HYP_OFFSET_LOW                   14
> +#define ARM64_MISMATCHED_CACHE_LINE_SIZE       15
>
> -#define ARM64_NCAPS                            15
> +#define ARM64_NCAPS                            16
>
>  #ifndef __ASSEMBLY__
>
> @@ -109,6 +110,7 @@ struct arm64_cpu_capabilities {
>  };
>
>  extern DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);
> +extern struct arm64_ftr_reg *sys_ctr_ftr;
>
>  bool this_cpu_has_cap(unsigned int cap);
>
> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
> index 9875b32..d14c478 100644
> --- a/arch/arm64/include/asm/esr.h
> +++ b/arch/arm64/include/asm/esr.h
> @@ -149,6 +149,9 @@
>                                          ((op2) << ESR_ELx_SYS64_ISS_OP2_SHIFT) | \
>                                          ((crn) << ESR_ELx_SYS64_ISS_CRN_SHIFT) | \
>                                          ((crm) << ESR_ELx_SYS64_ISS_CRM_SHIFT))
> +
> +#define ESR_ELx_SYS64_ISS_SYS_OP_MASK  (ESR_ELx_SYS64_ISS_SYS_MASK | \
> +                                        ESR_ELx_SYS64_ISS_DIR_MASK)
>  /*
>   * User space cache operations have the following sysreg encoding
>   * in System instructions.
> @@ -167,6 +170,11 @@
>  #define ESR_ELx_SYS64_ISS_EL0_CACHE_OP_VAL \
>                                 (ESR_ELx_SYS64_ISS_SYS_VAL(1, 3, 1, 7, 0) | \
>                                  ESR_ELx_SYS64_ISS_DIR_WRITE)
> +
> +#define ESR_ELx_SYS64_ISS_SYS_CTR      ESR_ELx_SYS64_ISS_SYS_VAL(3, 3, 1, 0, 0)
> +#define ESR_ELx_SYS64_ISS_SYS_CTR_READ (ESR_ELx_SYS64_ISS_SYS_CTR | \
> +                                        ESR_ELx_SYS64_ISS_DIR_READ)
> +
>  #ifndef __ASSEMBLY__
>  #include <asm/types.h>
>
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index cc06794..4f26ad5 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -100,6 +100,7 @@
>  /* SCTLR_EL1 specific flags. */
>  #define SCTLR_EL1_UCI          (1 << 26)
>  #define SCTLR_EL1_SPAN         (1 << 23)
> +#define SCTLR_EL1_UCT          (1 << 15)
>  #define SCTLR_EL1_SED          (1 << 8)
>  #define SCTLR_EL1_CP15BEN      (1 << 5)
>
> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
> index 05070b7..4a2f0f0 100644
> --- a/arch/arm64/kernel/asm-offsets.c
> +++ b/arch/arm64/kernel/asm-offsets.c
> @@ -23,6 +23,7 @@
>  #include <linux/dma-mapping.h>
>  #include <linux/kvm_host.h>
>  #include <linux/suspend.h>
> +#include <asm/cpufeature.h>
>  #include <asm/thread_info.h>
>  #include <asm/memory.h>
>  #include <asm/smp_plat.h>
> @@ -145,5 +146,6 @@ int main(void)
>    DEFINE(HIBERN_PBE_ORIG,      offsetof(struct pbe, orig_address));
>    DEFINE(HIBERN_PBE_ADDR,      offsetof(struct pbe, address));
>    DEFINE(HIBERN_PBE_NEXT,      offsetof(struct pbe, next));
> +  DEFINE(ARM64_FTR_SYSVAL,     offsetof(struct arm64_ftr_reg, sys_val));
>    return 0;
>  }
> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> index 5836b3d..621fa7f 100644
> --- a/arch/arm64/kernel/cpu_errata.c
> +++ b/arch/arm64/kernel/cpu_errata.c
> @@ -30,6 +30,21 @@ is_affected_midr_range(const struct arm64_cpu_capabilities *entry, int scope)
>                                        entry->midr_range_max);
>  }
>
> +static bool
> +has_mismatched_cache_line_size(const struct arm64_cpu_capabilities *entry,
> +                               int scope)
> +{
> +       WARN_ON(scope != SCOPE_LOCAL_CPU || preemptible());
> +       return (read_cpuid_cachetype() & sys_ctr_ftr->strict_mask) !=
> +                (sys_ctr_ftr->sys_val & sys_ctr_ftr->strict_mask);
> +}
> +
> +static void cpu_enable_trap_ctr_access(void *__unused)
> +{
> +       /* Clear SCTLR_EL1.UCT */
> +       config_sctlr_el1(SCTLR_EL1_UCT, 0);
> +}
> +
>  #define MIDR_RANGE(model, min, max) \
>         .def_scope = SCOPE_LOCAL_CPU, \
>         .matches = is_affected_midr_range, \
> @@ -108,6 +123,13 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
>         },
>  #endif
>         {
> +               .desc = "Mismatched cache line size",
> +               .capability = ARM64_MISMATCHED_CACHE_LINE_SIZE,
> +               .matches = has_mismatched_cache_line_size,
> +               .def_scope = SCOPE_LOCAL_CPU,
> +               .enable = cpu_enable_trap_ctr_access,
> +       },
> +       {
>         }
>  };
>
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index fcf87ca..c1db313 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -46,6 +46,13 @@ unsigned int compat_elf_hwcap2 __read_mostly;
>
>  DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);
>
> +/*
> + * If we have mismatched cache line sizes, we need to emulate access to
> + * CTR_EL0. To avoid a lookup every time (via read_system_reg()), cache
> + * the table entry for CTR_EL0.
> + */
> +struct arm64_ftr_reg *sys_ctr_ftr;
> +
>  #define __ARM64_FTR_BITS(SIGNED, STRICT, TYPE, SHIFT, WIDTH, SAFE_VAL) \
>         {                                               \
>                 .sign = SIGNED,                         \
> @@ -461,6 +468,8 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
>                 init_cpu_ftr_reg(SYS_MVFR2_EL1, info->reg_mvfr2);
>         }
>
> +       /* Keep track of the CTR feature register */
> +       sys_ctr_ftr = get_arm64_ftr_reg(SYS_CTR_EL0);
>  }
>
>  static void update_cpu_ftr_reg(struct arm64_ftr_reg *reg, u64 new)
> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> index 224f64e..7c28898 100644
> --- a/arch/arm64/kernel/traps.c
> +++ b/arch/arm64/kernel/traps.c
> @@ -480,6 +480,14 @@ static void user_cache_maint_handler(unsigned int esr, struct pt_regs *regs)
>                 regs->pc += 4;
>  }
>
> +static void ctr_read_handler(unsigned int esr, struct pt_regs *regs)
> +{
> +       int rt = (esr & ESR_ELx_SYS64_ISS_RT_MASK) >> ESR_ELx_SYS64_ISS_RT_SHIFT;
> +
> +       regs->regs[rt] = sys_ctr_ftr->sys_val;
> +       regs->pc += 4;
> +}
> +
>  struct sys64_hook {
>         unsigned int esr_mask;
>         unsigned int esr_val;
> @@ -492,6 +500,12 @@ static struct sys64_hook sys64_hooks[] = {
>                 .esr_val = ESR_ELx_SYS64_ISS_EL0_CACHE_OP_VAL,
>                 .handler = user_cache_maint_handler,
>         },
> +       {
> +               /* Trap read access to CTR_EL0 */
> +               .esr_mask = ESR_ELx_SYS64_ISS_SYS_OP_MASK,
> +               .esr_val = ESR_ELx_SYS64_ISS_SYS_CTR_READ,
> +               .handler = ctr_read_handler,
> +       },
>         {},
>  };
>
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 9/9] arm64: Work around systems with mismatched cache line sizes
  2016-08-26 11:03   ` Ard Biesheuvel
@ 2016-08-26 13:04     ` Suzuki K Poulose
  2016-08-26 13:08       ` Suzuki K Poulose
  0 siblings, 1 reply; 17+ messages in thread
From: Suzuki K Poulose @ 2016-08-26 13:04 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Will Deacon, Catalin Marinas, linux-arm-kernel, linux-kernel,
	Marc Zyngier, Mark Rutland, Andre Przywara

On 26/08/16 12:03, Ard Biesheuvel wrote:
> Hello Suzuki,
>


>> For faster access (i.e., avoiding a lookup of the system wide value of
>> CTR_EL0 via read_system_reg()), we keep track of a pointer to the table
>> entry for CTR_EL0 in the CPU feature infrastructure.
>>
>
> IIUC it is the runtime sorting of the arm64_ftr_reg array that
> requires you to stash a pointer to CTR_EL0's entry somewhere, so that
> you can dereference it without doing the bsearch.

Correct.
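
(For illustration only, not part of the posted patch: a minimal sketch of the
two access paths under discussion, written against helpers that already exist
in cpufeature.c; the wrapper names are invented.)

    /* Per-access lookup (what the stashed pointer avoids): every call
     * resolves the CTR_EL0 entry in the feature register table. */
    static u64 ctr_via_lookup(void)
    {
            return read_system_reg(SYS_CTR_EL0);
    }

    /* Cached pointer, as in this series: sys_ctr_ftr is resolved once at
     * init via get_arm64_ftr_reg(SYS_CTR_EL0) and then dereferenced
     * directly, e.g. from ctr_read_handler(). */
    static u64 ctr_via_cached_ptr(void)
    {
            return sys_ctr_ftr->sys_val;
    }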

>
> IMO, this is a pattern that we should avoid: you are introducing one
> instance now, which will make it hard to say no to the next one in the
> future. Isn't there a better way to organize the arm64_ftr_reg array
> that allows us to reference entries directly? Ideally, a way that gets
> rid of the runtime sorting, since I don't think that is a good
> replacement for developer discipline anyway (although I should have
>> spoken up when that was first introduced). Or am I missing something
> here?

I had some form of direct access to the feature register in one of
the versions [0], but it was dropped based on Catalin's suggestion at [1].


[0] https://lkml.org/lkml/2015/10/5/504
[1] https://lkml.org/lkml/2015/10/7/558


Suzuki

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 9/9] arm64: Work around systems with mismatched cache line sizes
  2016-08-26 13:04     ` Suzuki K Poulose
@ 2016-08-26 13:08       ` Suzuki K Poulose
  2016-08-26 16:16         ` Will Deacon
  0 siblings, 1 reply; 17+ messages in thread
From: Suzuki K Poulose @ 2016-08-26 13:08 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Will Deacon, Catalin Marinas, linux-arm-kernel, linux-kernel,
	Marc Zyngier, Mark Rutland, Andre Przywara

On 26/08/16 14:04, Suzuki K Poulose wrote:
> On 26/08/16 12:03, Ard Biesheuvel wrote:
>> Hello Suzuki,
>>
>
>
>>> For faster access (i.e., avoiding a lookup of the system wide value of
>>> CTR_EL0 via read_system_reg()), we keep track of a pointer to the table
>>> entry for CTR_EL0 in the CPU feature infrastructure.
>>>
>>
>> IIUC it is the runtime sorting of the arm64_ftr_reg array that
>> requires you to stash a pointer to CTR_EL0's entry somewhere, so that
>> you can dereference it without doing the bsearch.
>
> Correct.



>
>>
>> IMO, this is a pattern that we should avoid: you are introducing one
>> instance now, which will make it hard to say no to the next one in the
>> future. Isn't there a better way to organize the arm64_ftr_reg array
>> that allows us to reference entries directly? Ideally, a way that gets
>> rid of the runtime sorting, since I don't think that is a good
>> replacement for developer discipline anyway (although I should have
>> spoken up when that was first introduced). Or am I missing something
>> here?
>
> I had some form of direct access to the feature register in one of
> the versions [0], but it was dropped based on Catalin's suggestion at [1].

Forgot to add, [0] wouldn't solve this issue cleanly either. It would simply
speed up read_system_reg(). So we would still need a call to read_system_reg()
from assembly code, which makes it a little bit tricky.

Suzuki

>
>
> [0] https://lkml.org/lkml/2015/10/5/504
> [1] https://lkml.org/lkml/2015/10/7/558
>
>
> Suzuki
>
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 9/9] arm64: Work around systems with mismatched cache line sizes
  2016-08-26 13:08       ` Suzuki K Poulose
@ 2016-08-26 16:16         ` Will Deacon
  2016-08-26 16:58           ` Ard Biesheuvel
  2016-08-26 17:00           ` Catalin Marinas
  0 siblings, 2 replies; 17+ messages in thread
From: Will Deacon @ 2016-08-26 16:16 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: Ard Biesheuvel, Catalin Marinas, linux-arm-kernel, linux-kernel,
	Marc Zyngier, Mark Rutland, Andre Przywara

On Fri, Aug 26, 2016 at 02:08:01PM +0100, Suzuki K Poulose wrote:
> On 26/08/16 14:04, Suzuki K Poulose wrote:
> >On 26/08/16 12:03, Ard Biesheuvel wrote:
> >>IMO, this is a pattern that we should avoid: you are introducing one
> >>instance now, which will make it hard to say no to the next one in the
> >>future. Isn't there a better way to organize the arm64_ftr_reg array
> >>that allows us to reference entries directly? Ideally, a way that gets
> >>rid of the runtime sorting, since I don't think that is a good
> >>replacement for developer discipline anyway (although I should have
> >>spoken up when that was first introduced). Or am I missing something
> >>here?
> >
> >I had some form of direct access to the feature register in one of
> >the versions [0], but it was dropped based on Catalin's suggestion at [1].
> 
> Forgot to add, [0] wouldn't solve this issue cleanly either. It would simply
> speed up read_system_reg(). So we would still need a call to read_system_reg()
> from assembly code, which makes it a little bit tricky.

It might be worth looking to see if we can pass the ctr as an extra
parameter to the assembly routines that need it. Then you can access it
easily from C code, and if you pass it as 0 that could result in the asm
code reading it from the h/w register, removing the need for the _raw
stuff you add.

Of course, it could also be a complete mess fixing up all the callers,
but it's probably worth investigating to see what the trade-off is.
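
(Purely illustrative, to show the shape of this idea: the routine name and
prototype below are invented, not taken from the series.)

    /* Hypothetical variant: the caller supplies the CTR value to use;
     * passing 0 would mean "read CTR_EL0 from the hardware". */
    extern void __flush_dcache_area_ctr(void *addr, size_t len, u64 ctr);

    static void flush_with_safe_ctr(void *addr, size_t len)
    {
            /* system-wide sanitised CTR_EL0 from the cpufeature code */
            __flush_dcache_area_ctr(addr, len, read_system_reg(SYS_CTR_EL0));
    }

    static void flush_with_hw_ctr(void *addr, size_t len)
    {
            __flush_dcache_area_ctr(addr, len, 0);
    }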

Will

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 9/9] arm64: Work around systems with mismatched cache line sizes
  2016-08-26 16:16         ` Will Deacon
@ 2016-08-26 16:58           ` Ard Biesheuvel
  2016-08-26 17:00           ` Catalin Marinas
  1 sibling, 0 replies; 17+ messages in thread
From: Ard Biesheuvel @ 2016-08-26 16:58 UTC (permalink / raw)
  To: Will Deacon
  Cc: Suzuki K Poulose, Catalin Marinas, linux-arm-kernel,
	linux-kernel, Marc Zyngier, Mark Rutland, Andre Przywara

On 26 August 2016 at 17:16, Will Deacon <will.deacon@arm.com> wrote:
> On Fri, Aug 26, 2016 at 02:08:01PM +0100, Suzuki K Poulose wrote:
>> On 26/08/16 14:04, Suzuki K Poulose wrote:
>> >On 26/08/16 12:03, Ard Biesheuvel wrote:
>> >>IMO, this is a pattern that we should avoid: you are introducing one
>> >>instance now, which will make it hard to say no to the next one in the
>> >>future. Isn't there a better way to organize the arm64_ftr_reg array
>> >>that allows us to reference entries directly? Ideally, a way that gets
>> >>rid of the runtime sorting, since I don't think that is a good
>> >>replacement for developer discipline anyway (although I should have
>> >>spoken up when that was first introduced). Or am I missing something
>> >>here?
>> >
>> >I had some form of direct access to the feature register in one of
>> >the versions [0], but it was dropped based on Catalin's suggestion at [1].
>>
>> Forgot to add, [0] wouldn't solve this issue cleanly either. It would simply
>> speed up read_system_reg(). So we would still need a call to read_system_reg()
>> from assembly code, which makes it a little bit tricky.
>
> It might be worth looking to see if we can pass the ctr as an extra
> parameter to the assembly routines that need it. Then you can access it
> easily from C code, and if you pass it as 0 that could result in the asm
> code reading it from the h/w register, removing the need for the _raw
> stuff you add.
>
> Of course, it could also be a complete mess fixing up all the callers,
> but it's probably worth investigating to see what the trade-off is.
>

The point Catalin made was under the assumption that this code is
never called on a hot path, and now you are finding yourself
optimizing the access, which either means you are doing so
prematurely, or it is on a hot path.

I will follow up with a separate suggestion. Feel free to incorporate
it, or completely ignore it. Just my 2 cents.

-- 
Ard.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 9/9] arm64: Work around systems with mismatched cache line sizes
  2016-08-26 16:16         ` Will Deacon
  2016-08-26 16:58           ` Ard Biesheuvel
@ 2016-08-26 17:00           ` Catalin Marinas
  2016-09-02 10:03             ` Suzuki K Poulose
  1 sibling, 1 reply; 17+ messages in thread
From: Catalin Marinas @ 2016-08-26 17:00 UTC (permalink / raw)
  To: Will Deacon
  Cc: Suzuki K Poulose, Mark Rutland, Ard Biesheuvel, Marc Zyngier,
	linux-kernel, Andre Przywara, linux-arm-kernel

On Fri, Aug 26, 2016 at 05:16:27PM +0100, Will Deacon wrote:
> On Fri, Aug 26, 2016 at 02:08:01PM +0100, Suzuki K Poulose wrote:
> > On 26/08/16 14:04, Suzuki K Poulose wrote:
> > >On 26/08/16 12:03, Ard Biesheuvel wrote:
> > >>IMO, this is a pattern that we should avoid: you are introducing one
> > >>instance now, which will make it hard to say no to the next one in the
> > >>future. Isn't there a better way to organize the arm64_ftr_reg array
> > >>that allows us to reference entries directly? Ideally, a way that gets
> > >>rid of the runtime sorting, since I don't think that is a good
> > >>replacement for developer discipline anyway (although I should have
> > >>spoken up when that was first introduced). Or am I missing something
> > >>here?

I'm not sure we can have some simple direct access. Suzuki did some
grouping originally but I wouldn't call that a direct access either,
more like iterating through several groups.

A possibility would be to generate global variables for each
arm64_ftr_reg with the ARM64_FTR_REG macro extended to also place a
pointer in a dedicated section to be used as an array. Assembly code would
access the global variable directly. But I'm not really sure it's worth
it.
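
(One possible reading of that idea, sketched only from the description above;
the field names, section name and macro shape are assumptions, not a concrete
proposal.)

    #define ARM64_FTR_REG(id, table)                                   \
            struct arm64_ftr_reg __ftr_reg_##id = {                    \
                    .sys_id         = id,                              \
                    .ftr_bits       = table,                           \
            };                                                         \
            static struct arm64_ftr_reg *__ftr_reg_ptr_##id            \
            __attribute__((used, section(".arm64_ftr_regs")))          \
                    = &__ftr_reg_##id

    /* C or assembly could then name __ftr_reg_SYS_CTR_EL0 directly, while
     * the linker-built section still provides the array for by-id lookups. */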

> > >I had some form of direct access to the feature register in one of
> > >the versions [0], but it was dropped based on Catalin's suggestion at [1].
> > 
> > Forgot to add, [0] wouldn't solve this issue cleanly either. It would simply
> > speed up read_system_reg(). So we would still need a call to read_system_reg()
> > from assembly code, which makes it a little bit tricky.
> 
> It might be worth looking to see if we can pass the ctr as an extra
> parameter to the assembly routines that need it. Then you can access it
> easily from C code, and if you pass it as 0 that could result in the asm
> code reading it from the h/w register, removing the need for the _raw
> stuff you add.

How often do we need to access a sanitised sysreg from assembly? AFAICT,
CTR_EL0 is the first. If we only need it to infer the minimum cache line
size, we could as well store the latter in a global variable and access
it directly. If we feel brave, we could patch a "mov \reg, #x"
instruction in the ?cache_line_size macros (starting with 32 by default,
though to make it less cumbersome we'd have to improve the run-time
patching code a bit).
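
(A minimal sketch of the global-variable option, assuming the architectural
CTR_EL0 layout where DminLine is bits [19:16] and encodes log2 of the line
size in 4-byte words; illustrative only.)

    /* Conservative default until the sanitised CTR_EL0 is known. */
    static u32 safe_dcache_line_size = 32;

    static void update_safe_dcache_line_size(void)
    {
            u64 ctr = read_system_reg(SYS_CTR_EL0);

            /* DminLine = CTR_EL0[19:16]: log2(words), 4 bytes per word */
            safe_dcache_line_size = 4U << ((ctr >> 16) & 0xf);
    }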

-- 
Catalin

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 9/9] arm64: Work around systems with mismatched cache line sizes
  2016-08-26 17:00           ` Catalin Marinas
@ 2016-09-02 10:03             ` Suzuki K Poulose
  0 siblings, 0 replies; 17+ messages in thread
From: Suzuki K Poulose @ 2016-09-02 10:03 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Mark Rutland, Ard Biesheuvel, Marc Zyngier, linux-kernel,
	Andre Przywara, linux-arm-kernel

On 26/08/16 18:00, Catalin Marinas wrote:
> On Fri, Aug 26, 2016 at 05:16:27PM +0100, Will Deacon wrote:
>> On Fri, Aug 26, 2016 at 02:08:01PM +0100, Suzuki K Poulose wrote:
>>> On 26/08/16 14:04, Suzuki K Poulose wrote:

>> It might be worth looking to see if we can pass the ctr as an extra
>> parameter to the assembly routines that need it. Then you can access it
>> easily from C code, and if you pass it as 0 that could result in the asm
>> code reading it from the h/w register, removing the need for the _raw
>> stuff you add.
>
> How often to we need to access a sanitised sysreg from assembly? AFAICT,
> CTR_EL0 is the first. If we only need it to infer the minimum cache line
> size, we could as well store the latter in a global variable and access
> it directly. If we feel brave, we could patch a "mov \reg, #x"
> instruction in the ?cache_line_size macros (starting with 32 by default,
> though to make it less cumbersome we'd have to improve the run-time
> patching code a bit).


With Ard's patches [1] to refactor the feature register array, we can refer to
the CTR_EL0 feature register cleanly by name. I can rebase this series on top
of that if nobody has any objections.

[1] http://marc.info/?l=linux-arm-kernel&m=147263959504998&w=2


Suzuki

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2016-09-02 10:03 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-26  9:23 [PATCH v2 0/9] arm64: Work around for mismatched cache line size Suzuki K Poulose
2016-08-26  9:23 ` [PATCH v2 1/9] arm64: Set the safe value for L1 icache policy Suzuki K Poulose
2016-08-26  9:23 ` [PATCH v2 2/9] arm64: Use consistent naming for errata handling Suzuki K Poulose
2016-08-26  9:23 ` [PATCH v2 3/9] arm64: Rearrange CPU errata workaround checks Suzuki K Poulose
2016-08-26  9:23 ` [PATCH v2 4/9] arm64: alternative: Disallow patching instructions using literals Suzuki K Poulose
2016-08-26  9:23 ` [PATCH v2 5/9] arm64: insn: Add helpers for adrp offsets Suzuki K Poulose
2016-08-26  9:23 ` [PATCH v2 6/9] arm64: alternative: Add support for patching adrp instructions Suzuki K Poulose
2016-08-26  9:23 ` [PATCH v2 7/9] arm64: Introduce raw_{d,i}cache_line_size Suzuki K Poulose
2016-08-26  9:23 ` [PATCH v2 8/9] arm64: Refactor sysinstr exception handling Suzuki K Poulose
2016-08-26  9:23 ` [PATCH v2 9/9] arm64: Work around systems with mismatched cache line sizes Suzuki K Poulose
2016-08-26 11:03   ` Ard Biesheuvel
2016-08-26 13:04     ` Suzuki K Poulose
2016-08-26 13:08       ` Suzuki K Poulose
2016-08-26 16:16         ` Will Deacon
2016-08-26 16:58           ` Ard Biesheuvel
2016-08-26 17:00           ` Catalin Marinas
2016-09-02 10:03             ` Suzuki K Poulose

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).