linux-kernel.vger.kernel.org archive mirror
* [RESEND] [PATCH 0/8] arm64: Work around for mismatched cache line size
@ 2016-08-18 13:10 Suzuki K Poulose
  2016-08-18 13:10 ` [PATCH 1/8] arm64: Set the safe value for L1 icache policy Suzuki K Poulose
                   ` (7 more replies)
  0 siblings, 8 replies; 24+ messages in thread
From: Suzuki K Poulose @ 2016-08-18 13:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, will.deacon, catalin.marinas, mark.rutland,
	andre.przywara, Suzuki K Poulose

This series adds a workaround for systems with mismatched {I,D}-cache
line sizes. When a thread of execution migrates to a different CPU, the
cache line size it read (and cached) earlier could be larger than that
of the new CPU; cache maintenance done with the stale, larger stride can
then skip lines and corrupt data. We work around this by:

 - Dynamically patching the kernel to use the smallest line size on the
   system (from the CPU feature infrastructure)
 - Trapping userspace accesses to CTR_EL0 (by clearing SCTLR_EL1.UCT) and
   emulating them with the system-wide safe value of CTR (a rough sketch
   of the userspace side follows below).
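
To illustrate the userspace side (only a sketch, not part of this
series): a JIT typically reads CTR_EL0 once, derives its flush stride
from the IminLine/DminLine fields, and keeps using that stride, which
becomes unsafe once the thread migrates to a CPU with smaller lines:

	static unsigned long cached_ctr;

	static void jit_init(void)
	{
		/* Readable from EL0 while SCTLR_EL1.UCT is set */
		asm volatile("mrs %0, ctr_el0" : "=r" (cached_ctr));
	}

	/*
	 * I-cache flush stride in bytes: 4 << IminLine (CTR_EL0[3:0]).
	 * Stale after migration to a CPU with a smaller line size.
	 */
	static unsigned long jit_icache_stride(void)
	{
		return 4UL << (cached_ctr & 0xf);
	}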

The series also adds support for alternative code patching of adrp
instructions by adjusting the PC-relative address offset to reflect
the new PC.

The series has been tested on Juno, with a hack to force-enable the
capability.

Applies on aarch64 for-next/core. The tree is available at:

	git://linux-arm.org/linux-skp.git ctr-emulation

Suzuki K Poulose (8):
  arm64: Set the safe value for L1 icache policy
  arm64: Use consistent naming for errata handling
  arm64: Rearrange CPU errata workaround checks
  arm64: insn: Add helpers for adrp offsets
  arm64: alternative: Add support for patching adrp instructions
  arm64: Introduce raw_{d,i}cache_line_size
  arm64: Refactor sysinstr exception handling
  arm64: Work around systems with mismatched cache line sizes

 arch/arm64/include/asm/assembler.h  | 45 +++++++++++++++++--
 arch/arm64/include/asm/cpufeature.h | 14 +++---
 arch/arm64/include/asm/esr.h        | 56 ++++++++++++++++++++++++
 arch/arm64/include/asm/insn.h       |  4 ++
 arch/arm64/include/asm/sysreg.h     |  1 +
 arch/arm64/kernel/alternative.c     | 13 ++++++
 arch/arm64/kernel/asm-offsets.c     |  2 +
 arch/arm64/kernel/cpu_errata.c      | 26 ++++++++++-
 arch/arm64/kernel/cpufeature.c      | 44 ++++++++++++++-----
 arch/arm64/kernel/cpuinfo.c         |  2 -
 arch/arm64/kernel/hibernate-asm.S   |  2 +-
 arch/arm64/kernel/insn.c            | 13 ++++++
 arch/arm64/kernel/relocate_kernel.S |  2 +-
 arch/arm64/kernel/smp.c             |  8 +++-
 arch/arm64/kernel/traps.c           | 87 ++++++++++++++++++++++++++-----------
 15 files changed, 264 insertions(+), 55 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/8] arm64: Set the safe value for L1 icache policy
  2016-08-18 13:10 [RESEND] [PATCH 0/8] arm64: Work around for mismatched cache line size Suzuki K Poulose
@ 2016-08-18 13:10 ` Suzuki K Poulose
  2016-08-18 13:10 ` [PATCH 2/8] arm64: Use consistent naming for errata handling Suzuki K Poulose
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: Suzuki K Poulose @ 2016-08-18 13:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, will.deacon, catalin.marinas, mark.rutland,
	andre.przywara, Suzuki K Poulose

Right now we use 0 as the safe value for CTR_EL0:L1Ip, an encoding
which is currently not defined. The safe value for L1Ip should instead
be the weakest of the policies, which happens to be AIVIVT. While at it,
fix the comment about safe_val.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/include/asm/cpufeature.h | 2 +-
 arch/arm64/kernel/cpufeature.c      | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 7099f26..f6f5e49 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -63,7 +63,7 @@ struct arm64_ftr_bits {
 	enum ftr_type	type;
 	u8		shift;
 	u8		width;
-	s64		safe_val; /* safe value for discrete features */
+	s64		safe_val; /* safe value for FTR_EXACT features */
 };
 
 /*
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 62272ea..43f73e0 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -147,9 +147,10 @@ static struct arm64_ftr_bits ftr_ctr[] = {
 	ARM64_FTR_BITS(FTR_STRICT, FTR_LOWER_SAFE, 16, 4, 1),	/* DminLine */
 	/*
 	 * Linux can handle differing I-cache policies. Userspace JITs will
-	 * make use of *minLine
+	 * make use of *minLine.
+	 * If we have differing I-cache policies, report it as the weakest - AIVIVT.
 	 */
-	ARM64_FTR_BITS(FTR_NONSTRICT, FTR_EXACT, 14, 2, 0),	/* L1Ip */
+	ARM64_FTR_BITS(FTR_NONSTRICT, FTR_EXACT, 14, 2, ICACHE_POLICY_AIVIVT),	/* L1Ip */
 	ARM64_FTR_BITS(FTR_STRICT, FTR_EXACT, 4, 10, 0),	/* RAZ */
 	ARM64_FTR_BITS(FTR_STRICT, FTR_LOWER_SAFE, 0, 4, 0),	/* IminLine */
 	ARM64_FTR_END,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/8] arm64: Use consistent naming for errata handling
  2016-08-18 13:10 [RESEND] [PATCH 0/8] arm64: Work around for mismatched cache line size Suzuki K Poulose
  2016-08-18 13:10 ` [PATCH 1/8] arm64: Set the safe value for L1 icache policy Suzuki K Poulose
@ 2016-08-18 13:10 ` Suzuki K Poulose
  2016-08-18 13:10 ` [PATCH 3/8] arm64: Rearrange CPU errata workaround checks Suzuki K Poulose
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: Suzuki K Poulose @ 2016-08-18 13:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, will.deacon, catalin.marinas, mark.rutland,
	andre.przywara, Suzuki K Poulose

This is a cosmetic change that renames the functions dealing with
the errata workarounds so that their names reflect what they do.

1) check_local_cpu_errata() => update_cpu_errata_workarounds()
check_local_cpu_errata() actually updates the system's errata
workarounds, so rename it to reflect that.

2) verify_local_cpu_errata() => verify_local_cpu_errata_workarounds()
Use errata_workarounds instead of _errata for consistency.

Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/include/asm/cpufeature.h | 4 ++--
 arch/arm64/kernel/cpu_errata.c      | 4 ++--
 arch/arm64/kernel/cpufeature.c      | 2 +-
 arch/arm64/kernel/cpuinfo.c         | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index f6f5e49..aadd946 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -193,10 +193,10 @@ void __init setup_cpu_features(void);
 void update_cpu_capabilities(const struct arm64_cpu_capabilities *caps,
 			    const char *info);
 void enable_cpu_capabilities(const struct arm64_cpu_capabilities *caps);
-void check_local_cpu_errata(void);
+void update_cpu_errata_workarounds(void);
 void __init enable_errata_workarounds(void);
 
-void verify_local_cpu_errata(void);
+void verify_local_cpu_errata_workarounds(void);
 void verify_local_cpu_capabilities(void);
 
 u64 read_system_reg(u32 id);
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 82b0fc2..5836b3d 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -116,7 +116,7 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
  * and the related information is freed soon after. If the new CPU requires
  * an errata not detected at boot, fail this CPU.
  */
-void verify_local_cpu_errata(void)
+void verify_local_cpu_errata_workarounds(void)
 {
 	const struct arm64_cpu_capabilities *caps = arm64_errata;
 
@@ -131,7 +131,7 @@ void verify_local_cpu_errata(void)
 		}
 }
 
-void check_local_cpu_errata(void)
+void update_cpu_errata_workarounds(void)
 {
 	update_cpu_capabilities(arm64_errata, "enabling workaround for");
 }
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 43f73e0..a591c35 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1017,7 +1017,7 @@ void verify_local_cpu_capabilities(void)
 	if (!sys_caps_initialised)
 		return;
 
-	verify_local_cpu_errata();
+	verify_local_cpu_errata_workarounds();
 	verify_local_cpu_features(arm64_features);
 	verify_local_elf_hwcaps(arm64_elf_hwcaps);
 	if (system_supports_32bit_el0())
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index ed1b84f..4fa7b73 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -364,7 +364,7 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
 
 	cpuinfo_detect_icache_policy(info);
 
-	check_local_cpu_errata();
+	update_cpu_errata_workarounds();
 }
 
 void cpuinfo_store_cpu(void)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 3/8] arm64: Rearrange CPU errata workaround checks
  2016-08-18 13:10 [RESEND] [PATCH 0/8] arm64: Work around for mismatched cache line size Suzuki K Poulose
  2016-08-18 13:10 ` [PATCH 1/8] arm64: Set the safe value for L1 icache policy Suzuki K Poulose
  2016-08-18 13:10 ` [PATCH 2/8] arm64: Use consistent naming for errata handling Suzuki K Poulose
@ 2016-08-18 13:10 ` Suzuki K Poulose
  2016-08-18 13:10 ` [PATCH 4/8] arm64: insn: Add helpers for adrp offsets Suzuki K Poulose
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: Suzuki K Poulose @ 2016-08-18 13:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, will.deacon, catalin.marinas, mark.rutland,
	andre.przywara, Suzuki K Poulose

Right now we run through the workaround checks on a CPU from
__cpuinfo_store_cpu(). There are some problems with that:

1) We initialise the system-wide CPU feature registers only after the
boot CPU updates its cpuinfo. So if a workaround depends on the variance
of a CPU ID feature (e.g., a check for cache line size mismatch), we have
no way of performing it cleanly for the boot CPU.

2) It is out of place, invoked from __cpuinfo_store_cpu() in cpuinfo.c,
which is not an obvious home for it.

This patch rearranges the CPU-specific capability (a.k.a. workaround) checks:

1) At the moment we use verify_local_cpu_capabilities() to check whether a
new CPU has all the system-advertised features. Use this for the secondary
CPUs to perform the workaround check as well. For that, rename
  verify_local_cpu_capabilities() => check_local_cpu_capabilities()
which:

   If the system-wide capabilities haven't been initialised yet (i.e., the
   CPU is brought up during boot), updates the system-wide detected
   workarounds.

   Otherwise (i.e., a CPU hotplugged in later), verifies that this CPU
   conforms to the system-wide capabilities.

2) The boot CPU updates the workarounds from smp_prepare_boot_cpu(), after
we have initialised the system-wide CPU feature values.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/include/asm/cpufeature.h |  4 ++--
 arch/arm64/kernel/cpufeature.c      | 30 ++++++++++++++++++++----------
 arch/arm64/kernel/cpuinfo.c         |  2 --
 arch/arm64/kernel/smp.c             |  8 +++++++-
 4 files changed, 29 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index aadd946..692b8d3 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -193,11 +193,11 @@ void __init setup_cpu_features(void);
 void update_cpu_capabilities(const struct arm64_cpu_capabilities *caps,
 			    const char *info);
 void enable_cpu_capabilities(const struct arm64_cpu_capabilities *caps);
+void check_local_cpu_capabilities(void);
+
 void update_cpu_errata_workarounds(void);
 void __init enable_errata_workarounds(void);
-
 void verify_local_cpu_errata_workarounds(void);
-void verify_local_cpu_capabilities(void);
 
 u64 read_system_reg(u32 id);
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index a591c35..fcf87ca 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1005,23 +1005,33 @@ verify_local_cpu_features(const struct arm64_cpu_capabilities *caps)
  * cannot do anything to fix it up and could cause unexpected failures. So
  * we park the CPU.
  */
-void verify_local_cpu_capabilities(void)
+static void verify_local_cpu_capabilities(void)
 {
+	verify_local_cpu_errata_workarounds();
+	verify_local_cpu_features(arm64_features);
+	verify_local_elf_hwcaps(arm64_elf_hwcaps);
+	if (system_supports_32bit_el0())
+		verify_local_elf_hwcaps(compat_elf_hwcaps);
+}
 
+void check_local_cpu_capabilities(void)
+{
+	/*
+	 * All secondary CPUs should conform to the early CPU features
+	 * in use by the kernel based on boot CPU.
+	 */
 	check_early_cpu_features();
 
 	/*
-	 * If we haven't computed the system capabilities, there is nothing
-	 * to verify.
+	 * If we haven't finalised the system capabilities, this CPU gets
+	 * a chance to update the errata work arounds.
+	 * Otherwise, this CPU should verify that it has all the system
+	 * advertised capabilities.
 	 */
 	if (!sys_caps_initialised)
-		return;
-
-	verify_local_cpu_errata_workarounds();
-	verify_local_cpu_features(arm64_features);
-	verify_local_elf_hwcaps(arm64_elf_hwcaps);
-	if (system_supports_32bit_el0())
-		verify_local_elf_hwcaps(compat_elf_hwcaps);
+		update_cpu_errata_workarounds();
+	else
+		verify_local_cpu_capabilities();
 }
 
 static void __init setup_feature_capabilities(void)
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index 4fa7b73..b3d5b3e 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -363,8 +363,6 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
 	}
 
 	cpuinfo_detect_icache_policy(info);
-
-	update_cpu_errata_workarounds();
 }
 
 void cpuinfo_store_cpu(void)
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 76a6d92..289c43b 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -239,7 +239,7 @@ asmlinkage void secondary_start_kernel(void)
 	 * this CPU ticks all of those. If it doesn't, the CPU will
 	 * fail to come online.
 	 */
-	verify_local_cpu_capabilities();
+	check_local_cpu_capabilities();
 
 	if (cpu_ops[cpu]->cpu_postboot)
 		cpu_ops[cpu]->cpu_postboot();
@@ -439,6 +439,12 @@ void __init smp_prepare_boot_cpu(void)
 	set_my_cpu_offset(per_cpu_offset(smp_processor_id()));
 	cpuinfo_store_boot_cpu();
 	save_boot_cpu_run_el();
+	/*
+	 * Run the errata work around checks on the boot CPU, once we have
+	 * initialised the cpu feature infrastructure from
+	 * cpuinfo_store_boot_cpu() above.
+	 */
+	update_cpu_errata_workarounds();
 }
 
 static u64 __init of_get_cpu_mpidr(struct device_node *dn)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 4/8] arm64: insn: Add helpers for adrp offsets
  2016-08-18 13:10 [RESEND] [PATCH 0/8] arm64: Work around for mismatched cache line size Suzuki K Poulose
                   ` (2 preceding siblings ...)
  2016-08-18 13:10 ` [PATCH 3/8] arm64: Rearrange CPU errata workaround checks Suzuki K Poulose
@ 2016-08-18 13:10 ` Suzuki K Poulose
  2016-08-18 14:47   ` Marc Zyngier
  2016-08-18 13:10 ` [PATCH 5/8] arm64: alternative: Add support for patching adrp instructions Suzuki K Poulose
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Suzuki K Poulose @ 2016-08-18 13:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, will.deacon, catalin.marinas, mark.rutland,
	andre.przywara, Suzuki K Poulose

Add helpers for decoding/encoding the PC-relative page offset used by
adrp. These will be used for dynamic patching of 'adrp' instructions in
alternative code patching.
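
As a rough illustration of what these helpers compute (addresses made
up): adrp forms the address of a 4K page, so the offset it encodes is
always imm21 shifted left by 12.

	/*
	 * For an adrp at PC 0xffff000008080000 whose target lives in
	 * the page at 0xffff000008093000, the instruction encodes
	 * imm21 = 0x13, and:
	 *
	 *   aarch64_insn_adrp_get_offset(insn)
	 *		returns 0x13 << 12 = 0x13000
	 *   aarch64_insn_adrp_set_offset(insn, 0x14000)
	 *		re-encodes imm21 = 0x14
	 */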

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/include/asm/insn.h |  4 ++++
 arch/arm64/kernel/insn.c      | 13 +++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index 1dbaa90..dffb0364 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -247,6 +247,7 @@ static __always_inline u32 aarch64_insn_get_##abbr##_value(void) \
 { return (val); }
 
 __AARCH64_INSN_FUNCS(adr_adrp,	0x1F000000, 0x10000000)
+__AARCH64_INSN_FUNCS(adrp,	0x9F000000, 0x90000000)
 __AARCH64_INSN_FUNCS(prfm_lit,	0xFF000000, 0xD8000000)
 __AARCH64_INSN_FUNCS(str_reg,	0x3FE0EC00, 0x38206800)
 __AARCH64_INSN_FUNCS(ldr_reg,	0x3FE0EC00, 0x38606800)
@@ -398,6 +399,9 @@ int aarch64_insn_patch_text_nosync(void *addr, u32 insn);
 int aarch64_insn_patch_text_sync(void *addrs[], u32 insns[], int cnt);
 int aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt);
 
+s32 aarch64_insn_adrp_get_offset(u32 insn);
+u32 aarch64_insn_adrp_set_offset(u32 insn, s32 offset);
+
 bool aarch32_insn_is_wide(u32 insn);
 
 #define A32_RN_OFFSET	16
diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
index 63f9432..f022af4 100644
--- a/arch/arm64/kernel/insn.c
+++ b/arch/arm64/kernel/insn.c
@@ -1202,6 +1202,19 @@ u32 aarch64_set_branch_offset(u32 insn, s32 offset)
 	BUG();
 }
 
+s32 aarch64_insn_adrp_get_offset(u32 insn)
+{
+	BUG_ON(!aarch64_insn_is_adrp(insn));
+	return aarch64_insn_decode_immediate(AARCH64_INSN_IMM_ADR, insn) << 12;
+}
+
+u32 aarch64_insn_adrp_set_offset(u32 insn, s32 offset)
+{
+	BUG_ON(!aarch64_insn_is_adrp(insn));
+	return aarch64_insn_encode_immediate(AARCH64_INSN_IMM_ADR, insn,
+						offset >> 12);
+}
+
 /*
  * Extract the Op/CR data from a msr/mrs instruction.
  */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 5/8] arm64: alternative: Add support for patching adrp instructions
  2016-08-18 13:10 [RESEND] [PATCH 0/8] arm64: Work around for mismatched cache line size Suzuki K Poulose
                   ` (3 preceding siblings ...)
  2016-08-18 13:10 ` [PATCH 4/8] arm64: insn: Add helpers for adrp offsets Suzuki K Poulose
@ 2016-08-18 13:10 ` Suzuki K Poulose
  2016-08-22 11:19   ` Will Deacon
  2016-08-22 11:45   ` Ard Biesheuvel
  2016-08-18 13:10 ` [PATCH 6/8] arm64: Introduce raw_{d,i}cache_line_size Suzuki K Poulose
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 24+ messages in thread
From: Suzuki K Poulose @ 2016-08-18 13:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, will.deacon, catalin.marinas, mark.rutland,
	andre.przywara, Suzuki K Poulose, Marc Zyngier

adrp encodes a PC-relative offset to the 4K page containing a symbol.
If it appears in alternative code that gets patched in, we should
adjust the offset to reflect the address the instruction will actually
run from. This patch adds support for fixing up the offset of adrp
instructions.
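
A worked example, with made-up addresses, of the fixup done below:

	/*
	 * altinsnptr  = 0xffff000008a41234  (adrp's home in .altinstr_replacement)
	 * insnptr     = 0xffff000008123458  (where it will actually run)
	 * orig_offset = 0x7000              (target page, relative to the
	 *                                    page of altinsnptr)
	 *
	 * target      = 0xffff000008a41000 + 0x7000 = 0xffff000008a48000
	 * new_offset  = 0xffff000008a48000 - 0xffff000008123000 = 0x925000
	 *
	 * The patched adrp thus still points at the same 4K page when it
	 * is executed from insnptr.
	 */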

Cc: Will Deacon <will.deacon@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/kernel/alternative.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index d2ee1b2..71c6962 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -80,6 +80,19 @@ static u32 get_alt_insn(struct alt_instr *alt, u32 *insnptr, u32 *altinsnptr)
 			offset = target - (unsigned long)insnptr;
 			insn = aarch64_set_branch_offset(insn, offset);
 		}
+	} else if (aarch64_insn_is_adrp(insn)) {
+		s32 orig_offset, new_offset;
+		unsigned long target;
+
+		/*
+		 * If we're replacing an adrp instruction, which uses PC-relative
+		 * immediate addressing, adjust the offset to reflect the new
+		 * PC. adrp operates on 4K aligned addresses.
+		 */
+		orig_offset  = aarch64_insn_adrp_get_offset(insn);
+		target = ((unsigned long)altinsnptr & ~0xfffUL) + orig_offset;
+		new_offset = target - ((unsigned long)insnptr & ~0xfffUL);
+		insn = aarch64_insn_adrp_set_offset(insn, new_offset);
 	}
 
 	return insn;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 6/8] arm64: Introduce raw_{d,i}cache_line_size
  2016-08-18 13:10 [RESEND] [PATCH 0/8] arm64: Work around for mismatched cache line size Suzuki K Poulose
                   ` (4 preceding siblings ...)
  2016-08-18 13:10 ` [PATCH 5/8] arm64: alternative: Add support for patching adrp instructions Suzuki K Poulose
@ 2016-08-18 13:10 ` Suzuki K Poulose
  2016-08-18 17:57   ` Geoff Levand
  2016-08-22 10:00   ` Will Deacon
  2016-08-18 13:10 ` [PATCH 7/8] arm64: Refactor sysinstr exception handling Suzuki K Poulose
  2016-08-18 13:10 ` [PATCH 8/8] arm64: Work around systems with mismatched cache line sizes Suzuki K Poulose
  7 siblings, 2 replies; 24+ messages in thread
From: Suzuki K Poulose @ 2016-08-18 13:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, will.deacon, catalin.marinas, mark.rutland,
	andre.przywara, Suzuki K Poulose, James Morse, Geoff Levand

On systems with mismatched I/D cache min line sizes, we need to use
the smallest size possible across all CPUs. This will be done by fetching
the system-wide safe value from the CPU feature infrastructure.
However, some special users (e.g. kexec, hibernate) need the line size
of the CPU they are running on (rather than the system-wide value), at
points where the system-wide feature data may not be accessible.
Provide another helper which fetches the cache line size of the current
CPU.
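
For reference, the decoding that both the raw_* and the system-wide
variants rely on boils down to the following (a C rendering of the
assembler macros; field layout per the ARM ARM):

	/* Cache line sizes, in bytes, from a raw or sanitised CTR_EL0 value */
	static inline unsigned int ctr_dcache_line_size(unsigned long ctr)
	{
		return 4 << ((ctr >> 16) & 0xf);	/* DminLine: CTR_EL0[19:16] */
	}

	static inline unsigned int ctr_icache_line_size(unsigned long ctr)
	{
		return 4 << (ctr & 0xf);		/* IminLine: CTR_EL0[3:0] */
	}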

Cc: James Morse <james.morse@arm.com>
Cc: Geoff Levand <geoff@infradead.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/include/asm/assembler.h  | 24 ++++++++++++++++++++----
 arch/arm64/kernel/hibernate-asm.S   |  2 +-
 arch/arm64/kernel/relocate_kernel.S |  2 +-
 3 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index d5025c6..a4bb3f5 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -218,9 +218,10 @@ lr	.req	x30		// link register
 	.endm
 
 /*
- * dcache_line_size - get the minimum D-cache line size from the CTR register.
+ * raw_dcache_line_size - get the minimum D-cache line size on this CPU
+ * from the CTR register.
  */
-	.macro	dcache_line_size, reg, tmp
+	.macro	raw_dcache_line_size, reg, tmp
 	mrs	\tmp, ctr_el0			// read CTR
 	ubfm	\tmp, \tmp, #16, #19		// cache line size encoding
 	mov	\reg, #4			// bytes per word
@@ -228,9 +229,17 @@ lr	.req	x30		// link register
 	.endm
 
 /*
- * icache_line_size - get the minimum I-cache line size from the CTR register.
+ * dcache_line_size - get the safe D-cache line size across all CPUs
  */
-	.macro	icache_line_size, reg, tmp
+	.macro	dcache_line_size, reg, tmp
+	raw_dcache_line_size	\reg, \tmp
+	.endm
+
+/*
+ * raw_icache_line_size - get the minimum I-cache line size on this CPU
+ * from the CTR register.
+ */
+	.macro	raw_icache_line_size, reg, tmp
 	mrs	\tmp, ctr_el0			// read CTR
 	and	\tmp, \tmp, #0xf		// cache line size encoding
 	mov	\reg, #4			// bytes per word
@@ -238,6 +247,13 @@ lr	.req	x30		// link register
 	.endm
 
 /*
+ * icache_line_size - get the safe I-cache line size across all CPUs
+ */
+	.macro	icache_line_size, reg, tmp
+	raw_icache_line_size	\reg, \tmp
+	.endm
+
+/*
  * tcr_set_idmap_t0sz - update TCR.T0SZ so that we can load the ID map
  */
 	.macro	tcr_set_idmap_t0sz, valreg, tmpreg
diff --git a/arch/arm64/kernel/hibernate-asm.S b/arch/arm64/kernel/hibernate-asm.S
index 46f29b6..4ebc6a1 100644
--- a/arch/arm64/kernel/hibernate-asm.S
+++ b/arch/arm64/kernel/hibernate-asm.S
@@ -96,7 +96,7 @@ ENTRY(swsusp_arch_suspend_exit)
 
 	add	x1, x10, #PAGE_SIZE
 	/* Clean the copied page to PoU - based on flush_icache_range() */
-	dcache_line_size x2, x3
+	raw_dcache_line_size x2, x3
 	sub	x3, x2, #1
 	bic	x4, x10, x3
 2:	dc	cvau, x4	/* clean D line / unified line */
diff --git a/arch/arm64/kernel/relocate_kernel.S b/arch/arm64/kernel/relocate_kernel.S
index 51b73cd..ce704a4 100644
--- a/arch/arm64/kernel/relocate_kernel.S
+++ b/arch/arm64/kernel/relocate_kernel.S
@@ -34,7 +34,7 @@ ENTRY(arm64_relocate_new_kernel)
 	/* Setup the list loop variables. */
 	mov	x17, x1				/* x17 = kimage_start */
 	mov	x16, x0				/* x16 = kimage_head */
-	dcache_line_size x15, x0		/* x15 = dcache line size */
+	raw_dcache_line_size x15, x0		/* x15 = dcache line size */
 	mov	x14, xzr			/* x14 = entry ptr */
 	mov	x13, xzr			/* x13 = copy dest */
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 7/8] arm64: Refactor sysinstr exception handling
  2016-08-18 13:10 [RESEND] [PATCH 0/8] arm64: Work around for mismatched cache line size Suzuki K Poulose
                   ` (5 preceding siblings ...)
  2016-08-18 13:10 ` [PATCH 6/8] arm64: Introduce raw_{d,i}cache_line_size Suzuki K Poulose
@ 2016-08-18 13:10 ` Suzuki K Poulose
  2016-08-22 12:53   ` Will Deacon
  2016-08-18 13:10 ` [PATCH 8/8] arm64: Work around systems with mismatched cache line sizes Suzuki K Poulose
  7 siblings, 1 reply; 24+ messages in thread
From: Suzuki K Poulose @ 2016-08-18 13:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, will.deacon, catalin.marinas, mark.rutland,
	andre.przywara, Suzuki K Poulose

Right now we trap some of the userspace data cache operations
based on a few errata (ARM 819472, 826319, 827319 and 824069).
We also need to trap userspace accesses to CTR_EL0 if we detect
mismatched cache line sizes. Since both of these traps share the same
EC, refactor the handler a little to make it more reader-friendly.
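
For context (illustration only, not part of the patch): the kind of EL0
sequence that lands in this handler when cache maintenance is trapped is
a plain user flush loop, and it arrives with the same EC (0x18, trapped
system instruction) that a trapped CTR_EL0 read will later use:

	/* Hypothetical userspace I-cache flush over [start, end) */
	static void user_flush_icache(char *start, char *end, unsigned long line)
	{
		char *p;

		for (p = start; p < end; p += line)
			asm volatile("ic ivau, %0" : : "r" (p) : "memory");
		asm volatile("dsb ish; isb" : : : "memory");
	}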

Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/include/asm/esr.h | 48 +++++++++++++++++++++++++++++
 arch/arm64/kernel/traps.c    | 73 ++++++++++++++++++++++++++++----------------
 2 files changed, 95 insertions(+), 26 deletions(-)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index f772e15..2a8f6c3 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -109,6 +109,54 @@
 	((ESR_ELx_EC_BRK64 << ESR_ELx_EC_SHIFT) | ESR_ELx_IL |	\
 	 ((imm) & 0xffff))
 
+/* ISS field definitions for System instruction traps */
+#define ESR_ELx_SYS64_ISS_RES0_SHIFT	22
+#define ESR_ELx_SYS64_ISS_RES0_MASK	(UL(0x7) << ESR_ELx_SYS64_ISS_RES0_SHIFT)
+#define ESR_ELx_SYS64_ISS_DIR_MASK	0x1
+#define ESR_ELx_SYS64_ISS_DIR_READ	0x1
+#define ESR_ELx_SYS64_ISS_DIR_WRITE	0x0
+
+#define ESR_ELx_SYS64_ISS_RT_SHIFT	5
+#define ESR_ELx_SYS64_ISS_RT_MASK	(UL(0x1f) << ESR_ELx_SYS64_ISS_RT_SHIFT)
+#define ESR_ELx_SYS64_ISS_CRm_SHIFT	1
+#define ESR_ELx_SYS64_ISS_CRm_MASK	(UL(0xf) << ESR_ELx_SYS64_ISS_CRm_SHIFT)
+#define ESR_ELx_SYS64_ISS_CRn_SHIFT	10
+#define ESR_ELx_SYS64_ISS_CRn_MASK	(UL(0xf) << ESR_ELx_SYS64_ISS_CRn_SHIFT)
+#define ESR_ELx_SYS64_ISS_Op1_SHIFT	14
+#define ESR_ELx_SYS64_ISS_Op1_MASK	(UL(0x7) << ESR_ELx_SYS64_ISS_Op1_SHIFT)
+#define ESR_ELx_SYS64_ISS_Op2_SHIFT	17
+#define ESR_ELx_SYS64_ISS_Op2_MASK	(UL(0x7) << ESR_ELx_SYS64_ISS_Op2_SHIFT)
+#define ESR_ELx_SYS64_ISS_Op0_SHIFT	20
+#define ESR_ELx_SYS64_ISS_Op0_MASK	(UL(0x3) << ESR_ELx_SYS64_ISS_Op0_SHIFT)
+#define ESR_ELx_SYS64_ISS_SYS_MASK	(ESR_ELx_SYS64_ISS_Op0_MASK | \
+					 ESR_ELx_SYS64_ISS_Op1_MASK | \
+					 ESR_ELx_SYS64_ISS_Op2_MASK | \
+					 ESR_ELx_SYS64_ISS_CRn_MASK | \
+					 ESR_ELx_SYS64_ISS_CRm_MASK)
+#define ESR_ELx_SYS64_ISS_SYS_VAL(Op0, Op1, Op2, CRn, CRm) \
+					(((Op0) << ESR_ELx_SYS64_ISS_Op0_SHIFT) | \
+					 ((Op1) << ESR_ELx_SYS64_ISS_Op1_SHIFT) | \
+					 ((Op2) << ESR_ELx_SYS64_ISS_Op2_SHIFT) | \
+					 ((CRn) << ESR_ELx_SYS64_ISS_CRn_SHIFT) | \
+					 ((CRm) << ESR_ELx_SYS64_ISS_CRm_SHIFT))
+/*
+ * User space cache operations have the following sysreg encoding
+ * in System instructions.
+ * Op0=1, Op1=3, Op2=1, CRn=7, CRm={ 5, 10, 11, 14 }, WRITE (L=0)
+ */
+#define ESR_ELx_SYS64_ISS_CRm_DC_CIVAC	14
+#define ESR_ELx_SYS64_ISS_CRm_DC_CVAU	11
+#define ESR_ELx_SYS64_ISS_CRm_DC_CVAC	10
+#define ESR_ELx_SYS64_ISS_CRm_IC_IVAU	5
+
+#define ESR_ELx_SYS64_ISS_U_CACHE_OP_MASK	(ESR_ELx_SYS64_ISS_Op0_MASK | \
+						 ESR_ELx_SYS64_ISS_Op1_MASK | \
+						 ESR_ELx_SYS64_ISS_Op2_MASK | \
+						 ESR_ELx_SYS64_ISS_CRn_MASK | \
+						 ESR_ELx_SYS64_ISS_DIR_MASK)
+#define ESR_ELx_SYS64_ISS_U_CACHE_OP_VAL \
+				(ESR_ELx_SYS64_ISS_SYS_VAL(1, 3, 1, 7, 0) | \
+				 ESR_ELx_SYS64_ISS_DIR_WRITE)
 #ifndef __ASSEMBLY__
 #include <asm/types.h>
 
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index e04f838..93c5287 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -447,36 +447,29 @@ void cpu_enable_cache_maint_trap(void *__unused)
 		: "=r" (res)					\
 		: "r" (address), "i" (-EFAULT) )
 
-asmlinkage void __exception do_sysinstr(unsigned int esr, struct pt_regs *regs)
+static void user_cache_maint_handler(unsigned int esr, struct pt_regs *regs)
 {
 	unsigned long address;
-	int ret;
+	int rt = (esr & ESR_ELx_SYS64_ISS_RT_MASK) >> ESR_ELx_SYS64_ISS_RT_SHIFT;
+	int crm = (esr & ESR_ELx_SYS64_ISS_CRm_MASK) >> ESR_ELx_SYS64_ISS_CRm_SHIFT;
+	int ret = 0;
 
-	/* if this is a write with: Op0=1, Op2=1, Op1=3, CRn=7 */
-	if ((esr & 0x01fffc01) == 0x0012dc00) {
-		int rt = (esr >> 5) & 0x1f;
-		int crm = (esr >> 1) & 0x0f;
+	address = (rt == 31) ? 0 : regs->regs[rt];
 
-		address = (rt == 31) ? 0 : regs->regs[rt];
-
-		switch (crm) {
-		case 11:		/* DC CVAU, gets promoted */
-			__user_cache_maint("dc civac", address, ret);
-			break;
-		case 10:		/* DC CVAC, gets promoted */
-			__user_cache_maint("dc civac", address, ret);
-			break;
-		case 14:		/* DC CIVAC */
-			__user_cache_maint("dc civac", address, ret);
-			break;
-		case 5:			/* IC IVAU */
-			__user_cache_maint("ic ivau", address, ret);
-			break;
-		default:
-			force_signal_inject(SIGILL, ILL_ILLOPC, regs, 0);
-			return;
-		}
-	} else {
+	switch (crm) {
+	case ESR_ELx_SYS64_ISS_CRm_DC_CVAU:	/* DC CVAU, gets promoted */
+		__user_cache_maint("dc civac", address, ret);
+		break;
+	case ESR_ELx_SYS64_ISS_CRm_DC_CVAC:	/* DC CVAC, gets promoted */
+		__user_cache_maint("dc civac", address, ret);
+		break;
+	case ESR_ELx_SYS64_ISS_CRm_DC_CIVAC:	/* DC CIVAC */
+		__user_cache_maint("dc civac", address, ret);
+		break;
+	case ESR_ELx_SYS64_ISS_CRm_IC_IVAU:	/* IC IVAU */
+		__user_cache_maint("ic ivau", address, ret);
+		break;
+	default:
 		force_signal_inject(SIGILL, ILL_ILLOPC, regs, 0);
 		return;
 	}
@@ -487,6 +480,34 @@ asmlinkage void __exception do_sysinstr(unsigned int esr, struct pt_regs *regs)
 		regs->pc += 4;
 }
 
+struct sys64_hook {
+	unsigned int esr_mask;
+	unsigned int esr_val;
+	void (*handler)(unsigned int esr, struct pt_regs *regs);
+};
+
+static struct sys64_hook sys64_hooks[] = {
+	{
+		.esr_mask = ESR_ELx_SYS64_ISS_U_CACHE_OP_MASK,
+		.esr_val = ESR_ELx_SYS64_ISS_U_CACHE_OP_VAL,
+		.handler = user_cache_maint_handler,
+	},
+	{},
+};
+
+asmlinkage void __exception do_sysinstr(unsigned int esr, struct pt_regs *regs)
+{
+	struct sys64_hook *hook;
+
+	for (hook = sys64_hooks; hook->handler; hook++)
+		if ((hook->esr_mask & esr) == hook->esr_val) {
+			hook->handler(esr, regs);
+			return;
+		}
+
+	force_signal_inject(SIGILL, ILL_ILLOPC, regs, 0);
+}
+
 long compat_arm_syscall(struct pt_regs *regs);
 
 asmlinkage long do_ni_syscall(struct pt_regs *regs)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 8/8] arm64: Work around systems with mismatched cache line sizes
  2016-08-18 13:10 [RESEND] [PATCH 0/8] arm64: Work around for mismatched cache line size Suzuki K Poulose
                   ` (6 preceding siblings ...)
  2016-08-18 13:10 ` [PATCH 7/8] arm64: Refactor sysinstr exception handling Suzuki K Poulose
@ 2016-08-18 13:10 ` Suzuki K Poulose
  2016-08-22 13:02   ` Will Deacon
  7 siblings, 1 reply; 24+ messages in thread
From: Suzuki K Poulose @ 2016-08-18 13:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, will.deacon, catalin.marinas, mark.rutland,
	andre.przywara, Suzuki K Poulose

Systems with differing CPU I-cache/D-cache line sizes can cause
problems with software cache management when execution is migrated
from one CPU to another. Usually, an application reads the cache line
size on one CPU and then uses that length to perform cache operations.
However, if it gets migrated to another CPU with a smaller cache line
size, the maintenance loop will skip lines and things can go completely
wrong. To prevent such cases, always use the smallest cache line size
among the CPUs. The kernel CPU feature infrastructure already keeps
track of the safe value for all CPUID registers, including CTR. This
patch works around the problem by:

For the kernel, dynamically patching it to read the cache size from the
system-wide copy of CTR_EL0.

For applications, trapping read accesses to CTR_EL0 (by clearing
SCTLR_EL1.UCT) and emulating the mrs instruction to return the
system-wide safe value of CTR_EL0.

For faster access (i.e. to avoid looking up the system-wide value of
CTR_EL0 via read_system_reg() every time), we keep a pointer to the
CTR_EL0 table entry in the CPU feature infrastructure.
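
To make the trap path concrete (a sketch only; the real wiring is in
the traps.c hunk below): with SCTLR_EL1.UCT clear, a userspace
"mrs x3, ctr_el0" traps with EC 0x18 and an ISS carrying Op0=3, Op1=3,
CRn=0, CRm=0, Op2=1, Rt=3, direction=read, which is exactly what
ESR_ELx_SYS64_ISS_SYS_CTR_READ matches. The handler then effectively
does:

	if ((esr & ESR_ELx_SYS64_ISS_SYS_OP_MASK) == ESR_ELx_SYS64_ISS_SYS_CTR_READ) {
		int rt = (esr & ESR_ELx_SYS64_ISS_RT_MASK) >> ESR_ELx_SYS64_ISS_RT_SHIFT;

		regs->regs[rt] = sys_ctr_ftr->sys_val;	/* system-wide safe CTR */
		regs->pc += 4;				/* skip the emulated mrs */
	}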

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/include/asm/assembler.h  | 25 +++++++++++++++++++++++--
 arch/arm64/include/asm/cpufeature.h |  4 +++-
 arch/arm64/include/asm/esr.h        |  8 ++++++++
 arch/arm64/include/asm/sysreg.h     |  1 +
 arch/arm64/kernel/asm-offsets.c     |  2 ++
 arch/arm64/kernel/cpu_errata.c      | 22 ++++++++++++++++++++++
 arch/arm64/kernel/cpufeature.c      |  9 +++++++++
 arch/arm64/kernel/traps.c           | 14 ++++++++++++++
 8 files changed, 82 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index a4bb3f5..66bd268 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -216,6 +216,21 @@ lr	.req	x30		// link register
 	.macro	mmid, rd, rn
 	ldr	\rd, [\rn, #MM_CONTEXT_ID]
 	.endm
+/*
+ * read_ctr - read CTR_EL0. If the system has mismatched
+ * cache line sizes, provide the system wide safe value.
+ */
+	.macro	read_ctr, reg
+alternative_if_not ARM64_MISMATCHED_CACHE_LINE_SIZE
+	mrs	\reg, ctr_el0			// read CTR
+	nop
+	nop
+alternative_else
+	ldr_l	\reg, sys_ctr_ftr		// Read system wide safe CTR value
+	ldr	\reg, [\reg, #ARM64_FTR_SYSVAL] // from sys_ctr_ftr->sys_val
+alternative_endif
+	.endm
+
 
 /*
  * raw_dcache_line_size - get the minimum D-cache line size on this CPU
@@ -232,7 +247,10 @@ lr	.req	x30		// link register
  * dcache_line_size - get the safe D-cache line size across all CPUs
  */
 	.macro	dcache_line_size, reg, tmp
-	raw_dcache_line_size	\reg, \tmp
+	read_ctr	\tmp
+	ubfm		\tmp, \tmp, #16, #19	// cache line size encoding
+	mov		\reg, #4		// bytes per word
+	lsl		\reg, \reg, \tmp	// actual cache line size
 	.endm
 
 /*
@@ -250,7 +268,10 @@ lr	.req	x30		// link register
  * icache_line_size - get the safe I-cache line size across all CPUs
  */
 	.macro	icache_line_size, reg, tmp
-	raw_icache_line_size	\reg, \tmp
+	read_ctr	\tmp
+	and		\tmp, \tmp, #0xf	// cache line size encoding
+	mov		\reg, #4		// bytes per word
+	lsl		\reg, \reg, \tmp	// actual cache line size
 	.endm
 
 /*
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 692b8d3..e99f2af 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -37,8 +37,9 @@
 #define ARM64_WORKAROUND_CAVIUM_27456		12
 #define ARM64_HAS_32BIT_EL0			13
 #define ARM64_HYP_OFFSET_LOW			14
+#define ARM64_MISMATCHED_CACHE_LINE_SIZE	15
 
-#define ARM64_NCAPS				15
+#define ARM64_NCAPS				16
 
 #ifndef __ASSEMBLY__
 
@@ -109,6 +110,7 @@ struct arm64_cpu_capabilities {
 };
 
 extern DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);
+extern struct arm64_ftr_reg *sys_ctr_ftr;
 
 bool this_cpu_has_cap(unsigned int cap);
 
diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 2a8f6c3..51aea89 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -139,6 +139,9 @@
 					 ((Op2) << ESR_ELx_SYS64_ISS_Op2_SHIFT) | \
 					 ((CRn) << ESR_ELx_SYS64_ISS_CRn_SHIFT) | \
 					 ((CRm) << ESR_ELx_SYS64_ISS_CRm_SHIFT))
+
+#define ESR_ELx_SYS64_ISS_SYS_OP_MASK	(ESR_ELx_SYS64_ISS_SYS_MASK | \
+					 ESR_ELx_SYS64_ISS_DIR_MASK)
 /*
  * User space cache operations have the following sysreg encoding
  * in System instructions.
@@ -157,6 +160,11 @@
 #define ESR_ELx_SYS64_ISS_U_CACHE_OP_VAL \
 				(ESR_ELx_SYS64_ISS_SYS_VAL(1, 3, 1, 7, 0) | \
 				 ESR_ELx_SYS64_ISS_DIR_WRITE)
+
+#define ESR_ELx_SYS64_ISS_SYS_CTR	ESR_ELx_SYS64_ISS_SYS_VAL(3, 3, 1, 0, 0)
+#define ESR_ELx_SYS64_ISS_SYS_CTR_READ	(ESR_ELx_SYS64_ISS_SYS_CTR | \
+					 ESR_ELx_SYS64_ISS_DIR_READ)
+
 #ifndef __ASSEMBLY__
 #include <asm/types.h>
 
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index cc06794..4f26ad5 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -100,6 +100,7 @@
 /* SCTLR_EL1 specific flags. */
 #define SCTLR_EL1_UCI		(1 << 26)
 #define SCTLR_EL1_SPAN		(1 << 23)
+#define SCTLR_EL1_UCT		(1 << 15)
 #define SCTLR_EL1_SED		(1 << 8)
 #define SCTLR_EL1_CP15BEN	(1 << 5)
 
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 05070b7..4a2f0f0 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -23,6 +23,7 @@
 #include <linux/dma-mapping.h>
 #include <linux/kvm_host.h>
 #include <linux/suspend.h>
+#include <asm/cpufeature.h>
 #include <asm/thread_info.h>
 #include <asm/memory.h>
 #include <asm/smp_plat.h>
@@ -145,5 +146,6 @@ int main(void)
   DEFINE(HIBERN_PBE_ORIG,	offsetof(struct pbe, orig_address));
   DEFINE(HIBERN_PBE_ADDR,	offsetof(struct pbe, address));
   DEFINE(HIBERN_PBE_NEXT,	offsetof(struct pbe, next));
+  DEFINE(ARM64_FTR_SYSVAL,	offsetof(struct arm64_ftr_reg, sys_val));
   return 0;
 }
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 5836b3d..621fa7f 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -30,6 +30,21 @@ is_affected_midr_range(const struct arm64_cpu_capabilities *entry, int scope)
 				       entry->midr_range_max);
 }
 
+static bool
+has_mismatched_cache_line_size(const struct arm64_cpu_capabilities *entry,
+				int scope)
+{
+	WARN_ON(scope != SCOPE_LOCAL_CPU || preemptible());
+	return (read_cpuid_cachetype() & sys_ctr_ftr->strict_mask) !=
+		 (sys_ctr_ftr->sys_val & sys_ctr_ftr->strict_mask);
+}
+
+static void cpu_enable_trap_ctr_access(void *__unused)
+{
+	/* Clear SCTLR_EL1.UCT */
+	config_sctlr_el1(SCTLR_EL1_UCT, 0);
+}
+
 #define MIDR_RANGE(model, min, max) \
 	.def_scope = SCOPE_LOCAL_CPU, \
 	.matches = is_affected_midr_range, \
@@ -108,6 +123,13 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
 	},
 #endif
 	{
+		.desc = "Mismatched cache line size",
+		.capability = ARM64_MISMATCHED_CACHE_LINE_SIZE,
+		.matches = has_mismatched_cache_line_size,
+		.def_scope = SCOPE_LOCAL_CPU,
+		.enable = cpu_enable_trap_ctr_access,
+	},
+	{
 	}
 };
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index fcf87ca..c1db313 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -46,6 +46,13 @@ unsigned int compat_elf_hwcap2 __read_mostly;
 
 DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);
 
+/*
+ * If we have mismatched cache line sizes, we need to emulate access to
+ * CTR_EL0. To avoid a lookup every time (via read_system_reg()), cache
+ * the table entry for CTR_EL0.
+ */
+struct arm64_ftr_reg *sys_ctr_ftr;
+
 #define __ARM64_FTR_BITS(SIGNED, STRICT, TYPE, SHIFT, WIDTH, SAFE_VAL) \
 	{						\
 		.sign = SIGNED,				\
@@ -461,6 +468,8 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
 		init_cpu_ftr_reg(SYS_MVFR2_EL1, info->reg_mvfr2);
 	}
 
+	/* Keep track of the CTR feature register */
+	sys_ctr_ftr = get_arm64_ftr_reg(SYS_CTR_EL0);
 }
 
 static void update_cpu_ftr_reg(struct arm64_ftr_reg *reg, u64 new)
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 93c5287..db2d6cb 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -480,6 +480,14 @@ static void user_cache_maint_handler(unsigned int esr, struct pt_regs *regs)
 		regs->pc += 4;
 }
 
+static void ctr_read_handler(unsigned int esr, struct pt_regs *regs)
+{
+	int rt = (esr & ESR_ELx_SYS64_ISS_RT_MASK) >> ESR_ELx_SYS64_ISS_RT_SHIFT;
+
+	regs->regs[rt] = sys_ctr_ftr->sys_val;
+	regs->pc += 4;
+}
+
 struct sys64_hook {
 	unsigned int esr_mask;
 	unsigned int esr_val;
@@ -492,6 +500,12 @@ static struct sys64_hook sys64_hooks[] = {
 		.esr_val = ESR_ELx_SYS64_ISS_U_CACHE_OP_VAL,
 		.handler = user_cache_maint_handler,
 	},
+	{
+		/* Trap read access to CTR_EL0 */
+		.esr_mask = ESR_ELx_SYS64_ISS_SYS_OP_MASK,
+		.esr_val = ESR_ELx_SYS64_ISS_SYS_CTR_READ,
+		.handler = ctr_read_handler,
+	},
 	{},
 };
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/8] arm64: insn: Add helpers for adrp offsets
  2016-08-18 13:10 ` [PATCH 4/8] arm64: insn: Add helpers for adrp offsets Suzuki K Poulose
@ 2016-08-18 14:47   ` Marc Zyngier
  2016-08-18 14:52     ` Suzuki K Poulose
  0 siblings, 1 reply; 24+ messages in thread
From: Marc Zyngier @ 2016-08-18 14:47 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-arm-kernel
  Cc: mark.rutland, catalin.marinas, will.deacon, linux-kernel, andre.przywara

Hi Suzuki,

On 18/08/16 14:10, Suzuki K Poulose wrote:
> Adds helpers for decoding/encoding the PC relative addresses for adrp.
> This will be used for handling dynamic patching of 'adrp' instructions
> in alternative code patching.
> 
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  arch/arm64/include/asm/insn.h |  4 ++++
>  arch/arm64/kernel/insn.c      | 13 +++++++++++++
>  2 files changed, 17 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
> index 1dbaa90..dffb0364 100644
> --- a/arch/arm64/include/asm/insn.h
> +++ b/arch/arm64/include/asm/insn.h
> @@ -247,6 +247,7 @@ static __always_inline u32 aarch64_insn_get_##abbr##_value(void) \
>  { return (val); }
>  
>  __AARCH64_INSN_FUNCS(adr_adrp,	0x1F000000, 0x10000000)
> +__AARCH64_INSN_FUNCS(adrp,	0x9F000000, 0x90000000)

I'm a bit bothered by this one. We end up with both
aarch64_insn_is_adr_adrp() *and* aarch64_insn_is_adrp() (and their
respective getters).

How about dropping adr_adrp, and explicitly having adr and adrp? There
are only two users in the tree, so that should be easy to address.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/8] arm64: insn: Add helpers for adrp offsets
  2016-08-18 14:47   ` Marc Zyngier
@ 2016-08-18 14:52     ` Suzuki K Poulose
  0 siblings, 0 replies; 24+ messages in thread
From: Suzuki K Poulose @ 2016-08-18 14:52 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel
  Cc: mark.rutland, catalin.marinas, will.deacon, linux-kernel, andre.przywara

On 18/08/16 15:47, Marc Zyngier wrote:
> Hi Suzuki,
>
> On 18/08/16 14:10, Suzuki K Poulose wrote:
>> Adds helpers for decoding/encoding the PC relative addresses for adrp.
>> This will be used for handling dynamic patching of 'adrp' instructions
>> in alternative code patching.
>>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: Will Deacon <will.deacon@arm.com>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>>  arch/arm64/include/asm/insn.h |  4 ++++
>>  arch/arm64/kernel/insn.c      | 13 +++++++++++++
>>  2 files changed, 17 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
>> index 1dbaa90..dffb0364 100644
>> --- a/arch/arm64/include/asm/insn.h
>> +++ b/arch/arm64/include/asm/insn.h
>> @@ -247,6 +247,7 @@ static __always_inline u32 aarch64_insn_get_##abbr##_value(void) \
>>  { return (val); }
>>
>>  __AARCH64_INSN_FUNCS(adr_adrp,	0x1F000000, 0x10000000)
>> +__AARCH64_INSN_FUNCS(adrp,	0x9F000000, 0x90000000)
>
> I'm a bit bothered by this one. We end-up with both
> aarch64_insn_is_adr_adrp() *and* aarch64_insn_is_adrp() (and their
> respective getters).

You're right. It doesn't look good.  

> How about dropping adr_adrp, and explicitly having adr and adrp? There
> is only two users in the tree, so that should be easy to address.

Sounds good, will update it for v2.

Cheers
Suzuki

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/8] arm64: Introduce raw_{d,i}cache_line_size
  2016-08-18 13:10 ` [PATCH 6/8] arm64: Introduce raw_{d,i}cache_line_size Suzuki K Poulose
@ 2016-08-18 17:57   ` Geoff Levand
  2016-08-22 10:00   ` Will Deacon
  1 sibling, 0 replies; 24+ messages in thread
From: Geoff Levand @ 2016-08-18 17:57 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-arm-kernel
  Cc: linux-kernel, will.deacon, catalin.marinas, mark.rutland,
	andre.przywara, James Morse

On Thu, 2016-08-18 at 14:10 +0100, Suzuki K Poulose wrote:
> On systems with mismatched i/d cache min line sizes, we need to use
> the smallest size possible across all CPUs. This will be done by fetching
> the system wide safe value from CPU feature infrastructure.
> However the some special users(e.g kexec, hibernate) would need the line
> size on the CPU (rather than the system wide), when the system wide
> feature may not be accessible. Provide another helper which will fetch
> cache line size on the current CPU.
> 
> Cc: James Morse <james.morse@arm.com>
> Cc: Geoff Levand <geoff@infradead.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  arch/arm64/include/asm/assembler.h  | 24 ++++++++++++++++++++----
>  arch/arm64/kernel/hibernate-asm.S   |  2 +-
>  arch/arm64/kernel/relocate_kernel.S |  2 +-
>  3 files changed, 22 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
> index d5025c6..a4bb3f5 100644
> --- a/arch/arm64/include/asm/assembler.h
> +++ b/arch/arm64/include/asm/assembler.h
> @@ -218,9 +218,10 @@ lr	.req	x30		// link register
>  	.endm
>  
>  /*
> - * dcache_line_size - get the minimum D-cache line size from the CTR register.
> + * raw_dcache_line_size - get the minimum D-cache line size on this CPU
> + * from the CTR register.
>   */
> -	.macro	dcache_line_size, reg, tmp
> +	.macro	raw_dcache_line_size, reg, tmp
>  	mrs	\tmp, ctr_el0			// read CTR
>  	ubfm	\tmp, \tmp, #16, #19		// cache line size encoding
>  	mov	\reg, #4			// bytes per word
> @@ -228,9 +229,17 @@ lr	.req	x30		// link register
>  	.endm

...

> +++ b/arch/arm64/kernel/relocate_kernel.S
> @@ -34,7 +34,7 @@ ENTRY(arm64_relocate_new_kernel)
>  	/* Setup the list loop variables. */
>  	mov	x17, x1				/* x17 = kimage_start */
>  	mov	x16, x0				/* x16 = kimage_head */
> -	dcache_line_size x15, x0		/* x15 = dcache line size */
> +	raw_dcache_line_size x15, x0		/* x15 = dcache line size */
>  	mov	x14, xzr			/* x14 = entry ptr */
>  	mov	x13, xzr			/* x13 = copy dest */

Since this is just renaming dcache_line_size to raw_dcache_line_size,
and for kexec's relocate_kernel we need to know about the CPU we are
running on, this part of the change looks good.

Reviewed by: Geoff Levand <geoff@infradead.org>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/8] arm64: Introduce raw_{d,i}cache_line_size
  2016-08-18 13:10 ` [PATCH 6/8] arm64: Introduce raw_{d,i}cache_line_size Suzuki K Poulose
  2016-08-18 17:57   ` Geoff Levand
@ 2016-08-22 10:00   ` Will Deacon
  2016-08-23 10:07     ` Suzuki K Poulose
  1 sibling, 1 reply; 24+ messages in thread
From: Will Deacon @ 2016-08-22 10:00 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, catalin.marinas, mark.rutland,
	andre.przywara, James Morse, Geoff Levand

On Thu, Aug 18, 2016 at 02:10:30PM +0100, Suzuki K Poulose wrote:
> On systems with mismatched i/d cache min line sizes, we need to use
> the smallest size possible across all CPUs. This will be done by fetching
> the system wide safe value from CPU feature infrastructure.
> However the some special users(e.g kexec, hibernate) would need the line
> size on the CPU (rather than the system wide), when the system wide
> feature may not be accessible. Provide another helper which will fetch
> cache line size on the current CPU.

Why are these users "special"? Using a smaller line size shouldn't affect
correctness, and I don't see kexec and hibernate as being performance
critical in their cache maintenance.

Will

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/8] arm64: alternative: Add support for patching adrp instructions
  2016-08-18 13:10 ` [PATCH 5/8] arm64: alternative: Add support for patching adrp instructions Suzuki K Poulose
@ 2016-08-22 11:19   ` Will Deacon
  2016-08-23  9:39     ` Suzuki K Poulose
  2016-08-22 11:45   ` Ard Biesheuvel
  1 sibling, 1 reply; 24+ messages in thread
From: Will Deacon @ 2016-08-22 11:19 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, catalin.marinas, mark.rutland,
	andre.przywara, Marc Zyngier

On Thu, Aug 18, 2016 at 02:10:29PM +0100, Suzuki K Poulose wrote:
> adrp uses PC-relative address offset to a page (of 4K size) of
> a symbol. If it appears in an alternative code patched in, we
> should adjust the offset to reflect the address where it will
> be run from. This patch adds support for fixing the offset
> for adrp instructions.
> 
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Andre Przywara <andre.przywara@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  arch/arm64/kernel/alternative.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
> index d2ee1b2..71c6962 100644
> --- a/arch/arm64/kernel/alternative.c
> +++ b/arch/arm64/kernel/alternative.c
> @@ -80,6 +80,19 @@ static u32 get_alt_insn(struct alt_instr *alt, u32 *insnptr, u32 *altinsnptr)
>  			offset = target - (unsigned long)insnptr;
>  			insn = aarch64_set_branch_offset(insn, offset);
>  		}
> +	} else if (aarch64_insn_is_adrp(insn)) {
> +		s32 orig_offset, new_offset;
> +		unsigned long target;
> +
> +		/*
> +		 * If we're replacing an adrp instruction, which uses PC-relative
> +		 * immediate addressing, adjust the offset to reflect the new
> +		 * PC. adrp operates on 4K aligned addresses.
> +		 */
> +		orig_offset  = aarch64_insn_adrp_get_offset(insn);
> +		target = ((unsigned long)altinsnptr & ~0xfffUL) + orig_offset;
> +		new_offset = target - ((unsigned long)insnptr & ~0xfffUL);

The masking with ~0xfffUL might be nicer if you write it as
align_down(ptr, SZ_4K);

> +		insn = aarch64_insn_adrp_set_offset(insn, new_offset);
>  	}
>  
>  	return insn;

I wonder if we shouldn't have a catch-all for any instructions performing
PC-relative operations here, because silent corruption of the instruction
stream is pretty horrible. What other instructions are there? ADR, LDR
(literal), ... ?

Will

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/8] arm64: alternative: Add support for patching adrp instructions
  2016-08-18 13:10 ` [PATCH 5/8] arm64: alternative: Add support for patching adrp instructions Suzuki K Poulose
  2016-08-22 11:19   ` Will Deacon
@ 2016-08-22 11:45   ` Ard Biesheuvel
  2016-08-23  9:16     ` Suzuki K Poulose
  1 sibling, 1 reply; 24+ messages in thread
From: Ard Biesheuvel @ 2016-08-22 11:45 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, Mark Rutland, Marc Zyngier, Catalin Marinas,
	Will Deacon, linux-kernel, Andre Przywara

On 18 August 2016 at 15:10, Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
> adrp uses PC-relative address offset to a page (of 4K size) of
> a symbol. If it appears in an alternative code patched in, we
> should adjust the offset to reflect the address where it will
> be run from. This patch adds support for fixing the offset
> for adrp instructions.
>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Andre Przywara <andre.przywara@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  arch/arm64/kernel/alternative.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
> index d2ee1b2..71c6962 100644
> --- a/arch/arm64/kernel/alternative.c
> +++ b/arch/arm64/kernel/alternative.c
> @@ -80,6 +80,19 @@ static u32 get_alt_insn(struct alt_instr *alt, u32 *insnptr, u32 *altinsnptr)
>                         offset = target - (unsigned long)insnptr;
>                         insn = aarch64_set_branch_offset(insn, offset);
>                 }
> +       } else if (aarch64_insn_is_adrp(insn)) {
> +               s32 orig_offset, new_offset;
> +               unsigned long target;
> +
> +               /*
> +                * If we're replacing an adrp instruction, which uses PC-relative
> +                * immediate addressing, adjust the offset to reflect the new
> +                * PC. adrp operates on 4K aligned addresses.
> +                */
> +               orig_offset  = aarch64_insn_adrp_get_offset(insn);
> +               target = ((unsigned long)altinsnptr & ~0xfffUL) + orig_offset;
> +               new_offset = target - ((unsigned long)insnptr & ~0xfffUL);
> +               insn = aarch64_insn_adrp_set_offset(insn, new_offset);

Are orig_offset and new_offset guaranteed to be equal modulo 4 KB?
Otherwise, you will have to track down and patch the associated :lo12:
add/ldr instruction as well.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 7/8] arm64: Refactor sysinstr exception handling
  2016-08-18 13:10 ` [PATCH 7/8] arm64: Refactor sysinstr exception handling Suzuki K Poulose
@ 2016-08-22 12:53   ` Will Deacon
  2016-08-23 10:19     ` Suzuki K Poulose
  0 siblings, 1 reply; 24+ messages in thread
From: Will Deacon @ 2016-08-22 12:53 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, catalin.marinas, mark.rutland,
	andre.przywara

On Thu, Aug 18, 2016 at 02:10:31PM +0100, Suzuki K Poulose wrote:
> Right now we trap some of the user space data cache operations
> based on a few Errata (ARM 819472, 826319, 827319 and 824069).
> We need to trap userspace access to CTR_EL0, if we detect mismatched
> cache line size. Since both these traps share the EC, refactor
> the handler a little bit to make it a bit more reader friendly.
> 
> Cc: Andre Przywara <andre.przywara@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  arch/arm64/include/asm/esr.h | 48 +++++++++++++++++++++++++++++
>  arch/arm64/kernel/traps.c    | 73 ++++++++++++++++++++++++++++----------------
>  2 files changed, 95 insertions(+), 26 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
> index f772e15..2a8f6c3 100644
> --- a/arch/arm64/include/asm/esr.h
> +++ b/arch/arm64/include/asm/esr.h
> @@ -109,6 +109,54 @@
>  	((ESR_ELx_EC_BRK64 << ESR_ELx_EC_SHIFT) | ESR_ELx_IL |	\
>  	 ((imm) & 0xffff))
>  
> +/* ISS field definitions for System instruction traps */

Can you add a similar comment for the ESR_ELx_* encodings that we already
have, please? Unfortunately, we've not namespaced things, so the
data/instruction abort encodings are described as e.g. ESR_ELx_ISV.

> +#define ESR_ELx_SYS64_ISS_RES0_SHIFT	22
> +#define ESR_ELx_SYS64_ISS_RES0_MASK	(UL(0x7) << ESR_ELx_SYS64_ISS_RES0_SHIFT)
> +#define ESR_ELx_SYS64_ISS_DIR_MASK	0x1
> +#define ESR_ELx_SYS64_ISS_DIR_READ	0x1
> +#define ESR_ELx_SYS64_ISS_DIR_WRITE	0x0
> +
> +#define ESR_ELx_SYS64_ISS_RT_SHIFT	5
> +#define ESR_ELx_SYS64_ISS_RT_MASK	(UL(0x1f) << ESR_ELx_SYS64_ISS_RT_SHIFT)
> +#define ESR_ELx_SYS64_ISS_CRm_SHIFT	1
> +#define ESR_ELx_SYS64_ISS_CRm_MASK	(UL(0xf) << ESR_ELx_SYS64_ISS_CRm_SHIFT)
> +#define ESR_ELx_SYS64_ISS_CRn_SHIFT	10
> +#define ESR_ELx_SYS64_ISS_CRn_MASK	(UL(0xf) << ESR_ELx_SYS64_ISS_CRn_SHIFT)
> +#define ESR_ELx_SYS64_ISS_Op1_SHIFT	14
> +#define ESR_ELx_SYS64_ISS_Op1_MASK	(UL(0x7) << ESR_ELx_SYS64_ISS_Op1_SHIFT)
> +#define ESR_ELx_SYS64_ISS_Op2_SHIFT	17
> +#define ESR_ELx_SYS64_ISS_Op2_MASK	(UL(0x7) << ESR_ELx_SYS64_ISS_Op2_SHIFT)
> +#define ESR_ELx_SYS64_ISS_Op0_SHIFT	20
> +#define ESR_ELx_SYS64_ISS_Op0_MASK	(UL(0x3) << ESR_ELx_SYS64_ISS_Op0_SHIFT)

Inconsistent capitalisation in your macro naming (e.g. RT vs Op1). Maybe
just stick to uppercase for #defines?

> +#define ESR_ELx_SYS64_ISS_SYS_MASK	(ESR_ELx_SYS64_ISS_Op0_MASK | \
> +					 ESR_ELx_SYS64_ISS_Op1_MASK | \
> +					 ESR_ELx_SYS64_ISS_Op2_MASK | \
> +					 ESR_ELx_SYS64_ISS_CRn_MASK | \
> +					 ESR_ELx_SYS64_ISS_CRm_MASK)
> +#define ESR_ELx_SYS64_ISS_SYS_VAL(Op0, Op1, Op2, CRn, CRm) \
> +					(((Op0) << ESR_ELx_SYS64_ISS_Op0_SHIFT) | \
> +					 ((Op1) << ESR_ELx_SYS64_ISS_Op1_SHIFT) | \
> +					 ((Op2) << ESR_ELx_SYS64_ISS_Op2_SHIFT) | \
> +					 ((CRn) << ESR_ELx_SYS64_ISS_CRn_SHIFT) | \
> +					 ((CRm) << ESR_ELx_SYS64_ISS_CRm_SHIFT))
> +/*
> + * User space cache operations have the following sysreg encoding
> + * in System instructions.
> + * Op0=1, Op1=3, Op2=1, CRn=7, CRm={ 5, 10, 11, 14 }, WRITE (L=0)
> + */
> +#define ESR_ELx_SYS64_ISS_CRm_DC_CIVAC	14
> +#define ESR_ELx_SYS64_ISS_CRm_DC_CVAU	11
> +#define ESR_ELx_SYS64_ISS_CRm_DC_CVAC	10
> +#define ESR_ELx_SYS64_ISS_CRm_IC_IVAU	5
> +
> +#define ESR_ELx_SYS64_ISS_U_CACHE_OP_MASK	(ESR_ELx_SYS64_ISS_Op0_MASK | \
> +						 ESR_ELx_SYS64_ISS_Op1_MASK | \
> +						 ESR_ELx_SYS64_ISS_Op2_MASK | \
> +						 ESR_ELx_SYS64_ISS_CRn_MASK | \
> +						 ESR_ELx_SYS64_ISS_DIR_MASK)
> +#define ESR_ELx_SYS64_ISS_U_CACHE_OP_VAL \
> +				(ESR_ELx_SYS64_ISS_SYS_VAL(1, 3, 1, 7, 0) | \
> +				 ESR_ELx_SYS64_ISS_DIR_WRITE)

What is the _U_ for? Unified? User? If it's user, EL0 may be more
appropriate.
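
(As an aside for readers following the thread: these mask/value pairs end up
being used roughly as in the sketch below. Only the ESR_ELx_SYS64_ISS_* macro
names are from the hunk above; the helper itself is purely illustrative.)

/* Sketch only: recognise a trapped EL0 cache maintenance operation. */
static bool is_el0_cache_maint_trap(unsigned int esr)
{
	/* Op0==1, Op1==3, Op2==1, CRn==7, write direction: DC/IC by VA from EL0 */
	return (esr & ESR_ELx_SYS64_ISS_U_CACHE_OP_MASK) ==
			ESR_ELx_SYS64_ISS_U_CACHE_OP_VAL;
}

The CRm field would then be compared against the DC_CIVAC/DC_CVAU/DC_CVAC/IC_IVAU
values above to pick the operation to emulate.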

Will

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 8/8] arm64: Work around systems with mismatched cache line sizes
  2016-08-18 13:10 ` [PATCH 8/8] arm64: Work around systems with mismatched cache line sizes Suzuki K Poulose
@ 2016-08-22 13:02   ` Will Deacon
  2016-08-24 13:23     ` Suzuki K Poulose
  0 siblings, 1 reply; 24+ messages in thread
From: Will Deacon @ 2016-08-22 13:02 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, linux-kernel, catalin.marinas, mark.rutland,
	andre.przywara

On Thu, Aug 18, 2016 at 02:10:32PM +0100, Suzuki K Poulose wrote:
> Systems with differing CPU i-cache/d-cache line sizes can cause
> problems with the cache management by software when the execution
> is migrated from one to another. Usually, the application reads
> the cache size on a CPU and then uses that length to perform cache
> operations. However, if it gets migrated to another CPU with a smaller
> cache line size, things could go completely wrong. To prevent such
> cases, always use the smallest cache line size among the CPUs. The
> kernel CPU feature infrastructure already keeps track of the safe
> value for all CPUID registers including CTR. This patch works around
> the problem by :
> 
> For kernel, dynamically patch the kernel to read the cache size
> from the system wide copy of CTR_EL0.

Is it only CTR that is mismatched in practice, or do we need to worry
about DCZID_EL0 too?

> For applications, trap read accesses to CTR_EL0 (by clearing the SCTLR.UCT)
> and emulate the mrs instruction to return the system wide safe value
> of CTR_EL0.
> 
> For faster access (i.e., avoiding a lookup of the system wide value of CTR_EL0
> via read_system_reg), we keep track of the pointer to the table entry for
> CTR_EL0 in the CPU feature infrastructure.
> 
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Andre Przywara <andre.przywara@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  arch/arm64/include/asm/assembler.h  | 25 +++++++++++++++++++++++--
>  arch/arm64/include/asm/cpufeature.h |  4 +++-
>  arch/arm64/include/asm/esr.h        |  8 ++++++++
>  arch/arm64/include/asm/sysreg.h     |  1 +
>  arch/arm64/kernel/asm-offsets.c     |  2 ++
>  arch/arm64/kernel/cpu_errata.c      | 22 ++++++++++++++++++++++
>  arch/arm64/kernel/cpufeature.c      |  9 +++++++++
>  arch/arm64/kernel/traps.c           | 14 ++++++++++++++
>  8 files changed, 82 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
> index a4bb3f5..66bd268 100644
> --- a/arch/arm64/include/asm/assembler.h
> +++ b/arch/arm64/include/asm/assembler.h
> @@ -216,6 +216,21 @@ lr	.req	x30		// link register
>  	.macro	mmid, rd, rn
>  	ldr	\rd, [\rn, #MM_CONTEXT_ID]
>  	.endm
> +/*
> + * read_ctr - read CTR_EL0. If the system has mismatched
> + * cache line sizes, provide the system wide safe value.
> + */
> +	.macro	read_ctr, reg
> +alternative_if_not ARM64_MISMATCHED_CACHE_LINE_SIZE
> +	mrs	\reg, ctr_el0			// read CTR
> +	nop
> +	nop
> +alternative_else
> +	ldr_l	\reg, sys_ctr_ftr		// Read system wide safe CTR value
> +	ldr	\reg, [\reg, #ARM64_FTR_SYSVAL] // from sys_ctr_ftr->sys_val
> +alternative_endif
> +	.endm
> +
>  
>  /*
>   * raw_dcache_line_size - get the minimum D-cache line size on this CPU
> @@ -232,7 +247,10 @@ lr	.req	x30		// link register
>   * dcache_line_size - get the safe D-cache line size across all CPUs
>   */
>  	.macro	dcache_line_size, reg, tmp
> -	raw_dcache_line_size	\reg, \tmp
> +	read_ctr	\tmp
> +	ubfm		\tmp, \tmp, #16, #19	// cache line size encoding
> +	mov		\reg, #4		// bytes per word
> +	lsl		\reg, \reg, \tmp	// actual cache line size
>  	.endm
>  
>  /*
> @@ -250,7 +268,10 @@ lr	.req	x30		// link register
>   * icache_line_size - get the safe I-cache line size across all CPUs
>   */
>  	.macro	icache_line_size, reg, tmp
> -	raw_icache_line_size	\reg, \tmp
> +	read_ctr	\tmp
> +	and		\tmp, \tmp, #0xf	// cache line size encoding
> +	mov		\reg, #4		// bytes per word
> +	lsl		\reg, \reg, \tmp	// actual cache line size
>  	.endm
>  
>  /*
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index 692b8d3..e99f2af 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -37,8 +37,9 @@
>  #define ARM64_WORKAROUND_CAVIUM_27456		12
>  #define ARM64_HAS_32BIT_EL0			13
>  #define ARM64_HYP_OFFSET_LOW			14
> +#define ARM64_MISMATCHED_CACHE_LINE_SIZE	15
>  
> -#define ARM64_NCAPS				15
> +#define ARM64_NCAPS				16
>  
>  #ifndef __ASSEMBLY__
>  
> @@ -109,6 +110,7 @@ struct arm64_cpu_capabilities {
>  };
>  
>  extern DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);
> +extern struct arm64_ftr_reg *sys_ctr_ftr;
>  
>  bool this_cpu_has_cap(unsigned int cap);
>  
> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
> index 2a8f6c3..51aea89 100644
> --- a/arch/arm64/include/asm/esr.h
> +++ b/arch/arm64/include/asm/esr.h
> @@ -139,6 +139,9 @@
>  					 ((Op2) << ESR_ELx_SYS64_ISS_Op2_SHIFT) | \
>  					 ((CRn) << ESR_ELx_SYS64_ISS_CRn_SHIFT) | \
>  					 ((CRm) << ESR_ELx_SYS64_ISS_CRm_SHIFT))
> +
> +#define ESR_ELx_SYS64_ISS_SYS_OP_MASK	(ESR_ELx_SYS64_ISS_SYS_MASK | \
> +					 ESR_ELx_SYS64_ISS_DIR_MASK)
>  /*
>   * User space cache operations have the following sysreg encoding
>   * in System instructions.
> @@ -157,6 +160,11 @@
>  #define ESR_ELx_SYS64_ISS_U_CACHE_OP_VAL \
>  				(ESR_ELx_SYS64_ISS_SYS_VAL(1, 3, 1, 7, 0) | \
>  				 ESR_ELx_SYS64_ISS_DIR_WRITE)
> +
> +#define ESR_ELx_SYS64_ISS_SYS_CTR	ESR_ELx_SYS64_ISS_SYS_VAL(3, 3, 1, 0, 0)
> +#define ESR_ELx_SYS64_ISS_SYS_CTR_READ	(ESR_ELx_SYS64_ISS_SYS_CTR | \
> +					 ESR_ELx_SYS64_ISS_DIR_READ)
> +
>  #ifndef __ASSEMBLY__
>  #include <asm/types.h>
>  
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index cc06794..4f26ad5 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -100,6 +100,7 @@
>  /* SCTLR_EL1 specific flags. */
>  #define SCTLR_EL1_UCI		(1 << 26)
>  #define SCTLR_EL1_SPAN		(1 << 23)
> +#define SCTLR_EL1_UCT		(1 << 15)
>  #define SCTLR_EL1_SED		(1 << 8)
>  #define SCTLR_EL1_CP15BEN	(1 << 5)
>  
> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
> index 05070b7..4a2f0f0 100644
> --- a/arch/arm64/kernel/asm-offsets.c
> +++ b/arch/arm64/kernel/asm-offsets.c
> @@ -23,6 +23,7 @@
>  #include <linux/dma-mapping.h>
>  #include <linux/kvm_host.h>
>  #include <linux/suspend.h>
> +#include <asm/cpufeature.h>
>  #include <asm/thread_info.h>
>  #include <asm/memory.h>
>  #include <asm/smp_plat.h>
> @@ -145,5 +146,6 @@ int main(void)
>    DEFINE(HIBERN_PBE_ORIG,	offsetof(struct pbe, orig_address));
>    DEFINE(HIBERN_PBE_ADDR,	offsetof(struct pbe, address));
>    DEFINE(HIBERN_PBE_NEXT,	offsetof(struct pbe, next));
> +  DEFINE(ARM64_FTR_SYSVAL,	offsetof(struct arm64_ftr_reg, sys_val));
>    return 0;
>  }
> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> index 5836b3d..621fa7f 100644
> --- a/arch/arm64/kernel/cpu_errata.c
> +++ b/arch/arm64/kernel/cpu_errata.c
> @@ -30,6 +30,21 @@ is_affected_midr_range(const struct arm64_cpu_capabilities *entry, int scope)
>  				       entry->midr_range_max);
>  }
>  
> +static bool
> +has_mismatched_cache_line_size(const struct arm64_cpu_capabilities *entry,
> +				int scope)
> +{
> +	WARN_ON(scope != SCOPE_LOCAL_CPU || preemptible());
> +	return (read_cpuid_cachetype() & sys_ctr_ftr->strict_mask) !=
> +		 (sys_ctr_ftr->sys_val & sys_ctr_ftr->strict_mask);
> +}
> +
> +static void cpu_enable_trap_ctr_access(void *__unused)
> +{
> +	/* Clear SCTLR_EL1.UCT */
> +	config_sctlr_el1(SCTLR_EL1_UCT, 0);
> +}
> +
>  #define MIDR_RANGE(model, min, max) \
>  	.def_scope = SCOPE_LOCAL_CPU, \
>  	.matches = is_affected_midr_range, \
> @@ -108,6 +123,13 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
>  	},
>  #endif
>  	{
> +		.desc = "Mismatched cache line size",
> +		.capability = ARM64_MISMATCHED_CACHE_LINE_SIZE,
> +		.matches = has_mismatched_cache_line_size,
> +		.def_scope = SCOPE_LOCAL_CPU,
> +		.enable = cpu_enable_trap_ctr_access,
> +	},
> +	{
>  	}
>  };
>  
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index fcf87ca..c1db313 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -46,6 +46,13 @@ unsigned int compat_elf_hwcap2 __read_mostly;
>  
>  DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);
>  
> +/*
> + * If we have mismatched cache line sizes, we need to emulate access to
> + * CTR_EL0. To avoid a lookup every time (via read_system_reg()), cache
> + * the table entry for CTR_EL0.
> + */
> +struct arm64_ftr_reg *sys_ctr_ftr;
> +
>  #define __ARM64_FTR_BITS(SIGNED, STRICT, TYPE, SHIFT, WIDTH, SAFE_VAL) \
>  	{						\
>  		.sign = SIGNED,				\
> @@ -461,6 +468,8 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
>  		init_cpu_ftr_reg(SYS_MVFR2_EL1, info->reg_mvfr2);
>  	}
>  
> +	/* Keep track of the CTR feature register */
> +	sys_ctr_ftr = get_arm64_ftr_reg(SYS_CTR_EL0);
>  }
>  
>  static void update_cpu_ftr_reg(struct arm64_ftr_reg *reg, u64 new)
> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> index 93c5287..db2d6cb 100644
> --- a/arch/arm64/kernel/traps.c
> +++ b/arch/arm64/kernel/traps.c
> @@ -480,6 +480,14 @@ static void user_cache_maint_handler(unsigned int esr, struct pt_regs *regs)
>  		regs->pc += 4;
>  }
>  
> +static void ctr_read_handler(unsigned int esr, struct pt_regs *regs)
> +{
> +	int rt = (esr & ESR_ELx_SYS64_ISS_RT_MASK) >> ESR_ELx_SYS64_ISS_RT_SHIFT;
> +
> +	regs->regs[rt] = sys_ctr_ftr->sys_val;
> +	regs->pc += 4;
> +}

Whilst this is correct, I wonder if there's any advantage in reporting a
*larger* size to userspace and avoid incurring additional trap overhead?

Any idea what sort of size typical JITs are using?

Will

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/8] arm64: alternative: Add support for patching adrp instructions
  2016-08-22 11:45   ` Ard Biesheuvel
@ 2016-08-23  9:16     ` Suzuki K Poulose
  2016-08-23 11:32       ` Ard Biesheuvel
  0 siblings, 1 reply; 24+ messages in thread
From: Suzuki K Poulose @ 2016-08-23  9:16 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, Mark Rutland, Marc Zyngier, Catalin Marinas,
	Will Deacon, linux-kernel, Andre Przywara

On 22/08/16 12:45, Ard Biesheuvel wrote:
> On 18 August 2016 at 15:10, Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
>> adrp uses PC-relative address offset to a page (of 4K size) of
>> a symbol. If it appears in an alternative code patched in, we
>> should adjust the offset to reflect the address where it will
>> be run from. This patch adds support for fixing the offset
>> for adrp instructions.
>>
>> Cc: Will Deacon <will.deacon@arm.com>
>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>> Cc: Andre Przywara <andre.przywara@arm.com>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>>  arch/arm64/kernel/alternative.c | 13 +++++++++++++
>>  1 file changed, 13 insertions(+)
>>
>> diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
>> index d2ee1b2..71c6962 100644
>> --- a/arch/arm64/kernel/alternative.c
>> +++ b/arch/arm64/kernel/alternative.c
>> @@ -80,6 +80,19 @@ static u32 get_alt_insn(struct alt_instr *alt, u32 *insnptr, u32 *altinsnptr)
>>                         offset = target - (unsigned long)insnptr;
>>                         insn = aarch64_set_branch_offset(insn, offset);
>>                 }
>> +       } else if (aarch64_insn_is_adrp(insn)) {
>> +               s32 orig_offset, new_offset;
>> +               unsigned long target;
>> +
>> +               /*
>> +                * If we're replacing an adrp instruction, which uses PC-relative
>> +                * immediate addressing, adjust the offset to reflect the new
>> +                * PC. adrp operates on 4K aligned addresses.
>> +                */
>> +               orig_offset  = aarch64_insn_adrp_get_offset(insn);
>> +               target = ((unsigned long)altinsnptr & ~0xfffUL) + orig_offset;
>> +               new_offset = target - ((unsigned long)insnptr & ~0xfffUL);
>> +               insn = aarch64_insn_adrp_set_offset(insn, new_offset);
>
> Are orig_offset and new_offset guaranteed to be equal modulo 4 KB?
> Otherwise, you will have to track down and patch the associated :lo12:
> add/ldr instruction as well.

We are modifying the alternative instruction to account for the new PC it
will be executed from, while the referenced symbol remains the same. Hence
the associated :lo12: doesn't change. Does that address your concern? Or
did I miss something?
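
To illustrate with made-up addresses (a self-contained sketch; nothing below
is taken from the patch):

#include <stdio.h>

/* All addresses are invented, purely for illustration. */
#define SYM		0xffff000008a41234ULL	/* referenced symbol                 */
#define ALTINSNPTR	0xffff000008123008ULL	/* adrp in the .altinstr_replacement */
#define INSNPTR		0xffff0000087f0010ULL	/* location the insn is patched into */

int main(void)
{
	long long orig_offset = (long long)((SYM & ~0xfffULL) - (ALTINSNPTR & ~0xfffULL));
	unsigned long long target = (ALTINSNPTR & ~0xfffULL) + orig_offset;
	long long new_offset = (long long)(target - (INSNPTR & ~0xfffULL));

	/* The page-relative part must change with the new PC... */
	printf("orig_offset = %#llx, new_offset = %#llx\n", orig_offset, new_offset);
	/*
	 * ...but the :lo12: part depends only on the symbol, which has not
	 * moved, so the paired add/ldr needs no fixup.
	 */
	printf(":lo12: stays %#llx either way\n", SYM & 0xfffULL);
	return 0;
}

Both offsets make adrp point at the page containing the symbol; only the base
page (the PC) differs.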

Suzuki

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/8] arm64: alternative: Add support for patching adrp instructions
  2016-08-22 11:19   ` Will Deacon
@ 2016-08-23  9:39     ` Suzuki K Poulose
  0 siblings, 0 replies; 24+ messages in thread
From: Suzuki K Poulose @ 2016-08-23  9:39 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-kernel, catalin.marinas, mark.rutland,
	andre.przywara, Marc Zyngier

On 22/08/16 12:19, Will Deacon wrote:
> On Thu, Aug 18, 2016 at 02:10:29PM +0100, Suzuki K Poulose wrote:
>> adrp uses PC-relative address offset to a page (of 4K size) of
>> a symbol. If it appears in an alternative code patched in, we
>> should adjust the offset to reflect the address where it will
>> be run from. This patch adds support for fixing the offset
>> for adrp instructions.
>>
>> Cc: Will Deacon <will.deacon@arm.com>
>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>> Cc: Andre Przywara <andre.przywara@arm.com>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>>  arch/arm64/kernel/alternative.c | 13 +++++++++++++
>>  1 file changed, 13 insertions(+)
>>
>> diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
>> index d2ee1b2..71c6962 100644
>> --- a/arch/arm64/kernel/alternative.c
>> +++ b/arch/arm64/kernel/alternative.c
>> @@ -80,6 +80,19 @@ static u32 get_alt_insn(struct alt_instr *alt, u32 *insnptr, u32 *altinsnptr)
>>  			offset = target - (unsigned long)insnptr;
>>  			insn = aarch64_set_branch_offset(insn, offset);
>>  		}
>> +	} else if (aarch64_insn_is_adrp(insn)) {
>> +		s32 orig_offset, new_offset;
>> +		unsigned long target;
>> +
>> +		/*
>> +		 * If we're replacing an adrp instruction, which uses PC-relative
>> +		 * immediate addressing, adjust the offset to reflect the new
>> +		 * PC. adrp operates on 4K aligned addresses.
>> +		 */
>> +		orig_offset  = aarch64_insn_adrp_get_offset(insn);
>> +		target = ((unsigned long)altinsnptr & ~0xfffUL) + orig_offset;
>> +		new_offset = target - ((unsigned long)insnptr & ~0xfffUL);
>
> The masking with ~0xfffUL might be nicer if you write it as
> align_down(ptr, SZ_4K);

Right, that definitely looks better. Will change it.

>
>> +		insn = aarch64_insn_adrp_set_offset(insn, new_offset);
>>  	}
>>
>>  	return insn;
>
> I wonder if we shouldn't have a catch-all for any instructions performing
> PC-relative operations here, because silent corruption of the instruction

Correct, which is what happened initially when I didn't have the adrp handling ;-).

> stream is pretty horrible. What other instructions are there? ADR, LDR
> (literal), ... ?

From a quick look, all the instructions under "Load register (literal)", i.e.:
LDR (literal) for GPR/FP_SIMD/32bit/64bit
LDRSW (literal)
PRFM (literal)

and Data processing instructions - immediate group with PC-relative addressing:

ADR, ADRP

I will add a check to catch the unsupported instructions in the alternative code.
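
Something along these lines, perhaps (a sketch only, not the final patch;
whether aarch64_insn_uses_literal() covers exactly the instructions listed
above is an assumption here, and aarch64_insn_is_adrp() comes from patch 4
of this series):

#include <linux/types.h>
#include <asm/insn.h>

/*
 * Reject alternative sequences containing PC-relative instructions that
 * get_alt_insn() does not know how to fix up.
 */
static bool alt_insn_is_patchable(u32 insn)
{
	if (aarch64_insn_is_branch_imm(insn) || aarch64_insn_is_adrp(insn))
		return true;	/* offset gets rewritten by get_alt_insn() */
	if (aarch64_insn_uses_literal(insn))
		return false;	/* ADR, LDR/LDRSW/PRFM (literal): unsupported */
	return true;		/* not PC-relative, safe to copy verbatim */
}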

Thanks
Suzuki

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/8] arm64: Introduce raw_{d,i}cache_line_size
  2016-08-22 10:00   ` Will Deacon
@ 2016-08-23 10:07     ` Suzuki K Poulose
  0 siblings, 0 replies; 24+ messages in thread
From: Suzuki K Poulose @ 2016-08-23 10:07 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-kernel, catalin.marinas, mark.rutland,
	andre.przywara, James Morse, Geoff Levand

On 22/08/16 11:00, Will Deacon wrote:
> On Thu, Aug 18, 2016 at 02:10:30PM +0100, Suzuki K Poulose wrote:
>> On systems with mismatched i/d cache min line sizes, we need to use
>> the smallest size possible across all CPUs. This will be done by fetching
>> the system wide safe value from CPU feature infrastructure.
>> However, some special users (e.g. kexec, hibernate) need the line
>> size of the current CPU (rather than the system wide value), when the
>> system wide value may not be accessible. Provide another helper which
>> will fetch the cache line size on the current CPU.
>
> Why are these users "special"? Using a smaller line size shouldn't affect

With the alternative code patched in, we read the CTR value from a kernel
data structure. For kexec at least, the new image may overwrite the kernel
image/data where that structure lives, so we could end up reading corrupted
data.

For the special cases where the code is guaranteed to run on a single CPU
and will not be migrated to another, it can rely on the raw CTR value of
that CPU, hence the change.

> correctness, and I don't see kexec and hibernate as being performance
> critical in their cache maintenance.

It's not for performance, but for safety.

Suzuki

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 7/8] arm64: Refactor sysinstr exception handling
  2016-08-22 12:53   ` Will Deacon
@ 2016-08-23 10:19     ` Suzuki K Poulose
  0 siblings, 0 replies; 24+ messages in thread
From: Suzuki K Poulose @ 2016-08-23 10:19 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-kernel, catalin.marinas, mark.rutland,
	andre.przywara

On 22/08/16 13:53, Will Deacon wrote:
> On Thu, Aug 18, 2016 at 02:10:31PM +0100, Suzuki K Poulose wrote:
>> Right now we trap some of the user space data cache operations
>> based on a few Errata (ARM 819472, 826319, 827319 and 824069).
>> We need to trap userspace access to CTR_EL0, if we detect mismatched
>> cache line size. Since both these traps share the EC, refactor
>> the handler a little bit to make it a bit more reader friendly.
>>
>> Cc: Andre Przywara <andre.przywara@arm.com>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: Will Deacon <will.deacon@arm.com>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>>  arch/arm64/include/asm/esr.h | 48 +++++++++++++++++++++++++++++
>>  arch/arm64/kernel/traps.c    | 73 ++++++++++++++++++++++++++++----------------
>>  2 files changed, 95 insertions(+), 26 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
>> index f772e15..2a8f6c3 100644
>> --- a/arch/arm64/include/asm/esr.h
>> +++ b/arch/arm64/include/asm/esr.h
>> @@ -109,6 +109,54 @@
>>  	((ESR_ELx_EC_BRK64 << ESR_ELx_EC_SHIFT) | ESR_ELx_IL |	\
>>  	 ((imm) & 0xffff))
>>
>> +/* ISS field definitions for System instruction traps */
>
> Can you add a similar comment for the ESR_ELx_* encodings that we already
> have, please? Unfortunately, we've not namespaced things, so the
> data/instruction abort encodings are described as e.g. ESR_ELx_ISV.

Will do.


>> +#define ESR_ELx_SYS64_ISS_Op0_SHIFT	20
>> +#define ESR_ELx_SYS64_ISS_Op0_MASK	(UL(0x3) << ESR_ELx_SYS64_ISS_Op0_SHIFT)
>
> Inconsistent capitalisation in your macro naming (e.g. RT vs Op1). Maybe
> just stick to uppercase for #defines?

Sure

>> +
>> +#define ESR_ELx_SYS64_ISS_U_CACHE_OP_MASK	(ESR_ELx_SYS64_ISS_Op0_MASK | \
>> +						 ESR_ELx_SYS64_ISS_Op1_MASK | \
>> +						 ESR_ELx_SYS64_ISS_Op2_MASK | \
>> +						 ESR_ELx_SYS64_ISS_CRn_MASK | \
>> +						 ESR_ELx_SYS64_ISS_DIR_MASK)
>> +#define ESR_ELx_SYS64_ISS_U_CACHE_OP_VAL \
>> +				(ESR_ELx_SYS64_ISS_SYS_VAL(1, 3, 1, 7, 0) | \
>> +				 ESR_ELx_SYS64_ISS_DIR_WRITE)
>
> What is the _U_ for? Unified? User? If it's user, EL0 may be more
> appropriate.

Ok.
  


Thanks
Suzuki

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/8] arm64: alternative: Add support for patching adrp instructions
  2016-08-23  9:16     ` Suzuki K Poulose
@ 2016-08-23 11:32       ` Ard Biesheuvel
  0 siblings, 0 replies; 24+ messages in thread
From: Ard Biesheuvel @ 2016-08-23 11:32 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, Mark Rutland, Marc Zyngier, Catalin Marinas,
	Will Deacon, linux-kernel, Andre Przywara

On 23 August 2016 at 11:16, Suzuki K Poulose <Suzuki.Poulose@arm.com> wrote:
> On 22/08/16 12:45, Ard Biesheuvel wrote:
>>
>> On 18 August 2016 at 15:10, Suzuki K Poulose <suzuki.poulose@arm.com>
>> wrote:
>>>
>>> adrp uses PC-relative address offset to a page (of 4K size) of
>>> a symbol. If it appears in an alternative code patched in, we
>>> should adjust the offset to reflect the address where it will
>>> be run from. This patch adds support for fixing the offset
>>> for adrp instructions.
>>>
>>> Cc: Will Deacon <will.deacon@arm.com>
>>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>>> Cc: Andre Przywara <andre.przywara@arm.com>
>>> Cc: Mark Rutland <mark.rutland@arm.com>
>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>> ---
>>>  arch/arm64/kernel/alternative.c | 13 +++++++++++++
>>>  1 file changed, 13 insertions(+)
>>>
>>> diff --git a/arch/arm64/kernel/alternative.c
>>> b/arch/arm64/kernel/alternative.c
>>> index d2ee1b2..71c6962 100644
>>> --- a/arch/arm64/kernel/alternative.c
>>> +++ b/arch/arm64/kernel/alternative.c
>>> @@ -80,6 +80,19 @@ static u32 get_alt_insn(struct alt_instr *alt, u32
>>> *insnptr, u32 *altinsnptr)
>>>                         offset = target - (unsigned long)insnptr;
>>>                         insn = aarch64_set_branch_offset(insn, offset);
>>>                 }
>>> +       } else if (aarch64_insn_is_adrp(insn)) {
>>> +               s32 orig_offset, new_offset;
>>> +               unsigned long target;
>>> +
>>> +               /*
>>> +                * If we're replacing an adrp instruction, which uses
>>> PC-relative
>>> +                * immediate addressing, adjust the offset to reflect the
>>> new
>>> +                * PC. adrp operates on 4K aligned addresses.
>>> +                */
>>> +               orig_offset  = aarch64_insn_adrp_get_offset(insn);
>>> +               target = ((unsigned long)altinsnptr & ~0xfffUL) +
>>> orig_offset;
>>> +               new_offset = target - ((unsigned long)insnptr &
>>> ~0xfffUL);
>>> +               insn = aarch64_insn_adrp_set_offset(insn, new_offset);
>>
>>
>> Are orig_offset and new_offset guaranteed to be equal modulo 4 KB?
>> Otherwise, you will have to track down and patch the associated :lo12:
>> add/ldr instruction as well.
>
>
> We are modifying the alternative instruction to accommodate for the new PC,
> where this instruction will be executed from, while the referenced symbol
> remains the same. Hence the associated :lo12: doesn't change. Does that
> address your concern ? Or did I miss something ?
>

Ah, of course. Yes, that should work fine, given that the symbol stays
in place, and so its offset into the containing 4 KB page does not
change either.

Thanks,
Ard.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 8/8] arm64: Work around systems with mismatched cache line sizes
  2016-08-22 13:02   ` Will Deacon
@ 2016-08-24 13:23     ` Suzuki K Poulose
  0 siblings, 0 replies; 24+ messages in thread
From: Suzuki K Poulose @ 2016-08-24 13:23 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-kernel, catalin.marinas, mark.rutland,
	andre.przywara, rodolph.perfetta, stuart.monteith

On 22/08/16 14:02, Will Deacon wrote:
> On Thu, Aug 18, 2016 at 02:10:32PM +0100, Suzuki K Poulose wrote:
>> Systems with differing CPU i-cache/d-cache line sizes can cause
>> problems with the cache management by software when the execution
>> is migrated from one to another. Usually, the application reads
>> the cache size on a CPU and then uses that length to perform cache
>> operations. However, if it gets migrated to another CPU with a smaller
>> cache line size, things could go completely wrong. To prevent such
>> cases, always use the smallest cache line size among the CPUs. The
>> kernel CPU feature infrastructure already keeps track of the safe
>> value for all CPUID registers including CTR. This patch works around
>> the problem by :
>>
>> For kernel, dynamically patch the kernel to read the cache size
>> from the system wide copy of CTR_EL0.
>
> Is it only CTR that is mismatched in practice, or do we need to worry
> about DCZID_EL0 too?

A mismatched DCZID_EL0 is quite possible. However, there is no way to
trap accesses to DCZID_EL0. We can trap DC ZVA by clearing SCTLR_EL1.DZE,
but then reading DCZID_EL0.DZP returns 1, which indicates that DC ZVA is
not supported. So if a well-behaved application checks DZP before issuing
a DC ZVA, we never get the chance to emulate it. In other words, if there
is a mismatch, the workaround amounts to disabling DC ZVA operations
(which could affect existing, incorrect userspace applications that assume
DC ZVA is supported without checking the DZP bit).
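
i.e. the well-behaved userspace sequence we cannot usefully trap looks
roughly like this (illustrative only, alignment handling omitted):

#include <stdint.h>
#include <stddef.h>
#include <string.h>

static void zero_block_aligned(void *buf, size_t len)
{
	uint64_t dczid;

	asm volatile("mrs %0, dczid_el0" : "=r" (dczid));
	if (dczid & (1UL << 4)) {		/* DZP == 1: DC ZVA prohibited */
		memset(buf, 0, len);		/* so the application falls back... */
		return;				/* ...and we never see a trap */
	}

	size_t blk = 4UL << (dczid & 0xf);	/* BS: log2 of block size in words */
	for (char *p = buf; p < (char *)buf + len; p += blk)
		asm volatile("dc zva, %0" : : "r" (p) : "memory");
}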
  
>>  static void update_cpu_ftr_reg(struct arm64_ftr_reg *reg, u64 new)
>> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
>> index 93c5287..db2d6cb 100644
>> --- a/arch/arm64/kernel/traps.c
>> +++ b/arch/arm64/kernel/traps.c
>> @@ -480,6 +480,14 @@ static void user_cache_maint_handler(unsigned int esr, struct pt_regs *regs)
>>  		regs->pc += 4;
>>  }
>>
>> +static void ctr_read_handler(unsigned int esr, struct pt_regs *regs)
>> +{
>> +	int rt = (esr & ESR_ELx_SYS64_ISS_RT_MASK) >> ESR_ELx_SYS64_ISS_RT_SHIFT;
>> +
>> +	regs->regs[rt] = sys_ctr_ftr->sys_val;
>> +	regs->pc += 4;
>> +}
>
> Whilst this is correct, I wonder if there's any advantage in reporting a
> *larger* size to userspace and avoid incurring additional trap overhead?

Combined with the trapping of userspace DC operations we already do for
the clean-cache errata workarounds, we could report a larger size and
emulate the operations properly in the kernel. But I think that can be an
enhancement on top of this series.

>
> Any idea what sort of size typical JITs are using?

I have no clue about it. I have Cc-ed Rodolph and Stuart, who may have a
better idea of what JITs typically use.
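
For reference, the idiom in question typically looks something like this
(a sketch, not taken from any particular JIT):

#include <stdint.h>

/* Clean D-cache and invalidate I-cache for freshly written code. */
static void sync_icache(uintptr_t start, uintptr_t end)
{
	uint64_t ctr;

	/* This is the read that now traps and gets emulated. */
	asm volatile("mrs %0, ctr_el0" : "=r" (ctr));

	uintptr_t dline = 4UL << ((ctr >> 16) & 0xf);	/* DminLine, in bytes */
	uintptr_t iline = 4UL << (ctr & 0xf);		/* IminLine, in bytes */

	for (uintptr_t p = start & ~(dline - 1); p < end; p += dline)
		asm volatile("dc cvau, %0" : : "r" (p) : "memory");
	asm volatile("dsb ish" ::: "memory");

	for (uintptr_t p = start & ~(iline - 1); p < end; p += iline)
		asm volatile("ic ivau, %0" : : "r" (p) : "memory");
	asm volatile("dsb ish" ::: "memory");
	asm volatile("isb" ::: "memory");
}

If the CTR value it sees comes from a CPU with larger lines than the one the
loops end up running on, some lines are skipped, which is exactly the kind of
corruption this series prevents.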

Suzuki

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 5/8] arm64: alternative: Add support for patching adrp instructions
  2016-07-08 11:37 [PATCH 0/8] arm64: Work around for mismatched cache line size Suzuki K Poulose
@ 2016-07-08 11:37 ` Suzuki K Poulose
  0 siblings, 0 replies; 24+ messages in thread
From: Suzuki K Poulose @ 2016-07-08 11:37 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, catalin.marinas, mark.rutland, will.deacon,
	Suzuki K Poulose, Marc Zyngier, Andre Przywara

adrp uses PC-relative address offset to a page (of 4K size) of
a symbol. If it appears in an alternative code patched in, we
should adjust the offset to reflect the address where it will
be run from. This patch adds support for fixing the offset
for adrp instructions.

Cc: Will Deacon <will.deacon@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/kernel/alternative.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index d2ee1b2..cd36950 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -80,6 +80,19 @@ static u32 get_alt_insn(struct alt_instr *alt, u32 *insnptr, u32 *altinsnptr)
 			offset = target - (unsigned long)insnptr;
 			insn = aarch64_set_branch_offset(insn, offset);
 		}
+	} else if (aarch64_insn_is_adrp(insn)) {
+		s32 orig_offset, new_offset;
+		unsigned long target;
+
+		/*
+		 * If we're replacing an adrp instruction, which uses PC-relative
+		 * immediate addressing, adjust the offset to reflect the new
+		 * PC. adrp operates on 4K aligned addresses.
+		 */
+		orig_offset  = aarch64_get_addr_offset(insn);
+		target = ((unsigned long)altinsnptr & ~0xfffUL) + orig_offset;
+		new_offset = target - ((unsigned long)insnptr & ~0xfffUL);
+		insn = aarch64_set_addr_offset(insn, new_offset);
 	}
 
 	return insn;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2016-08-24 13:23 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-18 13:10 [RESEND] [PATCH 0/8] arm64: Work around for mismatched cache line size Suzuki K Poulose
2016-08-18 13:10 ` [PATCH 1/8] arm64: Set the safe value for L1 icache policy Suzuki K Poulose
2016-08-18 13:10 ` [PATCH 2/8] arm64: Use consistent naming for errata handling Suzuki K Poulose
2016-08-18 13:10 ` [PATCH 3/8] arm64: Rearrange CPU errata workaround checks Suzuki K Poulose
2016-08-18 13:10 ` [PATCH 4/8] arm64: insn: Add helpers for adrp offsets Suzuki K Poulose
2016-08-18 14:47   ` Marc Zyngier
2016-08-18 14:52     ` Suzuki K Poulose
2016-08-18 13:10 ` [PATCH 5/8] arm64: alternative: Add support for patching adrp instructions Suzuki K Poulose
2016-08-22 11:19   ` Will Deacon
2016-08-23  9:39     ` Suzuki K Poulose
2016-08-22 11:45   ` Ard Biesheuvel
2016-08-23  9:16     ` Suzuki K Poulose
2016-08-23 11:32       ` Ard Biesheuvel
2016-08-18 13:10 ` [PATCH 6/8] arm64: Introduce raw_{d,i}cache_line_size Suzuki K Poulose
2016-08-18 17:57   ` Geoff Levand
2016-08-22 10:00   ` Will Deacon
2016-08-23 10:07     ` Suzuki K Poulose
2016-08-18 13:10 ` [PATCH 7/8] arm64: Refactor sysinstr exception handling Suzuki K Poulose
2016-08-22 12:53   ` Will Deacon
2016-08-23 10:19     ` Suzuki K Poulose
2016-08-18 13:10 ` [PATCH 8/8] arm64: Work around systems with mismatched cache line sizes Suzuki K Poulose
2016-08-22 13:02   ` Will Deacon
2016-08-24 13:23     ` Suzuki K Poulose
  -- strict thread matches above, loose matches on Subject: below --
2016-07-08 11:37 [PATCH 0/8] arm64: Work around for mismatched cache line size Suzuki K Poulose
2016-07-08 11:37 ` [PATCH 5/8] arm64: alternative: Add support for patching adrp instructions Suzuki K Poulose
