[PATCH 0/4] arm64: Support the TSO memory model

From: Hector Martin @ 2024-04-11  0:51 UTC
  To: Catalin Marinas, Will Deacon, Marc Zyngier, Mark Rutland
  Cc: Zayd Qumsieh, Justin Lu, Ryan Houdek, Mark Brown, Ard Biesheuvel,
	Mateusz Guzik, Anshuman Khandual, Oliver Upton, Miguel Luis,
	Joey Gouly, Christoph Paasch, Kees Cook, Sami Tolvanen,
	Baoquan He, Joel Granados, Dawei Li, Andrew Morton,
	Florent Revest, David Hildenbrand, Stefan Roesch, Andy Chiu,
	Josh Triplett, Oleg Nesterov, Helge Deller, Zev Weiss,
	Ondrej Mosnacek, Miguel Ojeda, linux-arm-kernel, linux-kernel,
	Asahi Linux, Hector Martin

x86 CPUs implement a stricter memory model (TSO) than ARM64. For this
reason, x86 emulation on baseline ARM64 systems requires very expensive
memory model emulation. Having hardware that supports this natively is
therefore very attractive. Such hardware, in fact, exists. This series
adds support for userspace to identify when TSO is available and
toggle it on, if supported.

Some ARM64 CPUs intrinsically implement the TSO memory model, while
others expose it as an IMPDEF control. Apple Silicon SoCs are in the
latter category. Using TSO for x86 emulation on chips that support it
has been shown to provide a massive performance boost [1].

Patch 1 introduces the PR_{SET,GET}_MEM_MODEL userspace control, which
is initially not implemented for any architectures.
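
A minimal sketch of how userspace might use this control, not taken from
the patches themselves. The numeric values of the PR_* constants and the
EINVAL failure mode are assumptions here (the authoritative definitions
live in patch 1's include/uapi/linux/prctl.h), and try_enable_tso() is a
hypothetical helper name:

```c
#include <errno.h>
#include <sys/prctl.h>

/* ASSUMPTION: these values are not stated in this cover letter;
 * verify them against patch 1 before relying on them. */
#ifndef PR_GET_MEM_MODEL
#define PR_GET_MEM_MODEL		0x6d4d444c
#endif
#ifndef PR_SET_MEM_MODEL
#define PR_SET_MEM_MODEL		0x4d4d444c
#endif
#ifndef PR_SET_MEM_MODEL_DEFAULT
#define PR_SET_MEM_MODEL_DEFAULT	0
#endif
#ifndef PR_SET_MEM_MODEL_TSO
#define PR_SET_MEM_MODEL_TSO		1
#endif

/*
 * Hypothetical helper: ask the kernel to switch the caller to TSO.
 * Returns 1 if TSO was enabled, 0 if the kernel or CPU does not
 * support it, and -1 on any other error.
 */
static int try_enable_tso(void)
{
	if (prctl(PR_SET_MEM_MODEL, (unsigned long)PR_SET_MEM_MODEL_TSO,
		  0UL, 0UL, 0UL) == 0)
		return 1;
	/* ASSUMPTION: an unknown prctl or unsupported model gives EINVAL. */
	return errno == EINVAL ? 0 : -1;
}
```

An emulator could call such a helper once at startup and fall back to
software memory-model emulation whenever it does not return 1.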

Patch 2 implements it for CPUs which, to the best of my knowledge,
always implement the TSO memory model. This uses the cpufeature
mechanism to enable the control only if *all* cores in the system meet
the requirements.
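
The "all cores" requirement matters because a thread may be scheduled on
any core. As a rough illustration only (this is not the kernel's
cpufeature code, and system_has_feature() is a hypothetical name), the
rule amounts to:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Illustration only: a system-wide capability is advertised only if
 * every core reports it. An empty core list advertises nothing.
 */
static bool system_has_feature(const bool *core_has, size_t ncores)
{
	if (ncores == 0)
		return false;
	for (size_t i = 0; i < ncores; i++)
		if (!core_has[i])
			return false;
	return true;
}
```

A single core lacking the feature is enough to keep the control disabled
for the whole system.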

Patch 3 adds the scaffolding necessary to save/restore the ACTLR_EL1
register across context switches. This register contains IMPDEF flags
related to CPU execution, and on Apple CPUs this is where the runtime
TSO toggle bit is implemented. Other CPUs could conceivably benefit from
this scaffolding if they also use ACTLR_EL1 for things that could
ostensibly be runtime controlled and context-switched. For this to work,
ACTLR_EL1 must have a uniform layout across all cores in the system.
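
The save/restore idea can be sketched in plain C with a simulated
register. The struct, the variable and the function below are
hypothetical stand-ins for illustration, not the kernel's thread state
or the code in patch 3:

```c
#include <stdint.h>

/* Hypothetical per-thread state for this sketch. */
struct task {
	uint64_t actlr;		/* this thread's copy of ACTLR_EL1 */
};

/* Simulated ACTLR_EL1 of the core running the sketch. */
static uint64_t hw_actlr;

/* Save the outgoing thread's value, then load the incoming thread's. */
static void switch_actlr(struct task *prev, struct task *next)
{
	prev->actlr = hw_actlr;
	if (next->actlr != hw_actlr)	/* skip redundant register writes */
		hw_actlr = next->actlr;
}
```

Because the saved value may be restored on a different core, the sketch
only makes sense when every core interprets the bits the same way, which
is the uniform layout requirement above.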

Finally, patch 4 implements PR_{SET,GET}_MEM_MODEL for Apple CPUs by
hooking it up to flip the appropriate ACTLR_EL1 bit when the Apple TSO
feature is detected (on all CPUs, which also implies the uniform
ACTLR_EL1 layout).
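
Flipping the bit boils down to a read-modify-write of the thread's
ACTLR_EL1 value. In the sketch below the bit position, the enum and the
function name are assumptions made for illustration; the real bit
definition is in patch 4:

```c
#include <stdint.h>

/* ASSUMPTION: bit position chosen for illustration; see patch 4. */
#define SKETCH_ACTLR_TSO	(UINT64_C(1) << 1)

/* Hypothetical model identifiers for this sketch. */
enum mem_model { MEM_MODEL_DEFAULT = 0, MEM_MODEL_TSO = 1 };

/* Return actlr with the TSO bit set or cleared, other bits untouched. */
static uint64_t actlr_apply_model(uint64_t actlr, enum mem_model model)
{
	if (model == MEM_MODEL_TSO)
		return actlr | SKETCH_ACTLR_TSO;
	return actlr & ~SKETCH_ACTLR_TSO;
}
```

Leaving the other bits untouched matters because ACTLR_EL1 holds
unrelated IMPDEF flags as well.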

This series has been brewing in the downstream Asahi Linux tree [2] for a
while now, and ships to thousands of users. A subset have been using it
with FEX-Emu, which already supports this feature. This rebase on
v6.9-rc1 is only build-tested (all intermediate commits with and without
the config enabled, on ARM64) but I'll update the downstream branch soon
with this version and get it pushed out to users/testers.

The Apple support works on bare metal and *should* work exactly the same
way on macOS VMs (as alluded to by Zayd in his independent submission [3]),
though I haven't personally verified this. KVM support for this is left
for a future patchset.

(Apologies for the large Cc: list; I want to make sure nobody who got
Cced on Zayd's alternate take is left out of this one.) 

[1] https://fex-emu.com/FEX-2306/
[2] https://github.com/AsahiLinux/linux/tree/bits/220-tso
[3] https://lore.kernel.org/lkml/20240410211652.16640-1-zayd_qumsieh@apple.com/

To: Catalin Marinas <catalin.marinas@arm.com>
To: Will Deacon <will@kernel.org>
To: Marc Zyngier <maz@kernel.org>
To: Mark Rutland <mark.rutland@arm.com>
Cc: Zayd Qumsieh <zayd_qumsieh@apple.com>
Cc: Justin Lu <ih_justin@apple.com>
Cc: Ryan Houdek <Houdek.Ryan@fex-emu.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Miguel Luis <miguel.luis@oracle.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Christoph Paasch <cpaasch@apple.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Joel Granados <j.granados@samsung.com>
Cc: Dawei Li <dawei.li@shingroup.cn>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florent Revest <revest@chromium.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Stefan Roesch <shr@devkernel.io>
Cc: Andy Chiu <andy.chiu@sifive.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Zev Weiss <zev@bewilderbeest.net>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: Asahi Linux <asahi@lists.linux.dev>

Signed-off-by: Hector Martin <marcan@marcan.st>
---
Hector Martin (4):
      prctl: Introduce PR_{SET,GET}_MEM_MODEL
      arm64: Implement PR_{GET,SET}_MEM_MODEL for always-TSO CPUs
      arm64: Introduce scaffolding to add ACTLR_EL1 to thread state
      arm64: Implement Apple IMPDEF TSO memory model control

 arch/arm64/Kconfig                        | 14 ++++++
 arch/arm64/include/asm/apple_cpufeature.h | 15 +++++++
 arch/arm64/include/asm/cpufeature.h       | 10 +++++
 arch/arm64/include/asm/processor.h        |  3 ++
 arch/arm64/kernel/Makefile                |  3 +-
 arch/arm64/kernel/cpufeature.c            | 11 ++---
 arch/arm64/kernel/cpufeature_impdef.c     | 61 ++++++++++++++++++++++++++
 arch/arm64/kernel/process.c               | 71 +++++++++++++++++++++++++++++++
 arch/arm64/kernel/setup.c                 |  8 ++++
 arch/arm64/tools/cpucaps                  |  2 +
 include/linux/memory_ordering_model.h     | 11 +++++
 include/uapi/linux/prctl.h                |  5 +++
 kernel/sys.c                              | 21 +++++++++
 13 files changed, 229 insertions(+), 6 deletions(-)
---
base-commit: 4cece764965020c22cff7665b18a012006359095
change-id: 20240411-tso-e86fdceb94b8

Best regards,
-- 
Hector Martin <marcan@marcan.st>
