From: Amanieu d'Antras <amanieu@gmail.com>
To: unlisted-recipients:; (no To-header on input)
Cc: Amanieu d'Antras <amanieu@gmail.com>,
	Ryan Houdek <Houdek.Ryan@fex-emu.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Steven Price <steven.price@arm.com>,
	Arnd Bergmann <arnd@kernel.org>,
	David Laight <David.Laight@aculab.com>,
	Mark Brown <broonie@kernel.org>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: [RESEND PATCH v4 0/8] arm64: Allow 64-bit tasks to invoke compat syscalls
Date: Tue, 18 May 2021 10:06:50 +0100
Message-ID: <20210518090658.9519-1-amanieu@gmail.com>

This series allows AArch64 tasks to perform 32-bit syscalls by setting
the top bit of x8 and using AArch32 compat syscall numbers:

    syscall(0x80000000 | __ARM_NR_write, 1, "foo\n", 4);

Internally, setting this bit does the following:
- The remainder of x8 is treated as a compat syscall number and is used
  to index the compat syscall table.
- in_compat_syscall will return true for the duration of the syscall.
- VM allocations performed by the syscall will be located in the lower
  4GB of the address space. A separate mmap_compat_base is used so that
  these allocations are still properly randomized.
- Interrupted compat syscalls are properly restarted as compat syscalls.
- Seccomp treats the syscall as having AUDIT_ARCH_ARM instead of
  AUDIT_ARCH_AARCH64. This affects the arch value seen by seccomp
  filters and reported by SIGSYS.
- PTRACE_GET_SYSCALL_INFO also treats the syscall as having
  AUDIT_ARCH_ARM. Recent versions of strace will correctly report the
  syscall name and parameters when an AArch64 task mixes 32-bit and
  64-bit syscalls.
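
As a rough illustration of the rules listed above, the following C
sketch decodes an x8 value into the table selection and the audit
architecture. It is not the code from this series: the SKETCH_* names
and the sketch_decode_x8 helper are hypothetical, and truncating x8 to
its low 32 bits is an assumption of this sketch. The two audit values
mirror AUDIT_ARCH_ARM and AUDIT_ARCH_AARCH64 from <linux/audit.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for the top bit of x8 described in this cover letter. */
#define SKETCH_COMPAT_SYSCALL_BIT 0x80000000u

/* Copies of AUDIT_ARCH_ARM / AUDIT_ARCH_AARCH64 so the sketch is self-contained. */
#define SKETCH_AUDIT_ARCH_ARM     0x40000028u
#define SKETCH_AUDIT_ARCH_AARCH64 0xC00000B7u

struct sketch_syscall {
    bool     compat;      /* true: index the compat (AArch32) syscall table */
    uint32_t nr;          /* syscall number with the flag bit removed */
    uint32_t audit_arch;  /* arch value a seccomp filter or tracer would see */
};

/* Assumption: only the low 32 bits of x8 carry the flag and the number. */
static struct sketch_syscall sketch_decode_x8(uint64_t x8)
{
    uint32_t w8 = (uint32_t)x8;
    struct sketch_syscall sc;

    sc.compat = (w8 & SKETCH_COMPAT_SYSCALL_BIT) != 0;
    sc.nr = w8 & ~SKETCH_COMPAT_SYSCALL_BIT;
    sc.audit_arch = sc.compat ? SKETCH_AUDIT_ARCH_ARM
                              : SKETCH_AUDIT_ARCH_AARCH64;
    return sc;
}
```

For example, sketch_decode_x8(0x80000000 | 4) selects the compat table
with number 4 (write in the AArch32 ABI), while sketch_decode_x8(64)
stays on the native table (write in the AArch64 ABI).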

This feature is intended for use in software compatibility layers which
emulate a 32-bit program on AArch64. This series has been tested on two
such emulators:
- Tango [1], which enables AArch32 binaries to run on AArch64 CPUs which
  do not have hardware support for AArch32. Tango is used to run virtual
  Android devices on AArch64 servers.
- FEX [2], an emulator for running x86 and x86_64 binaries on AArch64.
  FEX can already run many x86_64 programs including 3D games, but
  requires kernel support for running 32-bit x86 binaries.

Both FEX and Tango have previously attempted to translate 32-bit
syscalls purely in user mode like QEMU does for its user mode
emulation. While this works for simple programs, there are many
limitations which cannot be solved without kernel support, for example:
- There are a huge number of ioctls which behave differently in 32-bit
  mode. It is impractical and error-prone to manually emulate them all
  in user mode. The kernel already has a well-tested and reliable
  compatibility layer, and it makes sense to reuse it. QEMU supports
  emulating some ioctls in userspace, but this still does not cover
  devices like GPUs which are needed for accelerated rendering.
- The 64-bit set_robust_list is not compatible with the 32-bit ABI. The
  compat version of set_robust_list must be used. Emulating this in user
  mode is not reliable since SIGKILL cannot be caught.
- io_uring uses iovec structures as part of its API, which have
  different sizes on 32-bit and 64-bit.
- ext4 represents positions in directories as 64-bit hashes, which break
  if they are truncated to 32 bits. There is special support for 32-bit
  off_t in the ext4 driver but this is only used when in_compat_syscall
  is true: https://bugzilla.kernel.org/show_bug.cgi?id=205957
- The io_setup syscall allocates a VM area for the AIO context and
  returns it, but there is no way to control where this context is
  allocated, so it will almost always end up above the 4GB limit.
- Some ioctls will also perform VM allocations, with the same issues as
  io_setup. Search for "vm_mmap" in drivers/.
- Some file descriptors have alignment requirements which are not known
  to userspace. For example, a hugetlbfs file can only be mmaped at a
  huge page alignment but there is no way for userspace to know this
  when it needs to manually select an address below 4GB for the mapping.
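
To make the iovec point above concrete, here is a minimal C sketch of
the layout difference and of the widening a user-mode translator would
have to apply to every such structure. The sketch_* types and the
helper are hypothetical stand-ins with fixed-width fields, not the
kernel's own definitions; they only assume that both fields are
pointer-sized in each ABI.

```c
#include <stddef.h>
#include <stdint.h>

/* iovec as a 32-bit program lays it out: two 4-byte fields. */
struct sketch_iovec32 {
    uint32_t iov_base;
    uint32_t iov_len;
};

/* iovec as a 64-bit kernel expects it: two 8-byte fields. */
struct sketch_iovec64 {
    uint64_t iov_base;
    uint64_t iov_len;
};

_Static_assert(sizeof(struct sketch_iovec32) == 8, "32-bit iovec is 8 bytes");
_Static_assert(sizeof(struct sketch_iovec64) == 16, "64-bit iovec is 16 bytes");

/* Widen an array of 32-bit iovecs; dst must have room for count entries. */
static void sketch_widen_iovec(struct sketch_iovec64 *dst,
                               const struct sketch_iovec32 *src,
                               size_t count)
{
    for (size_t i = 0; i < count; i++) {
        dst[i].iov_base = src[i].iov_base; /* zero-extended to 64 bits */
        dst[i].iov_len = src[i].iov_len;
    }
}
```

A copy like this is workable for a single syscall argument, but it does
not help when the structures live in memory that the kernel reads
directly, which is the case the io_uring item describes.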

All of these issues are solved in FEX and Tango by invoking compat
syscalls directly. In the case of FEX, there remain some differences
between the arm and x86 ABIs due to alignment issues, but these are few
enough to be individually handled in userspace.

There is a precedent for exposing this functionality to userspace:
x86_64 has two ways to invoke 32-bit syscalls. The first is to use int
0x80 with a 32-bit syscall number and the second is to use
__X32_SYSCALL_BIT with a 64-bit syscall number. As such, the generic
kernel code is already able to properly handle tasks that invoke both
32-bit and 64-bit syscalls.

[1] https://www.amanieusystems.com/
[2] https://github.com/FEX-Emu/FEX

Changelog since v3:
- Renamed aarch64_compat_syscall to use_compat_syscall and enabled it
  permanently for AArch32 tasks.

Changelog since v2:
- Complete rewrite, based on the patch that was previously posted as:
  [PATCH v2] [RFC] arm64: Exposes support for 32-bit syscalls

Amanieu d'Antras (8):
  mm: Add arch_get_mmap_base_topdown macro
  hugetlbfs: Use arch_get_mmap_* macros
  mm: Support mmap_compat_base with the generic layout
  arm64: Separate in_compat_syscall from is_compat_task
  arm64: mm: Use HAVE_ARCH_COMPAT_MMAP_BASES
  arm64: Add a compat syscall flag to thread_info
  arm64: Forbid calling compat sigreturn from 64-bit tasks
  arm64: Allow 64-bit tasks to invoke compat syscalls

 arch/arm64/Kconfig                   |  1 +
 arch/arm64/include/asm/compat.h      | 24 ++++++++++++---
 arch/arm64/include/asm/elf.h         | 21 ++++++++++---
 arch/arm64/include/asm/ftrace.h      |  2 +-
 arch/arm64/include/asm/processor.h   | 32 +++++++++++--------
 arch/arm64/include/asm/syscall.h     |  6 ++--
 arch/arm64/include/asm/thread_info.h |  6 ++++
 arch/arm64/include/uapi/asm/unistd.h |  2 ++
 arch/arm64/kernel/ptrace.c           |  2 +-
 arch/arm64/kernel/signal.c           |  5 +++
 arch/arm64/kernel/signal32.c         |  8 +++++
 arch/arm64/kernel/syscall.c          | 23 ++++++++++++--
 arch/arm64/mm/mmap.c                 | 33 ++++++++++++++++++++
 fs/hugetlbfs/inode.c                 | 22 ++++++++++---
 mm/mmap.c                            | 14 ++++++---
 mm/util.c                            | 46 +++++++++++++++++++++++-----
 16 files changed, 202 insertions(+), 45 deletions(-)

-- 
2.31.1

