All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Amanieu d'Antras" <amanieu@gmail.com>
To: Steven Price <steven.price@arm.com>
Cc: Arnd Bergmann <arnd@kernel.org>,
	Ryan Houdek <Houdek.Ryan@fex-emu.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	David Laight <David.Laight@aculab.com>,
	Mark Brown <broonie@kernel.org>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [RESEND PATCH v4 8/8] arm64: Allow 64-bit tasks to invoke compat syscalls
Date: Fri, 21 May 2021 20:18:25 +0100	[thread overview]
Message-ID: <CA+y5pbSbky2kS+O5j9bn57nROdYaYeHcd2R-46X1cc388PKOvg@mail.gmail.com> (raw)
In-Reply-To: <1c2bd27a-1c96-f154-ed18-f33630b109b1@arm.com>

On Fri, May 21, 2021 at 9:51 AM Steven Price <steven.price@arm.com> wrote:
> >> In those cases to correctly emulate seccomp, isn't Tango is going to
> >> have to implement the seccomp filter in user space?
> >
> > I have not implemented user-mode seccomp emulation because it can
> > trivially be bypassed by spawning a 64-bit child process which runs
> > outside Tango. Even when spawning another translated process, the
> > user-mode filter will not be preserved across an execve.
>
> Clearly if you have user-mode seccomp emulation then you'd hook execve
> and either install the real BPF filter (if spawning a 64 bit child
> outside Tango) or ensure that the user-mode emulation is passed on to
> the child (if running within Tango).

Spawning another process is just an example. Fundamentally, Tango is
not intended or designed to be a sandbox around the 32-bit code. For
example, many of the newer ioctls use u64 instead of a pointer type to
avoid the need for a compat_ioctl handler. This means that such ioctls
could be abused to read/write any address in the process address
space, including the code that is performing the usermode seccomp
emulation.

> You already have a potential 'issue' here of a 64 bit process setting up
> a seccomp filter and then execve()ing a 32 bit (Tango'd) process. The
> set of syscalls needed for the system which supports AArch32 natively is
> going to be different from the syscalls needed for Tango. (Fundamentally
> this is a major limitation with the whole seccomp syscall filtering
> approach).

The specific example I had in mind here is Android which installs a
global seccomp filter on the zygote process from which app processes
are forked from. This filter is designed for mixed arm32/arm64 systems
and therefore has syscall whitelists for both AArch32 and AArch64.
This filter allows 32-bit processes to spawn 64-bit processes and
vice-versa: for example, many 32-bit apps will invoke another 32-bit
executable via system() which uses a 64-bit /system/bin/sh.

> >> I guess the question comes down to how big a hole is
> >> syscall_in_tango_whitelist() - if Tango only requires a small set of
> >> syscalls then there is still some security benefit, but otherwise this
> >> doesn't seem like a particularly big benefit considering you're already
> >> going to need the BPF infrastructure in user space.
> >
> > Currently Tango only whitelists ~50 syscalls, which is small enough to
> > provide security benefits and definitely better than not supporting
> > seccomp at all.
>
> Agreed, and I don't want to imply that this approach is necessarily
> wrong. But given that the approach of getting the kernel to do the
> compat syscall filtering is not perfect, I'm not sure in itself it's a
> great justification for needing the kernel to support all the compat
> syscalls.

I feel that exposing all compat syscalls to 64-bit processes is better
than the alternative of only exposing a subset of them. Of the top of
my head I can think of quite a few compat syscalls that cannot be
fully emulated in userspace and would need to be exposed in the
kernel:
- mmap/mremap/shmat/io_setup: anything that allocates VM space needs
to return a pointer in the low 4GB.
- ioctl: too many variants to reasonably maintain a separate compat
layer in userspace.
- getdents/lseek: ext4 uses 32-bit directory offsets for 32-bit processes.
- get_robust_list/set_robust_list: different in-memory ABI for
32/64-bit processes.
- open: don't force O_LARGEFILE for 32-bit processes.
- io_uring_create: different in-memory ABI for 32/64-bit processes.
- (and possibly many others)

Also consider the churn involved when adding a new syscall which
behaves differently in compat processes: rather than just using
in_compat_syscall() or wiring up a COMPAT_SYSCALL_DEFINE, a compat
variant of this syscall would also need to be added to the 64-bit
syscall table to support translation layers like Tango and FEX.

> One other thought: I suspect in practise there aren't actually many
> variations in the BPF programs used with seccomp. It may well be quite
> possible to convert the 32-bit syscall filtering programs to filter the
> equivalent 64-bit syscalls that Tango would use. Sadly this would be
> fragile if a program used a BPF program which didn't follow the "normal"
> pattern.

This might work for simple filters that only look at the syscall
number, but becomes much harder when the filter also inspects the
syscall arguments.

WARNING: multiple messages have this Message-ID (diff)
From: "Amanieu d'Antras" <amanieu@gmail.com>
To: Steven Price <steven.price@arm.com>
Cc: Arnd Bergmann <arnd@kernel.org>,
	Ryan Houdek <Houdek.Ryan@fex-emu.org>,
	 Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	 Mark Rutland <mark.rutland@arm.com>,
	David Laight <David.Laight@aculab.com>,
	 Mark Brown <broonie@kernel.org>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>,
	 Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [RESEND PATCH v4 8/8] arm64: Allow 64-bit tasks to invoke compat syscalls
Date: Fri, 21 May 2021 20:18:25 +0100	[thread overview]
Message-ID: <CA+y5pbSbky2kS+O5j9bn57nROdYaYeHcd2R-46X1cc388PKOvg@mail.gmail.com> (raw)
In-Reply-To: <1c2bd27a-1c96-f154-ed18-f33630b109b1@arm.com>

On Fri, May 21, 2021 at 9:51 AM Steven Price <steven.price@arm.com> wrote:
> >> In those cases to correctly emulate seccomp, isn't Tango is going to
> >> have to implement the seccomp filter in user space?
> >
> > I have not implemented user-mode seccomp emulation because it can
> > trivially be bypassed by spawning a 64-bit child process which runs
> > outside Tango. Even when spawning another translated process, the
> > user-mode filter will not be preserved across an execve.
>
> Clearly if you have user-mode seccomp emulation then you'd hook execve
> and either install the real BPF filter (if spawning a 64 bit child
> outside Tango) or ensure that the user-mode emulation is passed on to
> the child (if running within Tango).

Spawning another process is just an example. Fundamentally, Tango is
not intended or designed to be a sandbox around the 32-bit code. For
example, many of the newer ioctls use u64 instead of a pointer type to
avoid the need for a compat_ioctl handler. This means that such ioctls
could be abused to read/write any address in the process address
space, including the code that is performing the usermode seccomp
emulation.

> You already have a potential 'issue' here of a 64 bit process setting up
> a seccomp filter and then execve()ing a 32 bit (Tango'd) process. The
> set of syscalls needed for the system which supports AArch32 natively is
> going to be different from the syscalls needed for Tango. (Fundamentally
> this is a major limitation with the whole seccomp syscall filtering
> approach).

The specific example I had in mind here is Android which installs a
global seccomp filter on the zygote process from which app processes
are forked from. This filter is designed for mixed arm32/arm64 systems
and therefore has syscall whitelists for both AArch32 and AArch64.
This filter allows 32-bit processes to spawn 64-bit processes and
vice-versa: for example, many 32-bit apps will invoke another 32-bit
executable via system() which uses a 64-bit /system/bin/sh.

> >> I guess the question comes down to how big a hole is
> >> syscall_in_tango_whitelist() - if Tango only requires a small set of
> >> syscalls then there is still some security benefit, but otherwise this
> >> doesn't seem like a particularly big benefit considering you're already
> >> going to need the BPF infrastructure in user space.
> >
> > Currently Tango only whitelists ~50 syscalls, which is small enough to
> > provide security benefits and definitely better than not supporting
> > seccomp at all.
>
> Agreed, and I don't want to imply that this approach is necessarily
> wrong. But given that the approach of getting the kernel to do the
> compat syscall filtering is not perfect, I'm not sure in itself it's a
> great justification for needing the kernel to support all the compat
> syscalls.

I feel that exposing all compat syscalls to 64-bit processes is better
than the alternative of only exposing a subset of them. Of the top of
my head I can think of quite a few compat syscalls that cannot be
fully emulated in userspace and would need to be exposed in the
kernel:
- mmap/mremap/shmat/io_setup: anything that allocates VM space needs
to return a pointer in the low 4GB.
- ioctl: too many variants to reasonably maintain a separate compat
layer in userspace.
- getdents/lseek: ext4 uses 32-bit directory offsets for 32-bit processes.
- get_robust_list/set_robust_list: different in-memory ABI for
32/64-bit processes.
- open: don't force O_LARGEFILE for 32-bit processes.
- io_uring_create: different in-memory ABI for 32/64-bit processes.
- (and possibly many others)

Also consider the churn involved when adding a new syscall which
behaves differently in compat processes: rather than just using
in_compat_syscall() or wiring up a COMPAT_SYSCALL_DEFINE, a compat
variant of this syscall would also need to be added to the 64-bit
syscall table to support translation layers like Tango and FEX.

> One other thought: I suspect in practise there aren't actually many
> variations in the BPF programs used with seccomp. It may well be quite
> possible to convert the 32-bit syscall filtering programs to filter the
> equivalent 64-bit syscalls that Tango would use. Sadly this would be
> fragile if a program used a BPF program which didn't follow the "normal"
> pattern.

This might work for simple filters that only look at the syscall
number, but becomes much harder when the filter also inspects the
syscall arguments.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2021-05-21 19:19 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-18  9:06 [RESEND PATCH v4 0/8] arm64: Allow 64-bit tasks to invoke compat syscalls Amanieu d'Antras
2021-05-18  9:06 ` Amanieu d'Antras
2021-05-18  9:06 ` [RESEND PATCH v4 1/8] mm: Add arch_get_mmap_base_topdown macro Amanieu d'Antras
2021-05-18  9:06   ` Amanieu d'Antras
2021-05-18  9:06 ` [RESEND PATCH v4 2/8] hugetlbfs: Use arch_get_mmap_* macros Amanieu d'Antras
2021-05-18  9:06   ` Amanieu d'Antras
2021-05-18  9:06 ` [RESEND PATCH v4 3/8] mm: Support mmap_compat_base with the generic layout Amanieu d'Antras
2021-05-18  9:06   ` Amanieu d'Antras
2021-05-18  9:06 ` [RESEND PATCH v4 4/8] arm64: Separate in_compat_syscall from is_compat_task Amanieu d'Antras
2021-05-18  9:06   ` Amanieu d'Antras
2021-05-18  9:06 ` [RESEND PATCH v4 5/8] arm64: mm: Use HAVE_ARCH_COMPAT_MMAP_BASES Amanieu d'Antras
2021-05-18  9:06   ` Amanieu d'Antras
2021-05-18  9:06 ` [RESEND PATCH v4 6/8] arm64: Add a compat syscall flag to thread_info Amanieu d'Antras
2021-05-18  9:06   ` Amanieu d'Antras
2021-05-18  9:06 ` [RESEND PATCH v4 7/8] arm64: Forbid calling compat sigreturn from 64-bit tasks Amanieu d'Antras
2021-05-18  9:06   ` Amanieu d'Antras
2021-05-18  9:06 ` [RESEND PATCH v4 8/8] arm64: Allow 64-bit tasks to invoke compat syscalls Amanieu d'Antras
2021-05-18  9:06   ` Amanieu d'Antras
2021-05-18 13:02   ` Arnd Bergmann
2021-05-18 13:02     ` Arnd Bergmann
2021-05-18 20:26     ` David Laight
2021-05-18 20:26       ` David Laight
2021-05-18 22:41       ` Ryan Houdek
2021-05-18 22:41         ` Ryan Houdek
2021-05-18 23:51     ` Amanieu d'Antras
2021-05-18 23:51       ` Amanieu d'Antras
2021-05-19 15:30       ` Steven Price
2021-05-19 15:30         ` Steven Price
2021-05-19 16:14         ` Amanieu d'Antras
2021-05-19 16:14           ` Amanieu d'Antras
2021-05-21  8:51           ` Steven Price
2021-05-21  8:51             ` Steven Price
2021-05-21 19:18             ` Amanieu d'Antras [this message]
2021-05-21 19:18               ` Amanieu d'Antras
2021-05-24 11:20               ` Steven Price
2021-05-24 11:20                 ` Steven Price
2021-05-24 12:38                 ` David Laight
2021-05-24 12:38                   ` David Laight
2021-05-18 23:52     ` Ryan Houdek
2021-05-18 23:52       ` Ryan Houdek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CA+y5pbSbky2kS+O5j9bn57nROdYaYeHcd2R-46X1cc388PKOvg@mail.gmail.com \
    --to=amanieu@gmail.com \
    --cc=David.Laight@aculab.com \
    --cc=Houdek.Ryan@fex-emu.org \
    --cc=arnd@kernel.org \
    --cc=broonie@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=steven.price@arm.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.