From: Aleksa Sarai <cyphar@cyphar.com>
To: Florian Weimer <fweimer@redhat.com>
Cc: linux-api@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, dev@opencontainers.org,
	corbet@lwn.net, Carlos O'Donell <carlos@redhat.com>
Subject: Re: [PATCH] syscalls: Document OCI seccomp filter interactions & workaround
Date: Tue, 24 Nov 2020 23:58:06 +1100
Message-ID: <20201124125806.nud2x5kfvnxdagqk@yavin.dot.cyphar.com>
In-Reply-To: <87lfer2c0b.fsf@oldenburg2.str.redhat.com>

On 2020-11-24, Florian Weimer <fweimer@redhat.com> wrote:
> This documents a way to safely use new security-related system calls
> while preserving compatibility with container runtimes that require
> insecure emulation (because they filter the system call by default).
> Admittedly, it is somewhat hackish, but it can be implemented by
> userspace today, for existing system calls such as faccessat2,
> without kernel or container runtime changes.
> 
> Signed-off-by: Florian Weimer <fweimer@redhat.com>
> 
> ---
>  Documentation/process/adding-syscalls.rst | 37 +++++++++++++++++++++++++++++++
>  1 file changed, 37 insertions(+)
> 
> diff --git a/Documentation/process/adding-syscalls.rst b/Documentation/process/adding-syscalls.rst
> index a3ecb236576c..7d1e578a1df1 100644
> --- a/Documentation/process/adding-syscalls.rst
> +++ b/Documentation/process/adding-syscalls.rst
> @@ -436,6 +436,40 @@ simulates registers etc).  Fixing this is as simple as adding a #define to
>  
>      #define stub_xyzzy sys_xyzzy
>  
> +Container Compatibility and seccomp
> +-----------------------------------
> +
> +The Linux Foundation Open Container Initiative Runtime Specification
> +requires that by default, implementations install seccomp system call
> +filters which cause system calls to fail with ``EPERM``.  As a result,
> +all new system calls in such containers fail with ``EPERM`` instead of
> +``ENOSYS``.  This design is problematic because ``EPERM`` is a
> +legitimate system call result which should not trigger fallback to a
> +userspace emulation, particularly for security-related system calls.
> +(With ``ENOSYS``, it is clear that a fallback implementation has to be
> +used to maintain compatibility with older kernels or container
> +runtimes.)
> +
> +New system calls should therefore provide a way to reliably trigger an
> +error distinct from ``EPERM``, without any side effects.  Some ways to
> +achieve that are:
> +
> + - ``EBADF`` for the invalid file descriptor -1
> + - ``EFAULT`` for a null pointer
> + - ``EINVAL`` for a contradictory set of flags that will remain invalid
> +   in the future
> +
> +If a system call has such error behavior, upon encountering an
> +``EPERM`` error, userspace applications can perform further
> +invocations of the same system call to check if the ``EPERM`` error
> +persists for those known error conditions.  If those also fail with
> +``EPERM``, that likely means that the original ``EPERM`` error was the
> +result of a seccomp filter, and should be treated like ``ENOSYS``
> +(e.g., trigger an alternative fallback implementation).  If those
> +probing system calls do not fail with ``EPERM``, the error likely came
> +from a real implementation, and should be reported to the caller
> +directly, without resorting to ``ENOSYS``-style fallback.
> +
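For reference, the probing scheme described in the quoted text could be
sketched roughly as below, using faccessat2 as the example. This is only
an illustration under stated assumptions: the helper names are
hypothetical, the fallback syscall number 439 applies to most but not all
architectures, and it assumes an all-bits-set flags value keeps being
rejected with EINVAL before any filesystem access happens.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: older headers may lack the macro; 439 is the number on
 * most (not all) architectures. */
#ifndef SYS_faccessat2
#define SYS_faccessat2 439
#endif

/* Hypothetical names, for illustration only. */
enum eperm_origin { EPERM_FROM_KERNEL, EPERM_FROM_FILTER };

/* Decision step: if the deliberately invalid probe also fails with
 * EPERM, the earlier EPERM most likely came from a seccomp filter. */
static enum eperm_origin classify_probe(int probe_errno)
{
	return probe_errno == EPERM ? EPERM_FROM_FILTER : EPERM_FROM_KERNEL;
}

/* Probe with invalid flags. A real implementation is expected to
 * reject this with EINVAL and no side effects. Returns the errno of
 * the probe, or 0 if it unexpectedly succeeded. */
static int probe_faccessat2(void)
{
	long rc = syscall(SYS_faccessat2, AT_FDCWD, ".", F_OK, ~0);

	return rc == -1 ? errno : 0;
}

/* Returns 1 if the caller should use its ENOSYS-style fallback for a
 * faccessat2 call that failed with call_errno, 0 if the error should
 * be reported as-is. */
static int should_fall_back(int call_errno)
{
	if (call_errno == ENOSYS)
		return 1;
	if (call_errno != EPERM)
		return 0;
	return classify_probe(probe_faccessat2()) == EPERM_FROM_FILTER;
}
```

A caller would invoke should_fall_back(errno) only after the real
faccessat2 call has failed, so the extra probe costs nothing on the
success path.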

As I mentioned in the runc thread[1], this is really down to Docker's
default policy configuration. The EPERM-everything behaviour in OCI was
inherited from Docker, and it boils down to the profile lacking an
additional seccomp rule which returns ENOSYS for unknown syscall numbers.
Docker can add that rule without modifying the OCI runtime-spec, so it's
something Docker can fix entirely on their own. I'll prepare a patch
for Docker this week.
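
To illustrate the kind of rule meant here, a raw seccomp-BPF sketch
follows. It is not the actual Docker or runc change: PROFILE_MAX_KNOWN_NR
and install_enosys_rule() are hypothetical, the cutoff value is an
arbitrary placeholder, and a real profile would also check the
architecture and apply its own rules where this sketch simply allows.

```c
#include <errno.h>
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>

/* Hypothetical: the highest syscall number the profile knows about. */
#define PROFILE_MAX_KNOWN_NR 439

static struct sock_filter enosys_rule[] = {
	/* Load the syscall number. */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
	/* nr > max known: fall through to ENOSYS, otherwise skip it. */
	BPF_JUMP(BPF_JMP | BPF_JGT | BPF_K, PROFILE_MAX_KNOWN_NR, 0, 1),
	BPF_STMT(BPF_RET | BPF_K,
		 SECCOMP_RET_ERRNO | (ENOSYS & SECCOMP_RET_DATA)),
	/* Stand-in for the rest of the profile (allow-list, EPERM default). */
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Installs the filter for the calling thread. Returns 0 on success,
 * -1 with errno set on failure. */
static int install_enosys_rule(void)
{
	struct sock_fprog prog = {
		.len = sizeof(enosys_rule) / sizeof(enosys_rule[0]),
		.filter = enosys_rule,
	};

	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
		return -1;
	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

With a rule like this ahead of the EPERM default, a syscall the profile
has never heard of looks to userspace the same as it would on an older
kernel, so the usual ENOSYS fallback paths work unchanged.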

IMHO it's also slightly overkill to change the kernel API design
guidelines in response to this issue.

[1]: https://github.com/opencontainers/runc/issues/2151

>  Other Details
>  -------------
> @@ -575,3 +609,6 @@ References and Sources
>   - Recommendation from Linus Torvalds that x32 system calls should prefer
>     compatibility with 64-bit versions rather than 32-bit versions:
>     https://lkml.org/lkml/2011/8/31/244
> + - Linux Configuration section of the Open Container Initiative
> +   Runtime Specification:
> +   https://github.com/opencontainers/runtime-spec/blob/master/config-linux.md

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
