LinuxPPC-Dev Archive on lore.kernel.org
 help / color / Atom feed
From: Aleksa Sarai <cyphar@cyphar.com>
To: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: linux-ia64@vger.kernel.org, linux-sh@vger.kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	Alexei Starovoitov <ast@kernel.org>,
	linux-kernel@vger.kernel.org, David Howells <dhowells@redhat.com>,
	linux-kselftest@vger.kernel.org, sparclinux@vger.kernel.org,
	Jiri Olsa <jolsa@redhat.com>,
	linux-arch@vger.kernel.org, linux-s390@vger.kernel.org,
	Tycho Andersen <tycho@tycho.ws>, Aleksa Sarai <asarai@suse.de>,
	Shuah Khan <shuah@kernel.org>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Ingo Molnar <mingo@redhat.com>,
	linux-arm-kernel@lists.infradead.org, linux-mips@vger.kernel.org,
	linux-xtensa@linux-xtensa.org, Kees Cook <keescook@chromium.org>,
	Arnd Bergmann <arnd@arndb.de>, Jann Horn <jannh@google.com>,
	linuxppc-dev@lists.ozlabs.org, linux-m68k@lists.linux-m68k.org,
	Al Viro <viro@zeniv.linux.org.uk>,
	Andy Lutomirski <luto@kernel.org>,
	Shuah Khan <skhan@linuxfoundation.org>,
	Namhyung Kim <namhyung@kernel.org>,
	David Drysdale <drysdale@google.com>,
	Christian Brauner <christian@brauner.io>,
	"J. Bruce Fields" <bfields@fieldses.org>,
	linux-parisc@vger.kernel.org, linux-api@vger.kernel.org,
	Chanho Min <chanho.min@lge.com>, Jeff Layton <jlayton@kernel.org>,
	Oleg Nesterov <oleg@redhat.com>,
	Eric Biederman <ebiederm@xmission.com>,
	linux-alpha@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	containers@lists.linux-foundation.org
Subject: Re: [PATCH v12 01/12] lib: introduce copy_struct_{to,from}_user helpers
Date: Thu, 5 Sep 2019 19:50:26 +1000
Message-ID: <20190905095026.gjemg2gqua2vufxb@yavin.dot.cyphar.com> (raw)
In-Reply-To: <57ba3752-c4a6-d2a4-1a4d-a0e13bccd473@rasmusvillemoes.dk>

[-- Attachment #1: Type: text/plain, Size: 11250 bytes --]

On 2019-09-05, Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote:
> On 04/09/2019 22.19, Aleksa Sarai wrote:
> > A common pattern for syscall extensions is increasing the size of a
> > struct passed from userspace, such that the zero-value of the new fields
> > result in the old kernel behaviour (allowing for a mix of userspace and
> > kernel vintages to operate on one another in most cases). This is done
> > in both directions -- hence two helpers -- though it's more common to
> > have to copy user space structs into kernel space.
> > 
> > Previously there was no common lib/ function that implemented
> > the necessary extension-checking semantics (and different syscalls
> > implemented them slightly differently or incompletely[1]). A future
> > patch replaces all of the common uses of this pattern to use the new
> > copy_struct_{to,from}_user() helpers.
> > 
> > [1]: For instance {sched_setattr,perf_event_open,clone3}(2) all do do
> >      similar checks to copy_struct_from_user() while rt_sigprocmask(2)
> >      always rejects differently-sized struct arguments.
> > 
> > Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
> > Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
> > ---
> > diff --git a/lib/struct_user.c b/lib/struct_user.c
> > new file mode 100644
> > index 000000000000..7301ab1bbe98
> > --- /dev/null
> > +++ b/lib/struct_user.c
> > @@ -0,0 +1,182 @@
> > +// SPDX-License-Identifier: GPL-2.0-or-later
> > +/*
> > + * Copyright (C) 2019 SUSE LLC
> > + * Copyright (C) 2019 Aleksa Sarai <cyphar@cyphar.com>
> > + */
> > +
> > +#include <linux/types.h>
> > +#include <linux/export.h>
> > +#include <linux/uaccess.h>
> > +#include <linux/kernel.h>
> > +#include <linux/string.h>
> > +
> > +#define BUFFER_SIZE 64
> > +
> > +/*
> > + * "memset(p, 0, size)" but for user space buffers. Caller must have already
> > + * checked access_ok(p, size).
> > + */
> 
> Isn't this __clear_user() exactly (perhaps except for the return value)?
> Perhaps not every arch has that?

I didn't know about clear_user() -- I will switch to it.

> > +static int __memzero_user(void __user *p, size_t s)
> > +{
> > +	const char zeros[BUFFER_SIZE] = {};
> > +	while (s > 0) {
> > +		size_t n = min(s, sizeof(zeros));
> > +
> > +		if (__copy_to_user(p, zeros, n))
> > +			return -EFAULT;
> > +
> > +		p += n;
> > +		s -= n;
> > +	}
> > +	return 0;
> > +}
> > +
> > +/**
> > + * copy_struct_to_user: copy a struct to user space
> > + * @dst:   Destination address, in user space.
> > + * @usize: Size of @dst struct.
> > + * @src:   Source address, in kernel space.
> > + * @ksize: Size of @src struct.
> > + *
> > + * Returns (in all cases, some data may have been copied):
> > + *  * -EFBIG:  (@usize < @ksize) and there are non-zero trailing bytes in @src.
> > + *  * -EFAULT: access to user space failed.
> > + */
> > +int copy_struct_to_user(void __user *dst, size_t usize,
> > +			const void *src, size_t ksize)
> > +{
> > +	size_t size = min(ksize, usize);
> > +	size_t rest = abs(ksize - usize);
> 
> Eh, I'd avoid abs() here due to the funkiness of the implicit type
> conversions - ksize-usize has type size_t, then that's coerced to an int
> (or a long maybe?), the abs is applied which return an int/long (or
> unsigned versions?). Something like "rest = max(ksize, usize) - size;"
> is more obviously correct and doesn't fall into any
> narrowing/widening/sign extending traps.

Yeah, I originally used "max(ksize, usize) - size" for that reason but
was worried it looked too funky (and some quick tests showed that abs()
gives the right results in most cases -- though I just realised it would
probably not give the right results around SIZE_MAX). I'll switch back.

> > +	if (unlikely(usize > PAGE_SIZE))
> > +		return -EFAULT;
> 
> Please don't. That is a restriction on all future extensions - once a
> kernel is shipped with a syscall using this helper with that arbitrary
> restriction in place, that syscall is forever prevented from extending
> its arg struct beyond PAGE_SIZE (which is arch-dependent anyway). Sure,
> it's hard to imagine, but who'd have thought 32 O_* or CLONE_* bits
> weren't enough for everybody?
>
> This is only for future compatibility, and if someone runs an app
> compiled against 7.3 headers on a 5.4 kernel, they probably don't care
> about performance, but they would like their app to run.

I'm not sure I agree that the limit is in place *forever* -- it's
generally not a break in compatibility to convert an error into a
success (though, there are counterexamples such as mknod(2) -- but that
was a very specific case).

You're right that it would mean that some very new code won't run on
very ancient kernels (assuming we ever pass around structs that
massive), but there should be a reasonable trade-off here IMHO.

If we allow very large sizes, a program could probably DoS the kernel by
allocating a moderately-large block of memory and then spawning a bunch
of threads that all cause the kernel to re-check that the same 1GB block
of memory is zeroed. I haven't tried, but it seems like it's best to
avoid the possibility altogether.

> > +	}
> > +	/* Copy the interoperable parts of the struct. */
> > +	if (__copy_to_user(dst, src, size))
> > +		return -EFAULT;
> 
> I think I understand why you put this last instead of handling the
> buffer in the "natural" order. However,
> I'm wondering whether we should actually do this copy before checking
> that the extra kernel bytes are 0 - the user will still be told that
> there was some extra information via the -EFBIG/-E2BIG return, but maybe
> in some cases the part he understands is good enough. But I also guess
> we have to look to existing users to see whether that would prevent them
> from being converted to using this helper.
> 
> linux-api folks, WDYT?

Regarding the order, I just copied what sched and perf already do. I
wouldn't mind doing it the other way around -- though I am a little
cautious about implicitly making guarantees like that. The syscall that
uses copy_struct_to_user() might not want to make that guarantee (it
might not make sense for them), and there are some -E2BIG returns that
won't result in data being copied (usize > PAGE_SIZE).

As for feedback, this is syscall-dependent at the moment. The sched and
perf users explicitly return the size of the kernel structure (by
overwriting uattr->size if -E2BIG is returned) for copies in either
direction. So users arguably already have some kind of feedback about
size issues. clone3() on the other hand doesn't do that (though it
doesn't copy anything to user-space so this isn't relevant to this
particular question).

Effectively, I'd like to see someone argue that this is something that
they would personally want (before we do it).

> > +	return 0;
> 
> Maybe more useful to "return size;", some users might want to know/pass
> on how much was actually copied.

Even though it is "just" min(ksize, usize), I don't see any harm in
returning it. Will do.

> > +}
> > +EXPORT_SYMBOL(copy_struct_to_user);
> 
> Can't we wait with this until a modular user shows up? The primary users
> are syscalls, which can't be modular AFAIK.

Yeah, I'll drop it. You could use them for ioctl()s but we can always
add EXPORT_SYMBOL() later.

> > +/**
> > + * copy_struct_from_user: copy a struct from user space
> > + * @dst:   Destination address, in kernel space. This buffer must be @ksize
> > + *         bytes long.
> > + * @ksize: Size of @dst struct.
> > + * @src:   Source address, in user space.
> > + * @usize: (Alleged) size of @src struct.
> > + *
> > + * Copies a struct from user space to kernel space, in a way that guarantees
> > + * backwards-compatibility for struct syscall arguments (as long as future
> > + * struct extensions are made such that all new fields are *appended* to the
> > + * old struct, and zeroed-out new fields have the same meaning as the old
> > + * struct).
> > + *
> > + * @ksize is just sizeof(*dst), and @usize should've been passed by user space.
> > + * The recommended usage is something like the following:
> > + *
> > + *   SYSCALL_DEFINE2(foobar, const struct foo __user *, uarg, size_t, usize)
> > + *   {
> > + *      int err;
> > + *      struct foo karg = {};
> > + *
> > + *      err = copy_struct_from_user(&karg, sizeof(karg), uarg, size);
> > + *      if (err)
> > + *        return err;
> > + *
> > + *      // ...
> > + *   }
> > + *
> > + * There are three cases to consider:
> > + *  * If @usize == @ksize, then it's copied verbatim.
> > + *  * If @usize < @ksize, then the user space has passed an old struct to a
> > + *    newer kernel. The rest of the trailing bytes in @dst (@ksize - @usize)
> > + *    are to be zero-filled.
> > + *  * If @usize > @ksize, then the user space has passed a new struct to an
> > + *    older kernel. The trailing bytes unknown to the kernel (@usize - @ksize)
> > + *    are checked to ensure they are zeroed, otherwise -E2BIG is returned.
> > + *
> > + * Returns (in all cases, some data may have been copied):
> > + *  * -E2BIG:  (@usize > @ksize) and there are non-zero trailing bytes in @src.
> > + *  * -E2BIG:  @usize is "too big" (at time of writing, >PAGE_SIZE).
> > + *  * -EFAULT: access to user space failed.
> > + */
> > +int copy_struct_from_user(void *dst, size_t ksize,
> > +			  const void __user *src, size_t usize)
> > +{
> > +	size_t size = min(ksize, usize);
> > +	size_t rest = abs(ksize - usize);
> 
> As above.
> 
> > +	if (unlikely(usize > PAGE_SIZE))
> > +		return -EFAULT;
> 
> As above.
> 
> > +	if (unlikely(!access_ok(src, usize)))
> > +		return -EFAULT;
> > +
> > +	/* Deal with trailing bytes. */
> > +	if (usize < ksize)
> > +		memset(dst + size, 0, rest);
> > +	else if (usize > ksize) {
> > +		const void __user *addr = src + size;
> > +		char buffer[BUFFER_SIZE] = {};
> > +
> > +		while (rest > 0) {
> > +			size_t bufsize = min(rest, sizeof(buffer));
> > +
> > +			if (__copy_from_user(buffer, addr, bufsize))
> > +				return -EFAULT;
> > +			if (memchr_inv(buffer, 0, bufsize))
> > +				return -E2BIG;
> > +
> > +			addr += bufsize;
> > +			rest -= bufsize;
> > +		}
> 
> I'd create a __user_is_zero() helper for this - that way the two
> branches in the two helpers become nicely symmetric, each just calling a
> single helper that deals appropriately with the tail. And we can discuss
> how to implement __user_is_zero() in another bikeshed.

Will do.

> 
> > +	}
> > +	/* Copy the interoperable parts of the struct. */
> > +	if (__copy_from_user(dst, src, size))
> > +		return -EFAULT;
> 
> If you do move up the __copy_to_user(), please move this as well - on
> the kernel side, we certainly don't care that we copied some bytes to a
> local buffer which we then ignore because the user had a non-zero tail.
> But if __copy_to_user() is kept last in copy_struct_to_user(), this
> should stay for symmetry.

I will keep that in mind.

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

  reply index

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-04 20:19 [PATCH v12 00/12] namei: openat2(2) path resolution restrictions Aleksa Sarai
2019-09-04 20:19 ` [PATCH v12 01/12] lib: introduce copy_struct_{to,from}_user helpers Aleksa Sarai
2019-09-04 20:48   ` [PATCH v12 01/12] lib: introduce copy_struct_{to, from}_user helpers Linus Torvalds
2019-09-04 21:00   ` [PATCH v12 01/12] lib: introduce copy_struct_{to,from}_user helpers Randy Dunlap
2019-09-05  7:32   ` Peter Zijlstra
2019-09-05  9:26     ` Aleksa Sarai
2019-09-05  9:43       ` Peter Zijlstra
2019-09-05 10:57         ` Peter Zijlstra
2019-09-11 10:37           ` Aleksa Sarai
2019-09-05 13:35         ` Aleksa Sarai
2019-09-05 17:01         ` Aleksa Sarai
2019-09-05  8:43   ` Rasmus Villemoes
2019-09-05  9:50     ` Aleksa Sarai [this message]
2019-09-05 10:45       ` Christian Brauner
2019-09-05  9:09   ` [PATCH v12 01/12] lib: introduce copy_struct_{to, from}_user helpers Andreas Schwab
2019-09-05 10:13     ` Gabriel Paubert
2019-09-05 11:05   ` [PATCH v12 01/12] lib: introduce copy_struct_{to,from}_user helpers Christian Brauner
2019-09-05 11:17     ` Rasmus Villemoes
2019-09-05 11:29       ` Christian Brauner
2019-09-05 13:40     ` Aleksa Sarai
2019-09-05 11:09   ` Christian Brauner
2019-09-05 11:27     ` Aleksa Sarai
2019-09-05 11:40       ` Christian Brauner
2019-09-05 18:07   ` Al Viro
2019-09-05 18:23     ` Christian Brauner
2019-09-05 18:28       ` Al Viro
2019-09-05 18:35         ` Christian Brauner
2019-09-05 19:56         ` Aleksa Sarai
2019-09-05 22:31           ` Al Viro
2019-09-06  7:00           ` Christian Brauner
2019-09-05 23:00     ` Aleksa Sarai
2019-09-05 23:49       ` Al Viro
2019-09-06  0:09         ` Aleksa Sarai
2019-09-06  0:14         ` Al Viro
2019-09-04 20:19 ` [PATCH v12 02/12] clone3: switch to copy_struct_from_user() Aleksa Sarai
2019-09-04 20:19 ` [PATCH v12 03/12] sched_setattr: switch to copy_struct_{to, from}_user() Aleksa Sarai
2019-09-04 20:19 ` [PATCH v12 04/12] perf_event_open: switch to copy_struct_from_user() Aleksa Sarai
2019-09-04 20:19 ` [PATCH v12 05/12] namei: obey trailing magic-link DAC permissions Aleksa Sarai
2019-09-17 21:30   ` Jann Horn
2019-09-18 13:51     ` Aleksa Sarai
2019-09-18 15:46       ` Aleksa Sarai
2019-09-04 20:19 ` [PATCH v12 06/12] procfs: switch magic-link modes to be more sane Aleksa Sarai
2019-09-04 20:19 ` [PATCH v12 07/12] open: O_EMPTYPATH: procfs-less file descriptor re-opening Aleksa Sarai
2019-09-04 20:19 ` [PATCH v12 08/12] namei: O_BENEATH-style path resolution flags Aleksa Sarai
2019-09-04 20:19 ` [PATCH v12 09/12] namei: LOOKUP_IN_ROOT: chroot-like path resolution Aleksa Sarai
2019-09-04 20:19 ` [PATCH v12 10/12] namei: aggressively check for nd->root escape on ".." resolution Aleksa Sarai
2019-09-04 21:09   ` Linus Torvalds
2019-09-04 21:35     ` Linus Torvalds
2019-09-04 21:36       ` Linus Torvalds
2019-09-04 21:48     ` Aleksa Sarai
2019-09-04 22:16       ` Linus Torvalds
2019-09-04 22:31       ` David Howells
2019-09-04 22:38         ` Linus Torvalds
2019-09-04 23:29           ` Al Viro
2019-09-04 23:44             ` Linus Torvalds
2019-09-04 20:19 ` [PATCH v12 11/12] open: openat2(2) syscall Aleksa Sarai
2019-09-04 21:00   ` Randy Dunlap
2019-09-07 12:40   ` Jeff Layton
2019-09-07 16:58     ` Linus Torvalds
2019-09-07 17:42       ` Andy Lutomirski
2019-09-07 17:45         ` Linus Torvalds
2019-09-07 18:15           ` Andy Lutomirski
2019-09-10  6:35           ` Ingo Molnar
2019-09-08 16:24     ` Aleksa Sarai
2019-09-04 20:19 ` [PATCH v12 12/12] selftests: add openat2(2) selftests Aleksa Sarai

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190905095026.gjemg2gqua2vufxb@yavin.dot.cyphar.com \
    --to=cyphar@cyphar.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=arnd@arndb.de \
    --cc=asarai@suse.de \
    --cc=ast@kernel.org \
    --cc=bfields@fieldses.org \
    --cc=chanho.min@lge.com \
    --cc=christian@brauner.io \
    --cc=containers@lists.linux-foundation.org \
    --cc=dhowells@redhat.com \
    --cc=drysdale@google.com \
    --cc=ebiederm@xmission.com \
    --cc=jannh@google.com \
    --cc=jlayton@kernel.org \
    --cc=jolsa@redhat.com \
    --cc=keescook@chromium.org \
    --cc=linux-alpha@vger.kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-ia64@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-m68k@lists.linux-m68k.org \
    --cc=linux-mips@vger.kernel.org \
    --cc=linux-parisc@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=linux-sh@vger.kernel.org \
    --cc=linux-xtensa@linux-xtensa.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=shuah@kernel.org \
    --cc=skhan@linuxfoundation.org \
    --cc=sparclinux@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=tycho@tycho.ws \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LinuxPPC-Dev Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linuxppc-dev/0 linuxppc-dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linuxppc-dev linuxppc-dev/ https://lore.kernel.org/linuxppc-dev \
		linuxppc-dev@lists.ozlabs.org linuxppc-dev@ozlabs.org
	public-inbox-index linuxppc-dev

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.ozlabs.lists.linuxppc-dev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git