archive mirror
 help / color / mirror / Atom feed
From: Aleksa Sarai <>
To: Linus Torvalds <>
Cc: Al Viro <>,
	Jeff Layton <>,
	"J. Bruce Fields" <>,
	Arnd Bergmann <>,
	David Howells <>,
	Christian Brauner <>,
	Eric Biederman <>,
	Andy Lutomirski <>,
	Andrew Morton <>,
	Alexei Starovoitov <>,
	Kees Cook <>, Jann Horn <>,
	Tycho Andersen <>,
	David Drysdale <>,
	Chanho Min <>, Oleg Nesterov <>,
	Aleksa Sarai <>,
	Linux Containers <>,
	linux-fsdevel <>,
	Linux API <>,
	Linux List Kernel Mailing <>,
	linux-arch <>
Subject: Re: [PATCH v7 5/5] namei: resolveat(2) syscall
Date: Sun, 26 May 2019 15:46:38 +1000	[thread overview]
Message-ID: <20190526054638.jftyco7nviurn47h@yavin> (raw)
In-Reply-To: <>

[-- Attachment #1: Type: text/plain, Size: 6565 bytes --]

On 2019-05-25, Linus Torvalds <> wrote:
> In fact, I think resolveat() as a model is fundamentally wrong for yet
> another reason: O_CREAT. If you want to _create_ a new file, and you
> want to still have the path resolution modifiers in place, the
> resolveat() model is broken, because it only gives you path resolution
> for the lookup, and then when you do openat(O_CREAT) for the final
> component, you now don't have any way to limit that last component.
> Sure,  you can probably effectively hack around it with resolveat() on
> everything but the last component, and then
> openat(O_CREAT|O_EXCL|O_NOFOLLOW) on the last component, because the
> O_EXCL|O_NOFOLLOW should mean that it won't do any of the unsafe
> things. And then (if you didn't actually want the O_EXCL), you handle
> the race between "somebody else got there first" by re-trying etc. So
> I suspect the O_CREAT thing could be worked around with extra
> complexity, but it's an example of how the O_PATH model really screws
> you over.
> End result: I really think resolveat() is broken. It absolutely
> *needs* to be a full-fledged "openat()" system call, just with added
> path resolution flags.

That's a very good point. I was starting to work on O_CREAT via
resolveat(2) and it definitely was much harder than most people would be
happy to deal with -- I'm not even sure that I handled all the cases.

I'll go for an openat2(2) then. Thinking about it some more -- since
it's a new syscall, I could actually implement the O_PATH link-mode as
being just the regular mode argument (since openat(2) ignores the mode
for non-O_CREAT anyway). The openat(2) wrapper might be more than
one-line as a result but it should avoid polluting the resolution flags
with mode flags (since openat(O_PATH) needs to have a g+rwx mode for

> >   openat2(rootfd, "path/to/file", O_PATH, RESOLVE_IN_ROOT, &statbuf);
> Note that for O_CREAT, it either needs the 'mode' parameter too, or
> the statbuf would need to be an in-out thing. I think the in-out model
> might be nice (the varargs model with a conditional 'mode' parameter
> is horrid, I think), but at some point it's just bike-shedding.
> Also, I'm not absolutely married to the statbuf, but I do think it
> might be a useful extension. A *lot* of users need the size of the
> file for subsequent mmap() calls (or for buffer management for
> read/write interface) or for things like just headers (ie
> "Content-length:" in html etc).
> I'm not sure people actually want a full 'struct stat', but people
> historically also do st_ino/st_dev for indexing into existing
> user-space caches (or to check permissions like that it's on the right
> filesystem, but the resolve flags kind of make that less of an issue).
> And st_mode to verify that it's a proper regular file etc etc.
> You can always just leave it as NULL if you don't care, there's almost
> no downside to adding it, and I do think that people who want a "walk
> pathname carefully" model almost always would want to also check the
> end result anyway.

Yeah, I agree -- most folks would want to double-check what they've
opened or O_PATH'd. Though I'm still not clear what is the best way of
doing the "stat" argument is -- especially given how much fun
architecture-specific shenanigans are in fs/stat.c (should I only use
cp_new_stat64 or have a separate 64-bit syscall).

> > Is there a large amount of overhead or downside to the current
> > open->fstat way of doing things, or is this more of a "if we're going to
> > add more ways of opening we might as well add more useful things"?
> Right now, system calls are sadly very expensive on a lot of hardware.
> We used to be very proud of the fact that Linux system calls were
> fast, but with meltdown and retpoline etc, we're back to "system calls
> can be several thousand cycles each, just in overhead, on commonly
> available hardware".
> Is "several thousand cycles" fatal? Not necessarily. But I think that
> if we do add a new system call, particularly a fancy one for special
> security-conscious models, we should look at what people need and use,
> and want. And performance should always be a concern.
> I realize that people who come at this from primarily just a security
> issue background may think that security is the primary goal. But no.
> Security always needs to realize that the _primary_ goal is to have
> people _use_ it. Without users, security is entirely pointless. And
> the users part is partly performance, but mostly "it's convenient".

Yup, I agree. I was hoping to shunt most of the convenience to userspace
to avoid ruffling feathers in VFS-land, but I'm more than happy to make
the kernel ABI more convenient.

> The whole "this is Linux-specific" is a big inconvenience point

I hope that other OSes will take our lead and have a similar interface
so that the particular inconvenience can go away eventually (this was
one of the arguments for resolveat(2) -- it's a clear "this is a new
idea" interface rather than mixing it with other O_* flags).

> Talking about securely opening things - another flag that we may want
> to avoid issues is a "don't open device nodes" flag. Sure, O_NONBLOCK
> plus checking the st_mode of the result is usually sufficient, but
> it's actually fairly easy to get that wrong. Things like /dev/tty and
> /dev/zero often need to be available for various reasons, and have
> been used to screw careless "open and read" users up that missed a
> check.

Sure, this could be added -- though I'm sure folks would have
disagreements over whether it should be a resolution flag on an open

> I also do wonder that if the only actual user-facing interface for the
> resolution flags is a new system call, should we not make the
> *default* value be "don't open anything odd at all".
> So instead of saying RESOLVE_XDEV for "don't cross mount points",
> maybe the flags should be the other way around, and say "yes, allow
> mount point crossings", and "yes, explicitly allow device node
> opening", and "yes, allow DOTDOT" etc.

This seems like a reasonable default, except for the cases of
RESOLVE_BENEATH and RESOLVE_IN_ROOT (that would make using AT_FDCWD more
cumbersome than necessary). But other than that, I'd be happy to invert
all the other flags.

Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

      reply	other threads:[~2019-05-26  5:47 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-07 16:43 [PATCH v7 0/5] namei: resolveat(2) path resolution restriction API Aleksa Sarai
2019-05-07 16:43 ` [PATCH v7 1/5] namei: split out nd->dfd handling to dirfd_path_init Aleksa Sarai
2019-05-07 16:43 ` [PATCH v7 2/5] namei: O_BENEATH-style path resolution flags Aleksa Sarai
2019-05-07 16:43 ` [PATCH v7 3/5] namei: LOOKUP_IN_ROOT: chroot-like path resolution Aleksa Sarai
2019-05-07 16:43 ` [PATCH v7 4/5] namei: aggressively check for nd->root escape on ".." resolution Aleksa Sarai
2019-05-07 16:43 ` [PATCH v7 5/5] namei: resolveat(2) syscall Aleksa Sarai
2019-05-24 22:59   ` Linus Torvalds
2019-05-25  7:03     ` Aleksa Sarai
2019-05-25 16:54       ` Linus Torvalds
2019-05-26  5:46         ` Aleksa Sarai [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190526054638.jftyco7nviurn47h@yavin \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).