* Re: forkat(int pidfd), execveat(int pidfd), other awful things?
2021-02-01 17:47 forkat(int pidfd), execveat(int pidfd), other awful things? Jason A. Donenfeld
@ 2021-02-01 17:51 ` Jason A. Donenfeld
2021-02-01 18:20 ` Christian Brauner
` (2 subsequent siblings)
3 siblings, 0 replies; 13+ messages in thread
From: Jason A. Donenfeld @ 2021-02-01 17:51 UTC (permalink / raw)
To: Kernel Hardening, Andy Lutomirski; +Cc: LKML, Jann Horn, Christian Brauner
> int execve_parent(int parent_pidfd, int root_dirfd, int cgroup_fd, int
> namespace_fd, const char *pathname, char *const argv[], char *const
> envp[]);
A variant on the same scheme would be:
int execve_remote(int pidfd, int root_dirfd, int cgroup_fd, int
namespace_fd, const char *pathname, char *const argv[], char *const
envp[]);
Unpriv'd process calls fork(), and from that fork sends its pidfd
through a unix socket to systemd-sudod, which then calls execve_remote
on that pidfd.
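The hand-off step in this scheme needs nothing new: a pidfd is an ordinary
file descriptor, so it can cross a unix socket as SCM_RIGHTS ancillary data.
A minimal sketch of that step alone, in C, follows. send_fd() and recv_fd()
are illustrative helper names, not existing APIs, and execve_remote() itself
remains hypothetical, so it is not called here.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one fd over a connected AF_UNIX socket.
 * Returns 0 on success, -1 on failure. */
int send_fd(int sock, int fd)
{
    char dummy = 'x';                 /* at least one byte of real data */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {                           /* union keeps the buffer aligned */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof msg);
    memset(&u, 0, sizeof u);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;     /* "this message carries fds" */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd sent by send_fd().
 * Returns the new fd (a fresh number in this process), or -1. */
int recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof fd);
    return fd;
}
```

In the scheme above, the forked child would pass a pidfd for itself (for
example one obtained from pidfd_open(2)) to send_fd(), and the daemon would
pick it up with recv_fd().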
There are a lot of (potentially very bad) ways to skin this cat.
* Re: forkat(int pidfd), execveat(int pidfd), other awful things?
2021-02-01 17:47 forkat(int pidfd), execveat(int pidfd), other awful things? Jason A. Donenfeld
2021-02-01 17:51 ` Jason A. Donenfeld
@ 2021-02-01 18:20 ` Christian Brauner
2021-02-01 18:29 ` Andy Lutomirski
2021-02-01 18:32 ` forkat(int pidfd), execveat(int pidfd), other awful things? Casey Schaufler
3 siblings, 0 replies; 13+ messages in thread
From: Christian Brauner @ 2021-02-01 18:20 UTC (permalink / raw)
To: Jason A. Donenfeld; +Cc: Kernel Hardening, Andy Lutomirski, LKML, Jann Horn
On Mon, Feb 01, 2021 at 06:47:17PM +0100, Jason A. Donenfeld wrote:
> Hi Andy & others,
>
> I was reversing some NT stuff recently and marveling over how wild and
> crazy things are over in Windows-land. A few things related to process
> creation caught my interest:
>
> - It's possible to create a new process with an *arbitrary parent
> process*, which means it'll then inherit various things like handles
> and security attributes and tokens from that new parent process.
>
> - It's possible to create a new process with the memory space handle
> of a different process. Consider this on Linux, and you have some
> abomination like `forkat(int pidfd)`.
>
> The big question is "why!?" At first I was just amused by its presence
> in NT. Everything is an object and you can usually freely mix and
> match things, and it's very flexible, which is cool. But this is NT,
> not Linux.
>
> Jann and I were discussing, though, that maybe some variant of these
> features might be useful to get rid of setuid executables. Imagine
> something like `systemd-sudod`, forked off of PID 1 very early.
> Subsequently all new processes on the system run with
> PR_SET_NO_NEW_PRIVS or similar policies to prevent non-root->root
> transition. Then, if you want to transition, you ask systemd-sudod (or
> polkitd, or whatever else you have in mind) to make you a new process,
> and it then does the various policy checks, and executes a new process
> for you as the parent of the requesting process.
>
> So how would that work? Well, executing processes with arbitrary
> parents would be part of it, as above. But we'd probably want to more
> carefully control that new process. Which chroot is it in? How do
> cgroups work? And so on. And ultimately this design leads to something
> like ZwCreateProcess, where you have several arguments, each to a
> handle to some part of the new process state, or null to be inherited
> from its parent.
>
> int execve_parent(int parent_pidfd, int root_dirfd, int cgroup_fd, int
> namespace_fd, const char *pathname, char *const argv[], char *const
> envp[]);
>
> One could imagine this growing pretty unwieldy. There's also this
> other design aspect of Linux that's worth considering. Namespaces and
> other process-inherited resources are generally hierarchical, with
> children getting the resource from their parent. This makes sense and
> is simple to conceptualize. Everytime we add a new thing_fd as a
> pointer to one of these resources, and allow it to be used outside of
> that hierarchy, it introduces a kind of "escape hatch". That might be
> considered "bad design" by some; it might not be by others. Seen this
> way, NT is one massive escape hatch, with pretty much everything being
> an object with a handle.
>
> But! Maybe this is nonetheless an interesting design avenue to
> explore. The introduction of pidfd is sort of just the "beginning" of
> that kind of design.
>
> Is any of this interesting to you as a future of privilege escalation
> and management on Linux?
A bunch of this was discussed in a breakout room during Linux Plumbers
last year and I also had discussions with Lennart about this a little
while ago.
One API I had proposed was to extend pidfd_open() to give you a
pidfd that does not yet refer to any process, i.e. instead of
int pidfd = pidfd_open(1234, 0);
you could do
int pidfd = pidfd_open(-1/-ESRCH, 0);
which would give you an empty process handle without any mentionable
properties.
A simple/dumb design would then be to let clone3() not just return
pidfds but also take pidfds as an argument. You could then hand off the
pidfd to another process via SCM_RIGHTS/pidfd_getfd() and have it create a
process for you with the privileges of the caller; you'd still be the
parent.
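For reference, the half of this that already exists is clone3() handing back
a pidfd for the new child via CLONE_PIDFD. The sketch below shows only that
existing behaviour. The empty handle from pidfd_open(-1, 0) and a clone3()
that accepts a pidfd are proposals in this mail, not current APIs, and
spawn_with_pidfd() is a made-up helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/sched.h>   /* struct clone_args, CLONE_PIDFD */
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork-like clone3() that also returns a pidfd
 * for the child. The child exits immediately with status 0.
 * Returns the child's PID in the parent, or -1 with errno set. */
pid_t spawn_with_pidfd(int *pidfd_out)
{
    int pidfd = -1;
    struct clone_args args;
    pid_t pid;

    memset(&args, 0, sizeof args);
    args.flags = CLONE_PIDFD;                 /* ask for a pidfd */
    args.pidfd = (uint64_t)(uintptr_t)&pidfd; /* where the kernel stores it */
    args.exit_signal = SIGCHLD;               /* behave like fork() on exit */

    /* Raw syscall: no stack is given, so the child continues from here
     * on a copy of the parent's stack, as with fork(). */
    pid = (pid_t)syscall(SYS_clone3, &args, sizeof args);
    if (pid == 0)
        _exit(0);                             /* child: nothing else to do */
    if (pid > 0)
        *pidfd_out = pidfd;
    return pid;
}
```

The returned pidfd can then be polled, passed to pidfd_send_signal(2), or
sent to another process like any other file descriptor.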
Or, in addition to pidfd_open(), we add new syscalls to configure a
process context, pidfd_configure() or something similar. I initially
proposed this design before we ended up with what we have now.
So yes, I would love to have at least the concept to create a process
for another process, delegated fork, essentially.
Christian
* Re: forkat(int pidfd), execveat(int pidfd), other awful things?
2021-02-01 17:47 forkat(int pidfd), execveat(int pidfd), other awful things? Jason A. Donenfeld
2021-02-01 17:51 ` Jason A. Donenfeld
2021-02-01 18:20 ` Christian Brauner
@ 2021-02-01 18:29 ` Andy Lutomirski
2021-02-02 9:23 ` David Laight
2021-02-01 18:32 ` forkat(int pidfd), execveat(int pidfd), other awful things? Casey Schaufler
3 siblings, 1 reply; 13+ messages in thread
From: Andy Lutomirski @ 2021-02-01 18:29 UTC (permalink / raw)
To: Jason A. Donenfeld; +Cc: Kernel Hardening, LKML, Jann Horn, Christian Brauner
On Mon, Feb 1, 2021 at 9:47 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Hi Andy & others,
>
> I was reversing some NT stuff recently and marveling over how wild and
> crazy things are over in Windows-land. A few things related to process
> creation caught my interest:
>
> - It's possible to create a new process with an *arbitrary parent
> process*, which means it'll then inherit various things like handles
> and security attributes and tokens from that new parent process.
>
> - It's possible to create a new process with the memory space handle
> of a different process. Consider this on Linux, and you have some
> abomination like `forkat(int pidfd)`.
My general thought is that this is an excellent idea, but maybe not
quite in this form. I do rather like a lot about the NT design,
although I have to say that their actual taste in the structures
passed into APIs is baroque at best.
If we're going to do this, though, can we stay away from fork and
exec entirely? Fork is cute but inefficient, and exec is the source
of neverending complexity and bugs in the kernel. But I also think
that whole project can be decoupled into two almost-orthogonal pieces:
1. Inserting new processes into unusual places in the process tree.
The only part of setuid that really needs kernel help to replace is
for the daemon to be able to make its newly-spawned child be a child
of the process that called out to the daemon. Christian's pidfd
proposal could help here, and there could be a new API that is only a
minor tweak to existing fork/exec to fork-and-reparent.
2. A sane process creation API. It would be delightful to be able to
create a fully-specified process without forking. This might end up
being a fairly complicated project, though -- there are a lot of
inherited process properties to be enumerated.
(Bonus #3): binfmts are a pretty big attack surface. Having a way to
handle all the binfmt magic in userspace might be a nice extension to
#2.
--Andy
* RE: forkat(int pidfd), execveat(int pidfd), other awful things?
2021-02-01 18:29 ` Andy Lutomirski
@ 2021-02-02 9:23 ` David Laight
2021-07-28 16:37 ` Leveraging pidfs for process creation without fork John Cotton Ericson
0 siblings, 1 reply; 13+ messages in thread
From: David Laight @ 2021-02-02 9:23 UTC (permalink / raw)
To: 'Andy Lutomirski', Jason A. Donenfeld
Cc: Kernel Hardening, LKML, Jann Horn, Christian Brauner
From: Andy Lutomirski
> Sent: 01 February 2021 18:30
...
> 2. A sane process creation API. It would be delightful to be able to
> create a fully-specified process without forking. This might end up
> being a fairly complicated project, though -- there are a lot of
> inherited process properties to be enumerated.
Since you are going to (eventually) load in a program image,
having to do several system calls to create the process isn't
likely to be a problem.
So using separate calls for each property isn't really an issue
and solves the horrid problem of the API structure.
So you could create an embryonic process that inherits a lot
of stuff from the current process, then do actions that
sort out the fds, argv, namespace etc.,
and finally run the new program.
It would probably make implementing posix_spawn() easier.
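For comparison, posix_spawn() already has this shape in userspace: the
caller builds up file actions and attributes with separate calls, and only
the last call starts the program. A minimal sketch, assuming a Linux/glibc
system; spawn_with_stdout() is an illustrative name, and the embryonic
process API discussed here does not exist.

```c
#include <assert.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run path/argv with stdout redirected to out_fd,
 * wait for it, and return its exit status (or -1 on any failure). */
int spawn_with_stdout(const char *path, char *const argv[], int out_fd)
{
    posix_spawn_file_actions_t actions;
    posix_spawnattr_t attr;
    pid_t pid;
    int status;
    int rc;

    if (posix_spawn_file_actions_init(&actions) != 0)
        return -1;
    if (posix_spawnattr_init(&attr) != 0) {
        posix_spawn_file_actions_destroy(&actions);
        return -1;
    }

    /* Separate call per property: first the fd layout ... */
    rc = posix_spawn_file_actions_adddup2(&actions, out_fd, STDOUT_FILENO);
    /* ... then a process attribute (own process group) ... */
    if (rc == 0)
        rc = posix_spawnattr_setpgroup(&attr, 0);
    if (rc == 0)
        rc = posix_spawnattr_setflags(&attr, POSIX_SPAWN_SETPGROUP);
    /* ... and finally run the new program. */
    if (rc == 0)
        rc = posix_spawn(&pid, path, &actions, &attr, argv, environ);

    posix_spawn_file_actions_destroy(&actions);
    posix_spawnattr_destroy(&attr);
    if (rc != 0)
        return -1;

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The difference from the proposal is that everything not mentioned here is
still inherited through the fork/exec that libc performs underneath.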
David
* Leveraging pidfs for process creation without fork
2021-02-02 9:23 ` David Laight
@ 2021-07-28 16:37 ` John Cotton Ericson
2021-07-29 14:24 ` Christian Brauner
0 siblings, 1 reply; 13+ messages in thread
From: John Cotton Ericson @ 2021-07-28 16:37 UTC (permalink / raw)
To: LKML
Cc: David Laight, Andy Lutomirski, Jason A. Donenfeld,
Kernel Hardening, Jann Horn, Christian Brauner
Hi,
I was excited to learn about pidfds the other day, precisely in
hopes that it would open the door to such a "sane process creation API".
I searched the LKML, found this thread, and now hope to rekindle the
discussion; my apologies if there has been more discussion since that I
missed and I am making redundant noise.
----
On Tue, Feb 2, 2021, at 4:23 AM, David Laight wrote:
> From: Andy Lutomirski
> > Sent: 01 February 2021 18:30
> ...
> > 2. A sane process creation API. It would be delightful to be able to
> > create a fully-specified process without forking. This might end up
> > being a fairly complicated project, though -- there are a lot of
> > inherited process properties to be enumerated.
>
> Since you are going to (eventually) load in a program image
> have to do several system calls to create the process isn't
> likely to be a problem.
> So using separate calls for each property isn't really an issue
> and solves the horrid problem of the API structure.
I definitely concur that creating an embryonic process and then setting the
properties separately sounds like the right approach. I'm no expert, but
I gather from afar that between BPF and io_uring, plenty of people are
investigating general methods of batched/pipelined communication with
the kernel, and so there's little reason to go around making more ad-hoc
mammoth syscalls for specific sets of tasks.
----
> So you could create an embryonic process that inherits a lot
> of stuff from the current process, the do actions that
> sort out the fds, argv, namespace etc.
> Finally running the new program.
All that sounds good, but I wonder if it would be possible to have a
flag such that inheritance (where practical) would *not* be the default
for new processes. I'm convinced that better security will always be an
uphill battle until privileges/capabilities/resources are *not* shared
by default. Only when more sharing requires monotonically more
programmer effort will productivity/laziness align with the principle of
least privilege.
With fork/exec, there's no good way to achieve this, I think it's safe
to say. But with the embryonic processes method, where one has the
ability to e.g. set/unset file descriptors on the embryo under
construction, it seems quite natural.
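To make the "opt-in sharing" idea concrete for file descriptors only: today
the closest approximation is to flip the default by hand before exec. The
sketch below marks every open descriptor from 3 upward close-on-exec unless
it is explicitly listed. It assumes /proc is mounted, cloexec_all_except()
is a made-up name, and it says nothing about namespaces, credentials, or
the other inherited state mentioned in this thread.

```c
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: set FD_CLOEXEC on every open fd >= 3 that is not
 * listed in keep[]. Returns how many fds were marked, or -1 on error. */
int cloexec_all_except(const int *keep, int nkeep)
{
    DIR *dir = opendir("/proc/self/fd");
    struct dirent *ent;
    int marked = 0;

    if (dir == NULL)
        return -1;

    while ((ent = readdir(dir)) != NULL) {
        char *end;
        long fd = strtol(ent->d_name, &end, 10);
        int flags;
        int skip = 0;
        int i;

        if (end == ent->d_name || *end != '\0')
            continue;                 /* "." and ".." are not fds */
        if (fd < 3 || fd == dirfd(dir))
            continue;                 /* leave stdio and our own DIR alone */
        for (i = 0; i < nkeep; i++)
            if (keep[i] == fd)
                skip = 1;             /* explicitly shared with the child */
        if (skip)
            continue;

        flags = fcntl((int)fd, F_GETFD);
        if (flags < 0)
            continue;                 /* closed in the meantime */
        if (fcntl((int)fd, F_SETFD, flags | FD_CLOEXEC) < 0) {
            closedir(dir);
            return -1;
        }
        marked++;
    }
    closedir(dir);
    return marked;
}
```

With an embryo-style API the same policy could be the starting state rather
than something every caller has to remember to apply.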
This is one wrinkle of interface evolution --- as new sandboxing
mechanisms / namespaces are created, we would either need to create
yet-new "no really, default no-share" flags, or arguably be causing API
breakage as previously "leaking" privileges are patched up. I am hopeful
that either having versioned flags, or thoroughly documenting up-front
that the exact behavior is subject to change as "leaks are plugged" is
OK, but I recognize that the former might be too much complexity and the
latter too weasel-wordy, and therefore the whole idea of "opt-in sharing
only" will have to wait.
----
The security <-> ergonomics aspect is the main point of interest for me,
but there are a few random ideas:
1. I originally thought an fd to an embryonic process should in fact
point to the task_struct rather than pid, since there is no risk of the
data becoming useless asynchronously --- an embryonic process is never
scheduled and cannot do anything like exiting on its own. But there is
no reason an embryonic process need start with just one thread, so
allowing entire embryonic thread groups might actually be virtuous. I
don't know for sure, but I figure in that case it is simpler to just
stick with the pid indirection.
2. Embryonic processes can be "forked at rest" (i.e. just duplicated),
which would allow a regime where they are used as templates for process
creation, duplicated ("forked at rest"), and sent around for other tasks
to spawn processes themselves. If my idea for "opt-in sharing only"
fails per the above, sending around an "as isolated as possible" embryo
template could be a decent fallback.
That's all I got. I hope continuing this design process is of interest
to others.
Cheers,
John
* Re: Leveraging pidfs for process creation without fork
2021-07-28 16:37 ` Leveraging pidfs for process creation without fork John Cotton Ericson
@ 2021-07-29 14:24 ` Christian Brauner
2021-07-29 14:54 ` John Ericson
2021-07-30 1:41 ` Al Viro
0 siblings, 2 replies; 13+ messages in thread
From: Christian Brauner @ 2021-07-29 14:24 UTC (permalink / raw)
To: John Cotton Ericson
Cc: LKML, David Laight, Andy Lutomirski, Jason A. Donenfeld,
Kernel Hardening, Jann Horn, Christian Brauner
On Wed, Jul 28, 2021 at 12:37:57PM -0400, John Cotton Ericson wrote:
> Hi,
>
> I was excited to learn about about pidfds the other day, precisely in hopes
> that it would open the door to such a "sane process creation API". I
> searched the LKML, found this thread, and now hope to rekindle the
> discussion; my apologies if there has been more discussion since that I
Yeah, I haven't forgotten this discussion. A proposal is on my todo list
for this year. So far I've scheduled some time to work on this in the
fall.
Thanks!
Christian
* Re: Leveraging pidfs for process creation without fork
2021-07-29 14:24 ` Christian Brauner
@ 2021-07-29 14:54 ` John Ericson
2021-07-30 1:41 ` Al Viro
1 sibling, 0 replies; 13+ messages in thread
From: John Ericson @ 2021-07-29 14:54 UTC (permalink / raw)
To: Christian Brauner
Cc: LKML, David Laight, Andy Lutomirski, Jason A. Donenfeld,
Kernel Hardening, Jann Horn, Christian Brauner
Wonderful, looking forward to reading it then!
John
On Thu, Jul 29, 2021, at 10:24 AM, Christian Brauner wrote:
> On Wed, Jul 28, 2021 at 12:37:57PM -0400, John Cotton Ericson wrote:
> > Hi,
> >
> > I was excited to learn about about pidfds the other day, precisely in hopes
> > that it would open the door to such a "sane process creation API". I
> > searched the LKML, found this thread, and now hope to rekindle the
> > discussion; my apologies if there has been more discussion since that I
>
> Yeah, I haven't forgotten this discussion. A proposal is on my todo list
> for this year. So far I've scheduled some time to work on this in the
> fall.
>
> Thanks!
> Christian
* Re: Leveraging pidfs for process creation without fork
2021-07-29 14:24 ` Christian Brauner
2021-07-29 14:54 ` John Ericson
@ 2021-07-30 1:41 ` Al Viro
[not found] ` <1468d75c-57ae-42aa-85ce-2bee8d403763@www.fastmail.com>
1 sibling, 1 reply; 13+ messages in thread
From: Al Viro @ 2021-07-30 1:41 UTC (permalink / raw)
To: Christian Brauner
Cc: John Cotton Ericson, LKML, David Laight, Andy Lutomirski,
Jason A. Donenfeld, Kernel Hardening, Jann Horn,
Christian Brauner
On Thu, Jul 29, 2021 at 04:24:15PM +0200, Christian Brauner wrote:
> On Wed, Jul 28, 2021 at 12:37:57PM -0400, John Cotton Ericson wrote:
> > Hi,
> >
> > I was excited to learn about about pidfds the other day, precisely in hopes
> > that it would open the door to such a "sane process creation API". I
> > searched the LKML, found this thread, and now hope to rekindle the
> > discussion; my apologies if there has been more discussion since that I
>
> Yeah, I haven't forgotten this discussion. A proposal is on my todo list
> for this year. So far I've scheduled some time to work on this in the
> fall.
Keep in mind that quite a few places in kernel/exit.c very much rely upon the
lack of anything outside of a thread group adding threads into it. Same for
fs/exec.c.
* Re: forkat(int pidfd), execveat(int pidfd), other awful things?
2021-02-01 17:47 forkat(int pidfd), execveat(int pidfd), other awful things? Jason A. Donenfeld
` (2 preceding siblings ...)
2021-02-01 18:29 ` Andy Lutomirski
@ 2021-02-01 18:32 ` Casey Schaufler
3 siblings, 0 replies; 13+ messages in thread
From: Casey Schaufler @ 2021-02-01 18:32 UTC (permalink / raw)
To: Jason A. Donenfeld, Kernel Hardening, Andy Lutomirski
Cc: LKML, Jann Horn, Christian Brauner
On 2/1/2021 9:47 AM, Jason A. Donenfeld wrote:
> Hi Andy & others,
>
> I was reversing some NT stuff recently and marveling over how wild and
> crazy things are over in Windows-land. A few things related to process
> creation caught my interest:
>
> - It's possible to create a new process with an *arbitrary parent
> process*, which means it'll then inherit various things like handles
> and security attributes and tokens from that new parent process.
>
> - It's possible to create a new process with the memory space handle
> of a different process. Consider this on Linux, and you have some
> abomination like `forkat(int pidfd)`.
>
> The big question is "why!?" At first I was just amused by its presence
> in NT. Everything is an object and you can usually freely mix and
> match things, and it's very flexible, which is cool. But this is NT,
> not Linux.
>
> Jann and I were discussing, though, that maybe some variant of these
> features might be useful to get rid of setuid executables. Imagine
> something like `systemd-sudod`, forked off of PID 1 very early.
> Subsequently all new processes on the system run with
> PR_SET_NO_NEW_PRIVS or similar policies to prevent non-root->root
> transition. Then, if you want to transition, you ask systemd-sudod (or
> polkitd, or whatever else you have in mind) to make you a new process,
> and it then does the various policy checks, and executes a new process
> for you as the parent of the requesting process.
>
> So how would that work? Well, executing processes with arbitrary
> parents would be part of it, as above. But we'd probably want to more
> carefully control that new process. Which chroot is it in? How do
> cgroups work? And so on. And ultimately this design leads to something
> like ZwCreateProcess, where you have several arguments, each to a
> handle to some part of the new process state, or null to be inherited
> from its parent.
>
> int execve_parent(int parent_pidfd, int root_dirfd, int cgroup_fd, int
> namespace_fd, const char *pathname, char *const argv[], char *const
> envp[]);
>
> One could imagine this growing pretty unwieldy. There's also this
> other design aspect of Linux that's worth considering. Namespaces and
> other process-inherited resources are generally hierarchical, with
> children getting the resource from their parent. This makes sense and
> is simple to conceptualize. Everytime we add a new thing_fd as a
> pointer to one of these resources, and allow it to be used outside of
> that hierarchy, it introduces a kind of "escape hatch". That might be
> considered "bad design" by some; it might not be by others. Seen this
> way, NT is one massive escape hatch, with pretty much everything being
> an object with a handle.
>
> But! Maybe this is nonetheless an interesting design avenue to
> explore. The introduction of pidfd is sort of just the "beginning" of
> that kind of design.
>
> Is any of this interesting to you as a future of privilege escalation
> and management on Linux?
TL;DR - We have plenty of flayed cats.
My brief analysis of your proposal doesn't lead me to think
that there's anything you couldn't already do with systemd and
an application launcher. We already have a bunch of security
mechanisms and behaviors that the masses have decided are too
complicated or dangerous to use. And some that *are* too
complicated or dangerous to use. I wouldn't see these mechanisms
as "hardening" the kernel. I would see them as complicating
what passes for the Linux security policy.
>
> Jason