From: John Cotton Ericson <>
To: LKML <>
Cc: David Laight <David.Laight@ACULAB.COM>,
	Andy Lutomirski <>,
	"Jason A. Donenfeld" <>,
	Kernel Hardening <>,
	Jann Horn <>,
	Christian Brauner <>
Subject: Leveraging pidfs for process creation without fork
Date: Wed, 28 Jul 2021 12:37:57 -0400
Message-ID: <>
In-Reply-To: <>


I was excited to learn about pidfds the other day, precisely in the
hope that they would open the door to such a "sane process creation API".
I searched LKML, found this thread, and now hope to rekindle the
discussion; my apologies if there has been further discussion since that I
missed and I am making redundant noise.


On Tue, Feb 2, 2021, at 4:23 AM, David Laight wrote:
> From: Andy Lutomirski
> > Sent: 01 February 2021 18:30
> ...
> > 2. A sane process creation API.  It would be delightful to be able to
> > create a fully-specified process without forking.  This might end up
> > being a fairly complicated project, though -- there are a lot of
> > inherited process properties to be enumerated.
> Since you are going to (eventually) load in a program image,
> having to do several system calls to create the process isn't
> likely to be a problem.
> So using separate calls for each property isn't really an issue
> and solves the horrid problem of the API structure.

I definitely concur that creating an embryonic process and then setting
its properties separately sounds like the right approach. I'm no expert,
but I gather from afar that between BPF and io_uring, plenty of people are
investigating general methods of batched/pipelined communication with
the kernel, so there's little reason to keep adding ad-hoc mammoth
syscalls for specific sets of tasks.


> So you could create an embryonic process that inherits a lot
> of stuff from the current process, then do actions that
> sort out the fds, argv, namespace etc.
> Finally running the new program.

All that sounds good, but I wonder if it would be possible to have a 
flag such that inheritance (where practical) would *not* be the default 
for new processes. I'm convinced that better security will always be an 
uphill battle until privileges/capabilities/resources are *not* shared 
by default. Only when more sharing requires monotonically more 
programmer effort will productivity/laziness align with the principle of 
least privilege.

With fork/exec, I think it's safe to say there's no good way to achieve
this. But with the embryonic-process method, where one can e.g. set or
unset file descriptors on the embryo under construction, it seems quite
natural.
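For comparison, the closest existing userspace analogue to "record per-property actions, then run the program" is posix_spawn(3) with file actions. The minimal sketch below assumes Linux with /proc mounted and /bin/sh available; the helper name fd_visible_after_spawn is hypothetical. It illustrates that inheritance is the default today: a descriptor without FD_CLOEXEC reaches the new program unless the caller remembers to opt out, either by setting FD_CLOEXEC or by recording an explicit close action.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <spawn.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Hypothetical helper (not from the mail): spawn /bin/sh and ask it whether
 * descriptor `fd` exists in the new program image. When `close_in_child` is
 * nonzero, an explicit close action is recorded first, i.e. the caller has
 * to opt *out* of sharing. Linux-specific: relies on /proc/self/fd.
 * Returns 1 if the fd is visible in the child, 0 if not, -1 on error.
 */
static int fd_visible_after_spawn(int fd, int close_in_child)
{
    char script[64];
    /* `test` is a shell builtin, so /proc/self refers to the spawned shell. */
    snprintf(script, sizeof script, "test -e /proc/self/fd/%d", fd);
    char *argv[] = { "sh", "-c", script, NULL };

    posix_spawn_file_actions_t actions;
    if (posix_spawn_file_actions_init(&actions) != 0)
        return -1;
    if (close_in_child &&
        posix_spawn_file_actions_addclose(&actions, fd) != 0) {
        posix_spawn_file_actions_destroy(&actions);
        return -1;
    }

    pid_t pid;
    int rc = posix_spawn(&pid, "/bin/sh", &actions, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&actions);
    if (rc != 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    /* test -e exits 0 when the path exists, nonzero when it does not. */
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

For instance, with a pipe end duplicated to descriptor 100 via fcntl(fd, F_DUPFD, 100), fd_visible_after_spawn(100, 0) would be expected to report 1 and fd_visible_after_spawn(100, 1) to report 0. Every such opt-out has to be spelled out per descriptor, which is the opposite of the "opt-in sharing" default argued for above.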

One wrinkle is interface evolution: as new sandboxing
mechanisms / namespaces are created, we would either need to create
yet more "no really, default no-share" flags, or arguably cause API
breakage as previously "leaking" privileges are plugged. I am hopeful
that either versioned flags, or documenting thoroughly up front that
the exact behavior is subject to change as "leaks are plugged", would be
OK, but I recognize that the former might be too much complexity and the
latter too weasel-wordy, in which case the whole idea of "opt-in sharing
only" will have to wait.


The security <-> ergonomics aspect is the main point of interest for me,
but here are a few other ideas:

1. I originally thought an fd to an embryonic process should in fact
point to the task_struct rather than the pid, since there is no risk of
the data becoming useless asynchronously: an embryonic process is never
scheduled and cannot do anything, such as exiting, on its own. But there
is no reason an embryonic process needs to start with just one thread, so
allowing entire embryonic thread groups might actually be virtuous. I
don't know for sure, but I figure in that case it is simpler to just
stick with the pid indirection.

2. Embryonic processes could be "forked at rest" (i.e. just duplicated).
That would allow a regime where embryos serve as templates for process
creation: a template is duplicated and sent to other tasks, which can
then spawn processes from it themselves. If my idea of "opt-in sharing
only" fails per the above, sending around an "as isolated as possible"
embryo template could be a decent fallback.
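The "opt-in sharing" and "forked at rest" ideas can be modeled as plain data. The sketch below is purely hypothetical: struct embryo, embryo_new, embryo_grant_fd, embryo_lookup and embryo_dup are illustrative names, not a kernel or libc API, and the model tracks only file descriptors rather than the many other inherited properties. A new embryo grants nothing; duplicating a template yields an independent copy that another task could specialize further.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace model, not a kernel API: an "embryo" records which
 * of the creator's descriptors the future process may have. It starts empty,
 * so nothing is shared unless explicitly granted (opt-in sharing).
 */
#define EMBRYO_MAX_FDS 16

struct embryo {
    int nfds;                       /* number of granted descriptors */
    int parent_fd[EMBRYO_MAX_FDS];  /* descriptor in the creating task */
    int child_fd[EMBRYO_MAX_FDS];   /* slot it would occupy in the child */
};

/* A fresh embryo shares nothing. */
static struct embryo *embryo_new(void)
{
    return calloc(1, sizeof(struct embryo));
}

/* Opt-in sharing: map parent_fd to child_fd, replacing any existing
 * mapping for that child slot. Returns 0 on success, -1 if full. */
static int embryo_grant_fd(struct embryo *e, int parent_fd, int child_fd)
{
    for (int i = 0; i < e->nfds; i++) {
        if (e->child_fd[i] == child_fd) {
            e->parent_fd[i] = parent_fd;
            return 0;
        }
    }
    if (e->nfds == EMBRYO_MAX_FDS)
        return -1;
    e->parent_fd[e->nfds] = parent_fd;
    e->child_fd[e->nfds] = child_fd;
    e->nfds++;
    return 0;
}

/* Which creator descriptor backs child slot `child_fd`? -1 if none. */
static int embryo_lookup(const struct embryo *e, int child_fd)
{
    for (int i = 0; i < e->nfds; i++)
        if (e->child_fd[i] == child_fd)
            return e->parent_fd[i];
    return -1;
}

/* "Fork at rest": duplicate a template without running anything, so the
 * copy can be specialized or handed to another task independently. */
static struct embryo *embryo_dup(const struct embryo *tmpl)
{
    struct embryo *copy = malloc(sizeof *copy);
    if (copy != NULL)
        memcpy(copy, tmpl, sizeof *copy);
    return copy;
}
```

Granting a descriptor on a duplicated embryo leaves the original template untouched, so one "as isolated as possible" template could be copied and specialized by many tasks.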

That's all I've got. I hope continuing this design process is of interest
to others.




Thread overview: 13+ messages
2021-02-01 17:47 forkat(int pidfd), execveat(int pidfd), other awful things? Jason A. Donenfeld
2021-02-01 17:51 ` Jason A. Donenfeld
2021-02-01 18:20 ` Christian Brauner
2021-02-01 18:29 ` Andy Lutomirski
2021-02-02  9:23   ` David Laight
2021-07-28 16:37     ` John Cotton Ericson [this message]
2021-07-29 14:24       ` Leveraging pidfs for process creation without fork Christian Brauner
2021-07-29 14:54         ` John Ericson
2021-07-30  1:41         ` Al Viro
     [not found]           ` <>
2021-07-31 22:42             ` Al Viro
2021-08-02 12:19               ` Christian Brauner
2021-08-03  6:00                 ` John Cotton Ericson
2021-02-01 18:32 ` forkat(int pidfd), execveat(int pidfd), other awful things? Casey Schaufler
